hypothesis testing ii: z tests, t tests, and confidence ... · same story form a hypothesis get...

38
Hypothesis Testing II: z tests, t tests, and confidence intervals INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber APRIL 24, 2017 INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 1 of 4

Upload: others

Post on 03-Jun-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Hypothesis Testing II: z tests, ttests, and confidence intervals

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 1 of 4

Page 2: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Same Story

• Form a hypothesis

• Get some data

• Compute test statistic

• Reject hypothesis if p-value low enough

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 2 of 4

Page 3: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Important difference

• Assumes data come from normal distribution

• Can cause problems if diverges too much normal

• (How can you tell?) χ2 goodness of fit

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 3 of 4

Page 4: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Goodness of Fit

(But we’ll assume that data are normal)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 4 of 4

Page 5: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Hypothesis Testing II: z tests

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 1 of 5

Page 6: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

z-test

• Suppose we have one observation from normal distribution with mean µand variance σ2

• Given an observation x we can compute the Z score as

Z =x −µσ

(1)

• H0: Our observation came from the normal distribution with µ=µ0

◦ Assume same known variance σ

◦ But we need to be more specific!

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 2 of 5

Page 7: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

z-test

• Suppose we have one observation from normal distribution with mean µand variance σ2

• Given an observation x we can compute the Z score as

Z =x −µσ

(1)

• H0: Our observation came from the normal distribution with µ=µ0

◦ Assume same known variance σ◦ But we need to be more specific!

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 2 of 5

Page 8: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Two-tailed vs. one-tailed tests

• Two tail: Alternative µ 6=µ0

• One tail: Alternative µ>µ0

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 3 of 5

Page 9: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Multiple observations

If you observe x1 . . .xN from distribution with mean µ, test whether µ 6=µ0

• Compute test statistic

Z =x̄ −µ0

σ/p

N(2)

• If H0 were true, x̄ would be normal distribution with µ0 and variance σ2

N

• Now we can decide when to reject based on normal CDF

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 4 of 5

Page 10: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

When to reject (two-tailed)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 5 of 5

Page 11: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Hypothesis Testing II: ConfidenceIntervals

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 1 of 4

Page 12: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Conveying Uncertainty

• Suppose you make a measurement (e.g., a poll)

• Often useful to show all µ0 that could be excluded given themeasurement

• Fixed α

• Confidence interval

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 2 of 4

Page 13: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

z-distribution Confidence

• Assume known variance σ

• You observe {x1 . . .xN}• Obtain mean x̄

• Recall test statisticx̄ −µσpN

(1)

• For what µ would we not reject the null?

• Note that x̄ and µ are symmetric

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 3 of 4

Page 14: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

z-distribution Confidence

• Assume known variance σ

• You observe {x1 . . .xN}• Obtain mean x̄

• Recall test statisticx̄ −µσpN

(1)

• For what µ would we not reject the null?

• Note that x̄ and µ are symmetric

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 3 of 4

Page 15: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

From the Distribution

• Set α= 0.05

• Solve for µ= NorCDF(0,0.025)

• x̄ ±1.96 σpn

• Some people just use 2.0

• More samples→ tighter bounds

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 4 of 4

Page 16: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

From the Distribution

• Set α= 0.05

• Solve for µ= NorCDF(0,0.025)

• x̄ ±1.96 σpn

• Some people just use 2.0

• More samples→ tighter bounds

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 4 of 4

Page 17: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Hypothesis Testing II: One Samplet Tests

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 1 of 6

Page 18: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

What if you don’t know variance?

• t-test allows you to test hypothesisif you don’t know variance

• Sometimes called “small sampletest”: same as z test with enoughobservations

• William Gossett: check that yeastcontent matched Guiness’sstandard (but couldn’t publish)

• I.e., checking whether yeastcontent equal to µ0

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 2 of 6

Page 19: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

t-test statistic

• Need to estimate variance

s2 =∑

i

(xi − x̄)2

N −1(1)

• n−1 removes bias (expected value is less than truth)

• Test statistic looks similar

T ≡x̄ −µ0

spN

(2)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 3 of 6

Page 20: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Degrees of Freedom

• Like χ2, t-distribution parameterized by degrees of freedom

• ν= N −1 degress of freedom

PDF

Γ�

ν+12

pνπΓ

ν2

1 +x2

ν

�− ν+12

(3)

CDF

1

2+ xΓ

ν+ 1

2

2F1

12 , ν+1

2 ; 32 ;− x2

ν

pπνΓ

ν2

� (4)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 4 of 6

Page 21: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Shape of t-distribution

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 5 of 6

Page 22: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Example

• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1

• x̄ = 2.5, s2 = 3.5• T =

x̄−µ0Ç

s2N

= 2.5−1.0q

3.56

= 1.9640

• Double area under the at two tailed CDF

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6

Page 23: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Example

• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1• x̄ = 2.5, s2 = 3.5

• T =x̄−µ0Ç

s2N

= 2.5−1.0q

3.56

= 1.9640

• Double area under the at two tailed CDF

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6

Page 24: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Example

• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1• x̄ = 2.5, s2 = 3.5• T =

x̄−µ0Ç

s2N

= 2.5−1.0q

3.56

= 1.9640

• Double area under the at two tailed CDF

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6

Page 25: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Example

• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1• x̄ = 2.5, s2 = 3.5• T =

x̄−µ0Ç

s2N

= 2.5−1.0q

3.56

= 1.9640

• Double area under the at two tailed CDF

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6

Page 26: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Hypothesis Testing II: Two Samplet Tests

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 1 of 5

Page 27: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Comparing Two Samples

• Thus far, we’ve tested whether data is consistent with mean µ

• What if we want to test whether two samples are from the samedistribution

• Two-Sample t-test

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 2 of 5

Page 28: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Comparing Two Samples

• Thus far, we’ve tested whether data is consistent with mean µ

• What if we want to test whether two samples are from the samedistribution

• Two-Sample t-test

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 2 of 5

Page 29: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Two-Sample (unpooled)

• Two samples X1 = {x1,1,x1,2 . . .x1,N1} and X2 = {x2,1,x2,2 . . .x2,N2

}• Doesn’t assume that variance is the same for both samples (unpooled)

• Compute mean and sample variance for sample 1 (x̄1,s21) and sample 2

(x̄2,s22)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 3 of 5

Page 30: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Test Statistic

• T-statistic

T =(x1− x2)r

s21

N1+

s22

N2

(1)

• Plug into t-distrubtion with

ν=

s21

N1+

s22

N2

�2

s21

N1

�2

N1−1 +

s22

N2

�2

N2−1

(2)

• Intuition: Difference between x̄1 and x̄2 has variance that’s aninterpolation between the two samples

• Two-tailed vs. one-tailed distinction still applies

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 4 of 5

Page 31: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Test Statistic

• T-statistic

T =(x1− x2)r

s21

N1+

s22

N2

(1)

• Plug into t-distrubtion with

ν=

s21

N1+

s22

N2

�2

s21

N1

�2

N1−1 +

s22

N2

�2

N2−1

(2)

• Intuition: Difference between x̄1 and x̄2 has variance that’s aninterpolation between the two samples

• Two-tailed vs. one-tailed distinction still applies

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 4 of 5

Page 32: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

ν Example

s21 = 1,s2

2 = 2,n1 = 4,n2 = 8

ν=

s21

N1+

s22

N2

�2

s21

N1

�2

N1−1 +

s22

N2

�2

N2−1

(3)

(4)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 5 of 5

Page 33: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

ν Example

s21 = 1,s2

2 = 2,n1 = 4,n2 = 8

ν=

s21

N1+

s22

N2

�2

s21

N1

�2

N1−1 +

s22

N2

�2

N2−1

(3)

=

14 + 2

8

�2

13

14

�2+ 1

7

28

�2 (4)

(5)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 5 of 5

Page 34: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

ν Example

s21 = 1,s2

2 = 2,n1 = 4,n2 = 8

ν=

s21

N1+

s22

N2

�2

s21

N1

�2

N1−1 +

s22

N2

�2

N2−1

(3)

=

14 + 2

8

�2

13

14

�2+ 1

7

28

�2 (4)

=14

14

�2 � 13 + 1

7

�=

41021

=42

5(5)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 5 of 5

Page 35: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Hypothesis Testing II

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 1 of 4

Page 36: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Details

• Important to pre-register hypothesis

• Check assumptions of tests: distribution, randomness, response

• Many other tests that are possible

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 2 of 4

Page 37: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Problems with Statistical Tests’ Philosophy

• Failing to reject the null does not prove the null

• Cannot exclude data you don’t like

• Confusing statistical significance with practical significance (µ 6= 0 couldmean µ= 0.000001)

Often better:

• Present plot with confidence intervals, let people decide for themselves

• Create comprehensive model with explainable parameters, computeintervals for parameters

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 3 of 4

Page 38: Hypothesis Testing II: z tests, t tests, and confidence ... · Same Story Form a hypothesis Get some data Compute test statistic Reject hypothesis if p-value low enough INFO-2301:

Problems with Statistical Tests’ Philosophy

• Failing to reject the null does not prove the null

• Cannot exclude data you don’t like

• Confusing statistical significance with practical significance (µ 6= 0 couldmean µ= 0.000001)

Often better:

• Present plot with confidence intervals, let people decide for themselves

• Create comprehensive model with explainable parameters, computeintervals for parameters

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 3 of 4