hypothesis testing ii: z tests, t tests, and confidence ... · same story form a hypothesis get...
TRANSCRIPT
Hypothesis Testing II: z tests, ttests, and confidence intervals
INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 1 of 4
Same Story
• Form a hypothesis
• Get some data
• Compute test statistic
• Reject hypothesis if p-value low enough
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 2 of 4
Important difference
• Assumes data come from normal distribution
• Can cause problems if diverges too much normal
• (How can you tell?) χ2 goodness of fit
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 3 of 4
Goodness of Fit
(But we’ll assume that data are normal)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests, t tests, and confidence intervals | 4 of 4
Hypothesis Testing II: z tests
INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 1 of 5
z-test
• Suppose we have one observation from normal distribution with mean µand variance σ2
• Given an observation x we can compute the Z score as
Z =x −µσ
(1)
• H0: Our observation came from the normal distribution with µ=µ0
◦ Assume same known variance σ
◦ But we need to be more specific!
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 2 of 5
z-test
• Suppose we have one observation from normal distribution with mean µand variance σ2
• Given an observation x we can compute the Z score as
Z =x −µσ
(1)
• H0: Our observation came from the normal distribution with µ=µ0
◦ Assume same known variance σ◦ But we need to be more specific!
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 2 of 5
Two-tailed vs. one-tailed tests
• Two tail: Alternative µ 6=µ0
• One tail: Alternative µ>µ0
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 3 of 5
Multiple observations
If you observe x1 . . .xN from distribution with mean µ, test whether µ 6=µ0
• Compute test statistic
Z =x̄ −µ0
σ/p
N(2)
• If H0 were true, x̄ would be normal distribution with µ0 and variance σ2
N
• Now we can decide when to reject based on normal CDF
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 4 of 5
When to reject (two-tailed)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: z tests | 5 of 5
Hypothesis Testing II: ConfidenceIntervals
INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 1 of 4
Conveying Uncertainty
• Suppose you make a measurement (e.g., a poll)
• Often useful to show all µ0 that could be excluded given themeasurement
• Fixed α
• Confidence interval
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 2 of 4
z-distribution Confidence
• Assume known variance σ
• You observe {x1 . . .xN}• Obtain mean x̄
• Recall test statisticx̄ −µσpN
(1)
• For what µ would we not reject the null?
• Note that x̄ and µ are symmetric
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 3 of 4
z-distribution Confidence
• Assume known variance σ
• You observe {x1 . . .xN}• Obtain mean x̄
• Recall test statisticx̄ −µσpN
(1)
• For what µ would we not reject the null?
• Note that x̄ and µ are symmetric
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 3 of 4
From the Distribution
• Set α= 0.05
• Solve for µ= NorCDF(0,0.025)
• x̄ ±1.96 σpn
• Some people just use 2.0
• More samples→ tighter bounds
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 4 of 4
From the Distribution
• Set α= 0.05
• Solve for µ= NorCDF(0,0.025)
• x̄ ±1.96 σpn
• Some people just use 2.0
• More samples→ tighter bounds
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Confidence Intervals | 4 of 4
Hypothesis Testing II: One Samplet Tests
INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 1 of 6
What if you don’t know variance?
• t-test allows you to test hypothesisif you don’t know variance
• Sometimes called “small sampletest”: same as z test with enoughobservations
• William Gossett: check that yeastcontent matched Guiness’sstandard (but couldn’t publish)
• I.e., checking whether yeastcontent equal to µ0
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 2 of 6
t-test statistic
• Need to estimate variance
s2 =∑
i
(xi − x̄)2
N −1(1)
• n−1 removes bias (expected value is less than truth)
• Test statistic looks similar
T ≡x̄ −µ0
spN
(2)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 3 of 6
Degrees of Freedom
• Like χ2, t-distribution parameterized by degrees of freedom
• ν= N −1 degress of freedom
Γ�
ν+12
�
pνπΓ
�
ν2
�
�
1 +x2
ν
�− ν+12
(3)
CDF
1
2+ xΓ
�
ν+ 1
2
�
2F1
�
12 , ν+1
2 ; 32 ;− x2
ν
�
pπνΓ
�
ν2
� (4)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 4 of 6
Shape of t-distribution
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 5 of 6
Example
• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1
• x̄ = 2.5, s2 = 3.5• T =
x̄−µ0Ç
s2N
= 2.5−1.0q
3.56
= 1.9640
• Double area under the at two tailed CDF
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6
Example
• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1• x̄ = 2.5, s2 = 3.5
• T =x̄−µ0Ç
s2N
= 2.5−1.0q
3.56
= 1.9640
• Double area under the at two tailed CDF
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6
Example
• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1• x̄ = 2.5, s2 = 3.5• T =
x̄−µ0Ç
s2N
= 2.5−1.0q
3.56
= 1.9640
• Double area under the at two tailed CDF
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6
Example
• Suppose observe {0,1,2,3,4,5}• Test whether µ 6= 1• x̄ = 2.5, s2 = 3.5• T =
x̄−µ0Ç
s2N
= 2.5−1.0q
3.56
= 1.9640
• Double area under the at two tailed CDF
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: One Sample t Tests | 6 of 6
Hypothesis Testing II: Two Samplet Tests
INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 1 of 5
Comparing Two Samples
• Thus far, we’ve tested whether data is consistent with mean µ
• What if we want to test whether two samples are from the samedistribution
• Two-Sample t-test
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 2 of 5
Comparing Two Samples
• Thus far, we’ve tested whether data is consistent with mean µ
• What if we want to test whether two samples are from the samedistribution
• Two-Sample t-test
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 2 of 5
Two-Sample (unpooled)
• Two samples X1 = {x1,1,x1,2 . . .x1,N1} and X2 = {x2,1,x2,2 . . .x2,N2
}• Doesn’t assume that variance is the same for both samples (unpooled)
• Compute mean and sample variance for sample 1 (x̄1,s21) and sample 2
(x̄2,s22)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 3 of 5
Test Statistic
• T-statistic
T =(x1− x2)r
s21
N1+
s22
N2
(1)
• Plug into t-distrubtion with
ν=
�
s21
N1+
s22
N2
�2
�
s21
N1
�2
N1−1 +
�
s22
N2
�2
N2−1
(2)
• Intuition: Difference between x̄1 and x̄2 has variance that’s aninterpolation between the two samples
• Two-tailed vs. one-tailed distinction still applies
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 4 of 5
Test Statistic
• T-statistic
T =(x1− x2)r
s21
N1+
s22
N2
(1)
• Plug into t-distrubtion with
ν=
�
s21
N1+
s22
N2
�2
�
s21
N1
�2
N1−1 +
�
s22
N2
�2
N2−1
(2)
• Intuition: Difference between x̄1 and x̄2 has variance that’s aninterpolation between the two samples
• Two-tailed vs. one-tailed distinction still applies
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 4 of 5
ν Example
s21 = 1,s2
2 = 2,n1 = 4,n2 = 8
ν=
�
s21
N1+
s22
N2
�2
�
s21
N1
�2
N1−1 +
�
s22
N2
�2
N2−1
(3)
(4)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 5 of 5
ν Example
s21 = 1,s2
2 = 2,n1 = 4,n2 = 8
ν=
�
s21
N1+
s22
N2
�2
�
s21
N1
�2
N1−1 +
�
s22
N2
�2
N2−1
(3)
=
�
14 + 2
8
�2
13
�
14
�2+ 1
7
�
28
�2 (4)
(5)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 5 of 5
ν Example
s21 = 1,s2
2 = 2,n1 = 4,n2 = 8
ν=
�
s21
N1+
s22
N2
�2
�
s21
N1
�2
N1−1 +
�
s22
N2
�2
N2−1
(3)
=
�
14 + 2
8
�2
13
�
14
�2+ 1
7
�
28
�2 (4)
=14
�
14
�2 � 13 + 1
7
�=
41021
=42
5(5)
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II: Two Sample t Tests | 5 of 5
Hypothesis Testing II
INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberAPRIL 24, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 1 of 4
Details
• Important to pre-register hypothesis
• Check assumptions of tests: distribution, randomness, response
• Many other tests that are possible
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 2 of 4
Problems with Statistical Tests’ Philosophy
• Failing to reject the null does not prove the null
• Cannot exclude data you don’t like
• Confusing statistical significance with practical significance (µ 6= 0 couldmean µ= 0.000001)
Often better:
• Present plot with confidence intervals, let people decide for themselves
• Create comprehensive model with explainable parameters, computeintervals for parameters
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 3 of 4
Problems with Statistical Tests’ Philosophy
• Failing to reject the null does not prove the null
• Cannot exclude data you don’t like
• Confusing statistical significance with practical significance (µ 6= 0 couldmean µ= 0.000001)
Often better:
• Present plot with confidence intervals, let people decide for themselves
• Create comprehensive model with explainable parameters, computeintervals for parameters
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Hypothesis Testing II | 3 of 4