Some preliminary Q&A

What is the philosophical difference between classical (“frequentist”) and Bayesian statistics?

To a frequentist, unknown model parameters are fixed and unknown, and only estimable by replications of data from some experiment.

A Bayesian thinks of parameters as random, and thus having distributions (just like the data). We can thus think about unknowns for which no reliable frequentist experiment exists, e.g.

θ = proportion of US men with untreated atrial fibrillation

Chapter 1: Approaches for Statistical Inference – p. 1/19
Some preliminary Q&A

How does it work? A Bayesian writes down a prior guess for θ, p(θ), then combines this with the information that the data X provide to obtain the posterior distribution of θ, p(θ|X). All statistical inferences (point and interval estimates, hypothesis tests) then follow as appropriate summaries of the posterior.

Note that

posterior information ≥ prior information ≥ 0,

with the second “≥” replaced by “=” only if the prior is noninformative (which is often uniform, or “flat”).

Chapter 1: Approaches for Statistical Inference – p. 2/19
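The prior-to-posterior machinery can be sketched numerically. Here is a minimal illustration (not from the text): an assumed Beta(2, 2) prior for θ combined with a binomial likelihood for x = 7 successes in n = 10 trials, with the posterior computed on a grid and checked against the conjugate Beta(a + x, b + n − x) answer.

```python
# Sketch of "posterior ∝ likelihood × prior" on a grid.
# The Beta(2, 2) prior and the data x = 7, n = 10 are illustrative choices.
from math import comb

def posterior_on_grid(x, n, a, b, m=1001):
    """Normalized posterior for theta on an evenly spaced grid in [0, 1]."""
    grid = [i / (m - 1) for i in range(m)]
    # prior density (up to a constant) times the binomial likelihood
    post = [t ** (a - 1) * (1 - t) ** (b - 1) * comb(n, x) * t ** x * (1 - t) ** (n - x)
            for t in grid]
    z = sum(post)                       # normalize numerically
    return grid, [p / z for p in post]

grid, post = posterior_on_grid(x=7, n=10, a=2, b=2)
post_mean = sum(t * p for t, p in zip(grid, post))
# conjugacy check: the posterior is Beta(2 + 7, 2 + 3), with mean 9/14
print(post_mean)  # ≈ 0.643
```

All the point and interval summaries mentioned above are then just functionals of `post`.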
Some preliminary Q&A

Is the classical approach “wrong”? While a “hardcore” Bayesian might say so, it is probably more accurate to think of classical methods as merely “limited in scope”!

The Bayesian approach expands the class of models we can fit to our data, enabling us to handle

repeated measures
unbalanced or missing data
nonhomogeneous variances
multivariate data

– and many other settings that are awkward or infeasible from a classical point of view. The approach also eases the interpretation of and learning from those models once fit.

Chapter 1: Approaches for Statistical Inference – p. 3/19
Simple example of Bayesian thinking

From Business Week, online edition, July 31, 2001: “Economists might note, to take a simple example, that American turkey consumption tends to increase in November. A Bayesian would clarify this by observing that Thanksgiving occurs in this month.”

Data: plot of turkey consumption by month

Prior:
location of Thanksgiving in the calendar
knowledge of Americans’ Thanksgiving eating habits

Posterior: understanding of the pattern in the data!

Chapter 1: Approaches for Statistical Inference – p. 4/19
Bayes means revision of estimates

Humans tend to be Bayesian in the sense that most revise their opinions about uncertain quantities as more data accumulate. For example:

Suppose you are about to make your first submission to a particular academic journal

You assess your chances of your paper being accepted (you have an opinion, but the “true” probability is unknown)

You submit your article and it is accepted!

Question: What is your revised opinion regarding the acceptance probability for papers like yours?

If you said anything other than “1”, you are a Bayesian!

Chapter 1: Approaches for Statistical Inference – p. 5/19
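This kind of revision can be made concrete with a toy calculation (an illustration, not from the text): encode your opinion as a Beta(a, b) distribution, where a = 2, b = 3 is an arbitrary choice giving prior mean 0.4, and update on one accepted paper.

```python
# Opinion revision as a Beta update; the prior pseudo-counts are assumed.
a, b = 2, 3                      # prior "pseudo-counts" (illustrative choice)
prior_mean = a / (a + b)         # prior acceptance probability: 0.4
a += 1                           # one accepted paper observed
posterior_mean = a / (a + b)     # revised opinion: 3/6 = 0.5
print(prior_mean, posterior_mean)
```

The revised estimate rises toward the data but does not jump to 1, exactly the behavior the slide calls Bayesian.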
Bayes can account for structure

County-level breast cancer rates per 10,000 women:

 79  87  83  80  78
 90  89  92  99  95
 96 100   ⋆ 110 115
101 109 105 108 112
 96 104  92 101  96

With no direct data for ⋆, what estimate would you use?

Is 200 reasonable?

Probably not: all the other rates are around 100

Perhaps use the average of the “neighboring” values (again, near 100)

Chapter 1: Approaches for Statistical Inference – p. 6/19
Accounting for structure (cont’d)

Now assume that data become available for county ⋆: 100 women at risk, 2 cancer cases. Thus

rate = (2/100) × 10,000 = 200

Would you use this value as the estimate?

Probably not: the sample size is very small, so this estimate will be unreliable. How about a compromise between 200 and the rates in the neighboring counties?

Now repeat this thought experiment if the county ⋆ data were 20/1000, 200/10000, ...

Bayes and empirical Bayes methods can incorporate the structure in the data, weight the data and prior information appropriately, and allow the data to dominate as the sample size becomes large.

Chapter 1: Approaches for Statistical Inference – p. 7/19
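A hedged sketch of the compromise idea (not the book's actual model): weight the county's crude rate against the neighborhood rate of about 100 by a factor n/(n + m), where the "prior sample size" m is an assumed tuning value. The crude rate is 200 in every scenario from the thought experiment, but the weight, and hence the estimate, changes with the county's own sample size.

```python
# Simple precision-weighted compromise between the crude county rate and
# a neighborhood rate; prior_rate = 100 and prior_n = 1000 are assumptions.
def shrunk_rate(cases, women, prior_rate=100.0, prior_n=1000):
    crude = cases / women * 10_000          # rate per 10,000 women
    w = women / (women + prior_n)           # data weight grows with sample size
    return w * crude + (1 - w) * prior_rate

for cases, women in [(2, 100), (20, 1000), (200, 10_000)]:
    print(cases, women, round(shrunk_rate(cases, women), 1))
# → 109.1, 150.0, 190.9: the estimate starts near the neighbors' 100 and
#   lets the data dominate (approach 200) as the sample size grows
```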
Motivating Example

From Berger and Berry (1988, Amer. Scientist): Consider a clinical trial to study the effectiveness of Vitamin C in treating the common cold.

Observations are matched pairs of subjects (twins?), half randomized (in “double blind” fashion) to vitamin C, half to placebo. We count how many pairs had C giving superior relief after 48 hours.

Chapter 1: Approaches for Statistical Inference – p. 8/19
Two Designs

Design #1: Sample n = 17 pairs, and test

H0 : P(C better) = 1/2  vs.  HA : P(C better) ≠ 1/2

Suppose we observe x = 13 preferences for C. Then

p-value = P(X ≥ 13 or X ≤ 4) = .049

So if α = .05, stop and reject H0.

Design #2: Sample n1 = 17 pairs. Then:

if x1 ≥ 13 or x1 ≤ 4, stop.
otherwise, sample an additional n2 = 27 pairs.

Reject H0 if X1 + X2 ≥ 29 or X1 + X2 ≤ 15.

Chapter 1: Approaches for Statistical Inference – p. 9/19
Two Designs (cont’d)

We choose this second stage since under H0,
P(X1 + X2 ≥ 29 or X1 + X2 ≤ 15) = .049
– the same as Stage 1!

Suppose we again observe X1 = 13. Now:

p-value = P(X1 ≥ 13 or X1 ≤ 4)
        + P(X1 + X2 ≥ 29 and 4 < X1 < 13)
        + P(X1 + X2 ≤ 15 and 4 < X1 < 13)
        = .085 ← no longer significant at α = .05!

Yet the observed data were exactly the same; all we did was contemplate a second stage (no effect on data), and it changed our answer!

Chapter 1: Approaches for Statistical Inference – p. 10/19
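The two calculations can be reproduced with exact binomial tail sums; this is a sketch using only the design constants stated on the slides.

```python
# Same data (x1 = 13 of 17), two designs, two p-values.
from math import comb

def binom_pmf(k, n, p=0.5):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Design #1: X ~ Bin(17, 1/2), two-sided tail at x = 13
p1 = sum(binom_pmf(k, 17) for k in range(13, 18)) + \
     sum(binom_pmf(k, 17) for k in range(0, 5))

# Design #2: add the contemplated second stage of n2 = 27 pairs,
# summing over the continuation region 4 < X1 < 13
p2 = p1
for x1 in range(5, 13):
    tail2 = sum(binom_pmf(k, 27) for k in range(29 - x1, 28)) + \
            sum(binom_pmf(k, 27) for k in range(0, 16 - x1))
    p2 += binom_pmf(x1, 17) * tail2

print(round(p1, 3), round(p2, 3))  # .049 vs ≈ .085: same data, different design
```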
Additional Q & A

Q: What if we kept adding stages?
A: p-value → 1, even though x1 is still 13!

Q: So are p-values really “objective evidence”?
A: No, since extra information (like the design) is critical!

Q: What about unforeseen events? Example: the first 5 patients develop an allergic reaction to the treatment – the trial is stopped by clinicians. Can a frequentist analyze these data?
A: No: this aspect of the design wasn’t anticipated, so p-values are not computable!

Q: Can a Bayesian?
A: (Obviously) Yes – as we shall see.....

Chapter 1: Approaches for Statistical Inference – p. 11/19
Bayes/frequentist controversy

Frequentist outlook: Suppose we have k unknown quantities θ = (θ1, θ2, . . . , θk), and data X = (X1, ..., Xn) with a distribution that depends on θ, say p(x|θ).

The frequentist selects a loss function, L(θ, a), and develops procedures δ(X) which perform well with respect to frequentist risk,

R(θ, δ) = E_{X|θ}[L(θ, δ(X))]

A good frequentist procedure is one that enjoys low loss over repeated data sampling regardless of the true value of θ.

Chapter 1: Approaches for Statistical Inference – p. 12/19
Bayes/frequentist controversy

So are there any such procedures? Sure:

Example: Suppose Xi|θ ~ iid N(θ, σ^2), i = 1, . . . , n. The standard 95% frequentist confidence interval (CI) is δ(x) = (x̄ ± 1.96 s/√n).

If we choose the loss function

L(θ, δ) = 0 if θ ∈ δ(x),
          1 if θ ∉ δ(x),   then

R[(θ, σ), δ] = E_{x|θ,σ}[L(θ, δ(x))] = P_{θ,σ}(θ ∉ δ(x)) = .05

On average in repeated use, δ fails only 5% of the time.

Has some appeal, since it works for all θ and σ!

Chapter 1: Approaches for Statistical Inference – p. 13/19
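A quick Monte Carlo check of this risk claim (an illustration; the particular θ, σ, and n are arbitrary choices): over repeated samples, the interval x̄ ± 1.96 s/√n misses θ roughly 5% of the time.

```python
# Simulate repeated sampling and count how often the 95% CI misses theta.
import random
import statistics

random.seed(1)
theta, sigma, n, reps = 3.0, 2.0, 50, 20_000  # arbitrary illustrative values
misses = 0
for _ in range(reps):
    x = [random.gauss(theta, sigma) for _ in range(n)]
    xbar, s = statistics.fmean(x), statistics.stdev(x)
    half = 1.96 * s / n ** 0.5
    if not (xbar - half <= theta <= xbar + half):
        misses += 1
print(misses / reps)  # close to the nominal .05 (slightly above, since we
                      # use z = 1.96 with the estimated s rather than a t cutoff)
```

Changing `theta` and `sigma` leaves the miss rate essentially unchanged, which is the "works for all θ and σ" point.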
Bayes/frequentist controversy

Many frequentist analyses are based on the likelihood function, which is just p(x|θ) viewed as a function of θ:

L(θ; x) ≡ p(x|θ)

Larger p(x|θ) corresponds to values of θ which are more “likely”. This leads naturally to...

The Likelihood Principle: In making inferences or decisions about θ after x is observed, all relevant experimental information is contained in the likelihood function for the observed x. Furthermore, two likelihood functions contain the same information about θ if they are proportional to each other as functions of θ.

Seems completely reasonable (and theoretically justifiable) – yet frequentist analyses may violate it!...

Chapter 1: Approaches for Statistical Inference – p. 14/19
Binomial vs. Negative Binomial

Example due to Pratt (comment on Birnbaum, 1962 JASA): Suppose 12 independent coin tosses: 9H, 3T. Test:

H0 : θ = 1/2  vs.  HA : θ > 1/2

Two possibilities for f(x|θ):

Binomial: n = 12 tosses (fixed beforehand) ⇒ X = #H ~ Bin(12, θ)

⇒ L1(θ) = p1(x|θ) = (n choose x) θ^x (1 − θ)^{n−x} = (12 choose 9) θ^9 (1 − θ)^3.

Negative Binomial: Flip until we get r = 3 tails ⇒ X ~ NB(3, θ)

⇒ L2(θ) = p2(x|θ) = (r + x − 1 choose x) θ^x (1 − θ)^r = (11 choose 9) θ^9 (1 − θ)^3.

Adopt the rejection region, “Reject H0 if X ≥ c.”

Chapter 1: Approaches for Statistical Inference – p. 15/19
Binomial vs. Negative Binomial

p-values:

α1 = P_{θ=1/2}(X ≥ 9) = Σ_{j=9..12} (12 choose j) θ^j (1 − θ)^{12−j} = .075

α2 = P_{θ=1/2}(X ≥ 9) = Σ_{j=9..∞} (2 + j choose j) θ^j (1 − θ)^3 = .0325

So at α = .05, two different decisions! Violates the Likelihood Principle, since L1(θ) ∝ L2(θ)!!

What happened? Besides the observed x = 9, we also took into account the “more extreme” X ≥ 10.

Jeffreys (1961): “...a hypothesis which may be true may be rejected because it has not predicted observable results which have not occurred.”

In our example, the probability of the unpredicted and nonoccurring set X ≥ 10 has been used as evidence against H0!

Chapter 1: Approaches for Statistical Inference – p. 16/19
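The two tail probabilities can be computed exactly with the constants above (a sketch). The likelihoods are proportional, yet the tails, which sum over unobserved outcomes, land on opposite sides of α = .05.

```python
# Same data (9H, 3T), two stopping rules, two tail probabilities.
from math import comb

theta = 0.5
# Binomial design: X ~ Bin(12, 1/2), tail X >= 9
a1 = sum(comb(12, j) * theta**j * (1 - theta)**(12 - j) for j in range(9, 13))
# Negative binomial design: X ~ NB(3, 1/2); tail X >= 9 via the complement
a2 = 1 - sum(comb(2 + j, j) * theta**j * (1 - theta)**3 for j in range(0, 9))
print(a1, a2)  # a1 exceeds .05, a2 does not: two different decisions
```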
Conditional (Bayesian) Perspective

Always condition on data which have actually occurred; the long-run performance of a procedure is of (at most) secondary interest. Fix a prior distribution p(θ), and use Bayes’ Theorem (1763):

p(θ|x) ∝ p(x|θ) p(θ)

(“posterior ∝ likelihood × prior”)

Indeed, it often turns out that using the Bayesian formalism with relatively vague priors produces procedures which perform well using traditional frequentist criteria (e.g., low mean squared error over repeated sampling)! – several examples in Chapter 4 of the C&L text!

Chapter 1: Approaches for Statistical Inference – p. 17/19
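A consequence worth seeing numerically: since the posterior depends on the data only through the likelihood, the Bayesian analysis of the coin-toss example is the same under both designs. A minimal sketch, assuming a flat Beta(1, 1) prior (an illustrative choice): the 9H/3T data give identical posteriors whether the likelihood is binomial or negative binomial, because the two differ only by a constant that normalization removes.

```python
# The posterior is unchanged by the stopping rule when likelihoods are
# proportional; the flat prior here is an assumed choice.
from math import comb

m = 10_001
grid = [i / (m - 1) for i in range(m)]

def normalized(lik):
    z = sum(lik)
    return [v / z for v in lik]

post_binom  = normalized([comb(12, 9) * t**9 * (1 - t)**3 for t in grid])
post_negbin = normalized([comb(11, 9) * t**9 * (1 - t)**3 for t in grid])

# the two normalized posteriors agree pointwise (up to rounding error)
assert max(abs(a - b) for a, b in zip(post_binom, post_negbin)) < 1e-12
mean = sum(t * p for t, p in zip(grid, post_binom))
print(mean)  # posterior is Beta(10, 4): mean 10/14 ≈ 0.714
```

So Bayesian inference obeys the Likelihood Principle automatically, which is exactly the conditioning advocated on this slide.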
Frequentist criticisms

[from B. Efron (1986, Amer. Statist.), “Why isn’t everyone a Bayesian?”]

Shouldn’t condition on x (but this renews conflict with the Likelihood Principle)

Not easy/automatic (but computing keeps improving: MCMC methods, WinBUGS software....)

How to pick the prior p(θ): two experimenters could get different answers with the same data! How to control the influence of the prior? How to get objective results (say, for a court case, scientific report,....)?

⇒ Clearly this final criticism is the most serious!

Chapter 1: Approaches for Statistical Inference – p. 18/19
Bayesian Advantages in Inference

Ability to formally incorporate prior information

The reason for stopping experimentation does not affect the inference

Answers are more easily interpretable by nonspecialists (e.g. confidence intervals)

All analyses follow directly from the posterior; no separate theories of estimation, testing, multiple comparisons, etc. are needed

Any question can be directly answered (bioequivalence, multiple comparisons/hypotheses, ...)

Inferences are conditional on the actual data

Bayes procedures possess many optimality properties (e.g. consistent, impose parsimony in model choice, define the class of optimal frequentist procedures, ...)

Chapter 1: Approaches for Statistical Inference – p. 19/19