
Page 1: Lecture 5 - University of Oxford (biller/Statistics_files/...)

Lecture 5:
• Bayesian & Frequentist Probabilities
• Bayesian Confidence Intervals
• Frequentist Confidence Intervals
• What to use?

[Figure: a single measurement in the (Parameter 1, Parameter 2) plane, with 68.3%, 90% and 99% CL contours and the predictions of three models A, B and C]

Consider a single experiment in which 2 parameters are measured ( ) and compared with predictions from 3 different theoretical models (A, B, C)

Bayesian: Degree of belief. Given a single measurement, seek to constrain the phase space of possible models. Requires an assumed context for these models (prior). Probabilities are subjective and open to revision. There is no relevance to “statistical coverage of a confidence interval,” because there is only one measurement.

Frequentist: Frequency of occurrence given a hypothetical ensemble of identical experiments. A single measurement says nothing about the validity of a model (and venturing into model space is strictly forbidden!). There is no such thing as a “probability” for a model parameter to lie within derived bounds - either it does or it doesn’t. However, if everyone played the same game, the correct model would be correctly bounded a known fraction of the time.

Different Definitions of Probability: Your brain inherently makes Bayesian inferences:

Context is necessary to relate data to model parameters: (light seen) ↔ (surface absorption properties)

Prior: How are the squares likely being illuminated?

The model is of central importance to enable predictions

Page 2:

Charged particles produce light as they pass through plastic scintillators, which can be detected by photomultiplier tubes and used as an estimator for the energy deposition. Say that you calibrate such an instrument using known gamma line energies from various radioactive sources and determine that the energy can be very well described by taking the mean number (N) of detected photons (drawn from a Gaussian distribution of width σ) and multiplying it by a proportionality constant, α.

Now you measure emission from some continuous spectrum and detect N₀ photons from an interaction. What is the best estimate of the gamma-ray energy?

Relating data to model parameters requires a context (i.e. a prior)!

Another example:

[Figure: three sketches of rate vs. energy, each showing the energy range sampled by the ±1σ interval around αN₀, i.e. the energy resolution]

Flat spectrum: fluctuations into the N₀ region from higher and lower energies are equal and unbiased, so E ≈ αN₀.

Falling spectrum: fluctuations into the N₀ region from lower energies are more likely (there are more chances), so E < αN₀ - i.e. there are more low-energy events in the resolution bin.

Full spectrum: different biases in the different regions of the spectrum (E < αN₀ in some regions, E > αN₀ in others, E ≈ αN₀ elsewhere).
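This spectral bias is easy to demonstrate with a quick Monte Carlo. The sketch below uses assumed values for α, σ and N₀ (none are from the lecture): events are drawn from either a flat or a steeply falling spectrum, the detected photon count is Gaussian-smeared, and only events reconstructing near N₀ are kept. The mean true energy of the selected events shows the bias.

```python
import random

random.seed(1)

ALPHA = 1.0    # calibration constant (assumed value for illustration)
SIGMA = 10.0   # width of the Gaussian photon-count smearing (assumed)
N0 = 50.0      # observed photon count (assumed)

def mean_true_energy(spectrum, trials=200_000):
    """Mean true energy of events whose smeared count lands near N0."""
    selected = []
    for _ in range(trials):
        e_true = spectrum()
        n_det = random.gauss(e_true / ALPHA, SIGMA)  # detected photon count
        if abs(n_det - N0) < 0.5:
            selected.append(e_true)
    return sum(selected) / len(selected)

flat = lambda: random.uniform(20.0, 80.0)         # flat spectrum around N0
falling = lambda: random.expovariate(1.0 / 30.0)  # steeply falling spectrum

e_flat = mean_true_energy(flat)       # close to ALPHA * N0: unbiased
e_falling = mean_true_energy(falling) # below ALPHA * N0: biased low
```

For the flat spectrum the selected events straddle αN₀ symmetrically; for the falling spectrum more events fluctuate up into the resolution bin than down, pulling the mean true energy below αN₀.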

Bayesian Confidence Intervals

[Figure: PDF for the model parameter of interest, α, with lower tail fraction a below αa and upper tail fraction b above αb]

a = fraction of PDF < αa
b = fraction of PDF > αb

Confidence Level: CL = 1 - a - b

90% CL central interval: find αa and αb such that a = b = 0.05
90% CL upper bound: find αa and αb such that a = 0, b = 0.1
90% CL lower bound: find αa and αb such that a = 0.1, b = 0
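These tail-fraction definitions translate directly into code. As a sketch, assume the parameter PDF is a unit Gaussian (an assumption for illustration, not from the lecture); the interval edges αa and αb are then just quantiles, found here by bisection on the CDF.

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # cumulative fraction of the (assumed Gaussian) parameter PDF below x
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantile(p, mu=0.0, sigma=1.0, lo=-50.0, hi=50.0):
    # invert the CDF by bisection: find x with gauss_cdf(x) = p
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gauss_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 90% CL central interval: a = b = 0.05
alpha_a = quantile(0.05)
alpha_b = quantile(0.95)
# 90% CL upper bound: a = 0, b = 0.1
alpha_up = quantile(0.90)
```

Any normalised posterior PDF could be substituted for `gauss_cdf`; only the CDF inversion matters for the construction.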

Example: Find the Bayesian 98% CL upper bound on the mean signal strength, S, for a counting experiment where the expected background level is B and a total of n events are observed.

$$
\frac{\int_{-\infty}^{S_{\rm up}} \frac{(S+B)^n e^{-(S+B)}}{n!}\, H(S)\, dS}
     {\int_{-\infty}^{+\infty} \frac{(S'+B)^n e^{-(S'+B)}}{n!}\, H(S')\, dS'} = 0.98
$$

The numerator integrand is Likelihood × Prior; the denominator is the normalisation. The integrand is the PDF for the signal from Bayes' Theorem.

We'll assume there is no a priori reason why all values of S shouldn't be considered equally likely, aside from the fact that it must be non-negative. So, take the prior to be the Heaviside step function, H(S) (zero for S<0 and 1 otherwise).

Then just solve for Sup.

Conveniently, this turns out to be mathematically identical to:

$$
\frac{\sum_{m=0}^{n} \frac{(S_{\rm up}+B)^m\, e^{-(S_{\rm up}+B)}}{m!}}
     {\sum_{m=0}^{n} \frac{B^m\, e^{-B}}{m!}} = 1 - 0.98
$$

where the denominator renormalises the allowed range of background counts (which must be less than or equal to n).
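The summed form can be solved for S_up numerically. A minimal sketch using only the standard library, with bisection on S (the monotonically decreasing left-hand side makes this safe):

```python
import math

def pois_cdf(n, mu):
    # P(m <= n) for a Poisson distribution with mean mu
    return sum(math.exp(-mu) * mu ** m / math.factorial(m)
               for m in range(n + 1))

def bayes_upper(n, B, cl=0.98):
    """Bayesian CL upper bound on the signal mean S, flat prior for S >= 0.

    Solves pois_cdf(n, S+B) / pois_cdf(n, B) = 1 - cl by bisection."""
    target = 1.0 - cl
    lo, hi = 0.0, 100.0
    for _ in range(200):
        S = 0.5 * (lo + hi)
        if pois_cdf(n, S + B) / pois_cdf(n, B) > target:
            lo = S
        else:
            hi = S
    return 0.5 * (lo + hi)
```

For example, with n = 0 observed events and B = 3 expected background, the equation reduces to e^{-S} = 0.02, giving S_up = -ln(0.02) ≈ 3.91.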

Page 3:

“I don’t like the idea of priors, so I just want to talk about constraints in a way that doesn’t use them.”

What do you mean by ‘constraints?’ If you want to constrain a model with your measurement, then this is the maths... so suck it up, dude!

“What prior do you use? It’s completely arbitrary!”

1) Priors are a convenient way to put in known physical constraints which are well defined (e.g. masses must be greater than zero, the position of a detected event cannot be outside of the detector, etc.).

2) Priors within the physically allowed region are less rigorously defined (Bayesian probabilities are subjective), but their basis should be apparent and defensible (“fair bet” test) - i.e. not completely arbitrary! For instance, if there is not a very strong a priori reason why one model is more likely than another, you should probably give them equal weight.

3) In practice, the exact form of the prior generally makes very little difference: you usually compare models that are close to each other, where priors are pretty flat, and the likelihood function crushes the impact of the prior away from the region of interest. Hence, the impact of priors approaches zero in the limit of large statistics (Bernstein-von Mises). In any case, the robustness of conclusions can always be tested.

“Should I then use the outcome of previous experiments as part of the prior?”

Careful!! Yes for other experiments that you have performed (e.g. calibrations) to assess certain aspects of detector performance or related data that can be regarded as unimpeachable. Otherwise, generally not:

1) To do it properly would require a detailed knowledge of the other experiment and the associated full PDF (usually not available);

2) Systematic uncertainties associated with individual experiments are difficult to quantify. This is why each experiment should stand on its own and be independently cross-checked by other experiments.

Neyman Construction of Frequentist Confidence Intervals

[Figure: likelihood distributions of the measurement X assuming models α1, α2, α3, ... etc., each with tail fractions a and b outside the interval (Xa, Xb)]

As before: CL = 1 - a - b

(except here “Confidence Level” refers to the frequency of measurements for a given model)

Neyman Construction of Frequentist Confidence Intervals

[Figure: model parameter of interest (α) vs. measurement (X), both running from -5 to 5, with belt boundaries Xa and Xb for each α (CL = 1 - a - b)]

Assume that the measurement X is an unbiased estimator for the model parameter α.

Page 4:

Neyman Construction of Frequentist Confidence Intervals

[Figure: the same belt in the (X, α) plane; a vertical line at the measured X intersects the belt]

The confidence interval is the range of model parameter values for which the measurement is “likely” (i.e. would be contained within a CL frequency interval).

Assume that the measurement X is an unbiased estimator for the model parameter α.
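For a Gaussian measurement of known width σ (a simple assumed case, chosen for illustration), both slices of the belt reduce to closed-form expressions, and inverting the belt just swaps the roles of X and α:

```python
Z90 = 1.6449  # Gaussian quantile for a = b = 0.05 tails (90% CL central)

def acceptance_interval(alpha, sigma=1.0):
    # horizontal slice of the belt: the CL fraction of measurements X
    # expected if alpha were the true parameter value
    return alpha - Z90 * sigma, alpha + Z90 * sigma

def confidence_interval(x, sigma=1.0):
    # vertical slice: all alpha whose acceptance interval contains x
    return x - Z90 * sigma, x + Z90 * sigma
```

The confidence interval is exactly the set of α for which the observed X is "likely": the edge values of α have acceptance intervals that just touch X.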

Neyman Construction of Frequentist Confidence Intervals

[Figure: the belt in the (X, α) plane, with the region α < 0 marked as unphysical; a large statistical fluctuation puts the measured X where the belt intersects only the unphysical region]

What if the model parameter is a quantity like ‘mass’ and your measurement is subject to a large statistical fluctuation?

You could either end up bounding an unphysical region (dodgy extrapolation) or have an empty interval!

What’s gone wrong?

Neyman Construction of Frequentist Confidence Intervals

[Figure: the same belt, with the unphysical region marked]

If everyone plays this game, then the true parameter value is correctly bounded with a frequency of CL.

Nothing! Frequentists don’t care about you, only about the ensemble of many experiments.
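The coverage claim can be checked by simulation. A sketch for the Gaussian case above (true α and σ are assumed values for illustration): repeat the experiment many times and count how often the derived interval contains the true parameter.

```python
import random

random.seed(2)
Z90 = 1.6449  # two-sided 90% CL Gaussian quantile

def coverage(alpha_true=1.0, sigma=1.0, trials=100_000):
    """Fraction of simulated experiments whose interval contains alpha_true."""
    hits = 0
    for _ in range(trials):
        x = random.gauss(alpha_true, sigma)  # one simulated measurement
        # interval from inverting the belt: x +/- Z90 * sigma
        if x - Z90 * sigma <= alpha_true <= x + Z90 * sigma:
            hits += 1
    return hits / trials
```

The result sits at 90% regardless of the value of alpha_true, which is precisely the frequentist statement: the guarantee is a property of the ensemble, not of any one interval.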

Neyman Construction of Frequentist Confidence Intervals

[Figure: the same belt, with the unphysical region marked]

If everyone plays this game, then the true parameter value is correctly bounded with a frequency of CL; 1-CL of the time you end up outside of these bounds.

However, your own individual measurement contains NO information on the validity of a given model!!!

Page 5:

Constructions for a Poisson distribution of unknown signal counts in the presence of an average of 3 background counts

[Figures: the 90% CL central-interval and 90% CL upper-bound constructions]

What’s going on here? Empty intervals? All frequentist constructions are doomed in this regard!

Quantisation means the coverage never exactly equals 90%, so one must either “overcover” (>90%) or “undercover” (<90%) - choose the former to be conservative.

Remember, frequentists don’t care about you!

Example: Find the standard frequentist 98% CL upper bound on the mean signal strength, S, for a counting experiment where the expected background level is B and a total of n events are observed.

Then just solve for Sup

$$
\sum_{m=0}^{n} \frac{(S_{\rm up}+B)^m\, e^{-(S_{\rm up}+B)}}{m!} = 1 - 0.98
$$

i.e. there is only a 2% chance of observing a number as small or smaller if S were any larger (“left end” of Neyman construction).

Note: for this case, the only difference from the Bayesian expression is the lack of normalisation on the allowed background range.
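A numerical sketch of the frequentist bound, directly comparable to the Bayesian solver earlier (same Poisson CDF, no denominator):

```python
import math

def pois_cdf(n, mu):
    # P(m <= n) for a Poisson distribution with mean mu
    return sum(math.exp(-mu) * mu ** m / math.factorial(m)
               for m in range(n + 1))

def freq_upper(n, B, cl=0.98):
    """Standard frequentist CL upper bound on the signal mean S.

    Solves pois_cdf(n, S+B) = 1 - cl by bisection. Note: if the background
    alone already makes observing <= n events rarer than 1 - cl, no S >= 0
    solves the equation and this returns ~0 - the empty-interval pathology
    discussed above."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        S = 0.5 * (lo + hi)
        if pois_cdf(n, S + B) > 1.0 - cl:
            lo = S
        else:
            hi = S
    return 0.5 * (lo + hi)
```

For n = 0 and B = 3 this gives S_up = -ln(0.02) - 3 ≈ 0.91, versus ≈ 3.91 for the Bayesian bound: the missing background renormalisation makes a large difference at low counts.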

Frequent Statements About Frequentist Intervals

“There is a 68% chance (for a ±1σ CL interval) that the model parameter lies in this range.”

No! There is not a probability distribution associated with the model parameter; that’s a Bayesian concept. Either it lies in your interval or not, but your one measurement does not constrain it.

“There is a 68% chance that my interval happens to bound the one, true value of the model parameter.”

No! This is just an attempt to say the same thing with a wording that sounds more frequentist. Either it lies in your interval or it doesn’t. However, there is a 68% probability that you would have been dealt a set of data that would have led to an interval (not necessarily this particular one) containing the true parameter.

“If someone else were to repeat the experiment, there is a 68% chance that they would land in this range.”

No! Your particular data set could have been a 3σ fluctuation, in which case there is very little chance that the next measurement would land in your interval.

Fre.quent.ist [free-kwuh nt-ist] noun
One who espouses the principles of the frequency definition of probability, and then misapplies them to answer the Bayesian question that they actually have in mind.

Qualifier: This is a generalisation and just a personal opinion. But check it out - it’s really true!

Page 6:

The mathematical basis for the Neyman construction is sound. However, because physicists don’t like non-physical bounds etc., Feldman and Cousins proposed a Unified Approach to the Classical Statistical Analysis of Small Signals (Phys.Rev.D 57:3873-3889,1998) in order to address the following “problems”:

• Bounding of non-physical regions;
• Empty intervals;
• Poisson overcoverage;
• Coverage bias from experimenters choosing when to quote 2-sided versus 1-sided bounds (“flip-flopping”)

‘non-physical bounds’ on what?

What the hell are you doing looking at model space?!

Prescription: The content of the Neyman interval (e.g. where you place the upper and lower boundaries) for an assumed true value of the model parameter, μ, is determined by ordering candidate measurements according to the likelihood ratio:

$$
R = \frac{P(x|\mu)}{P(x|\mu_{\rm best})}
$$

where x is the candidate measurement value to be added to the interval and μ_best is the model value that maximises the likelihood for x in the physically allowed region.

Note: this normalisation is an arbitrary choice and does not represent a comparison with the “most likely model.” For this interval construction, we assume μ is actually the correct model!

Why are you comparing to an inter va l intended for a d i f f e ren t assumption?

The first rule of ‘Fight Club’ is that we do not talk about ‘Fight Club’ !!

The range of model parameter values for which the likelihood of the measurement is closest to that of the model parameter for which the measurement is most likely. (Can’t say “closest to most likely model” or “most likely ‘overall’ measurement”.)

Feldman-Cousins Construction of Frequentist Confidence Intervals

[Figure: F-C belt in the plane of measurement (X) vs. model parameter of interest (α), both running from -5 to 5, with the region α < 0 marked as unphysical]

Assume that the measurement X is an unbiased estimator for the model parameter α.

For a given assumed model, intervals contain the top CL fraction of measurement values whose likelihood is closest to that of some model where the measurement is most likely.
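This ordering can be sketched for the Poisson-plus-background case (an illustrative implementation, not the authors' code): for an assumed true signal mean μ and known background b, candidate counts n are ranked by the likelihood ratio R and accepted, highest R first, until their summed probability reaches the CL.

```python
import math

def pois(n, mu):
    # Poisson probability P(n | mu)
    return math.exp(-mu) * mu ** n / math.factorial(n)

def fc_interval(mu, b, cl=0.90, nmax=50):
    """Feldman-Cousins acceptance interval of observed counts n for an
    assumed true signal mean mu and known mean background b."""
    ranked = []
    for n in range(nmax):
        mu_best = max(0.0, n - b)  # best physically allowed signal for this n
        r = pois(n, mu + b) / pois(n, mu_best + b)
        ranked.append((r, n))
    ranked.sort(reverse=True)      # highest likelihood ratio first
    accepted, total = [], 0.0
    for r, n in ranked:
        accepted.append(n)
        total += pois(n, mu + b)   # probability under the assumed model
        if total >= cl:
            break
    return min(accepted), max(accepted)
```

Repeating this for a grid of μ values and inverting, as in the Neyman construction, yields the unified F-C belt; the `max(0.0, n - b)` clamp is what keeps every measurement inside some physically allowed interval.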

F-C 90% CL interval for a Poisson distribution of unknown signal counts in the presence of an average of 3 background counts

[Figure: the F-C construction, showing where an upper bound is quoted and where a two-sided confidence interval is quoted]

How did we do?

• Bounding of non-physical regions; empty intervals: fixed - but these weren’t really a problem for a strict frequency interpretation, and the fix comes at the expense of an arbitrary weighting that is prone to confuse the interpretation.

• Poisson overcoverage: no, it’s been redistributed, but is still there (since it is inherent for Poisson).

If you’re worried about making a better guess at constraining a model based on your data, then you’re really asking a Bayesian question!

Imagine we looked at 1000 astrophysical sources for signs of gamma ray emission and found one with an excess that corresponds to 3σ above background expectation. Given the number of trials (“look elsewhere effect”), we know that this is perfectly consistent with statistical fluctuations and the most relevant thing to quote for each source is a 90% CL upper bound on the gamma ray flux. However, the F-C prescription would instead force us to quote a 3σ discovery interval!


A Curious Property

If you observe zero events, the confidence limit you quote on the signal still depends on the expected number of background events... even though you know for certain that the actual number of background events is identically zero! (true for the standard Neyman construction as well)

“If everyone plays this game, then the true parameter value is correctly bounded with a frequency of 90%”

Frequentists don’t care about you!!! (or the horse you rode in on)

[Table: F-C 90% C.L. intervals for the Poisson signal mean μ, given total events observed n0 and with a known mean background b ranging from 0 to 5]

Page 7:

[Table: comparison of the Bayesian and frequentist approaches, from BAYES AND FREQUENTISM: A PARTICLE PHYSICIST’S PERSPECTIVE, Louis Lyons (http://arxiv.org/abs/1301.1273), with annotations: “(but not likelihood of model!)”, “(hypothetical)”, “(frequency of occurrence for an infinite set of ‘identical’ hypothetical trials)”, “(but necessarily inexact for Poisson)”]

• Bayesian statistics is the only correct formalism that can address the question, “Given my measurement, what models do I constrain?” My experience is that this form of the question has been implicit in all discussions of the physical interpretation of experimental data I’ve seen.

• The standard frequentist approach is a perfectly valid and self-consistent formalism. However, it answers a different question, where the identification of a model only emerges for a “sufficiently large” ensemble of experiments. Unfortunately, this is often misinterpreted (or correctly interpreted but then misused).

• The Feldman-Cousins approach is a mathematically valid, if somewhat arbitrary, formulation of the frequentist method (though it may be even more prone to misinterpretation by not making the nature of these intervals appear quite so obvious).

Fortunately, for many cases (especially in the large n limit), these different approaches all give very similar results. However, this is not always the case, so be clear about exactly what question you are asking!

Summary

Addendum

“What does ‘sufficiently large ensemble’ mean for the frequentist approach? Can’t I just chop up my data set into smaller pieces so as to create an ensemble of 1000 separate confidence bands?”

‘Sufficiently large’ means that you have to sample enough to get a good characterisation of the likelihood space of measurements for the model in question. Of course, if you do that, then the strength of the likelihood function in the Bayesian approach makes the form of the prior irrelevant, so both approaches are then just keying off of the same likelihood function and will thus arrive at identical bounds.

The deviation of the approaches in the regime of low statistics is a statement that there is not yet enough information to say much about the model based solely on the data without supplying at least some additional constraints. You then have one of two choices: either place the measurement in context for some hopeful ensemble of other experiments without trying to identify the model, or make an attempt to provide some minimal set of ‘reasonable and conservative’ constraints to infer which models are the most ‘likely.’