practical statistics - indico.cern.ch€¦ · practical statistics jochen ott june 5, 2014 jochen...
TRANSCRIPT
Practical Statistics
Jochen Ott
June 5, 2014
Jochen Ott TOTW Statistics Introduction and theta 1 / 52
Why “Practical Statistics”?
Statistics is often taught and discussed on simple cases such as a countingexperiment with a Poisson distribution and no systematics.
In practice, however, we deal with shape analyses and systematics –beyond what can easily extrapolated from a counting experiment.
Therefore, I’ll try to give an statistics introduction covering both: Thebasic statistical concepts, and at the same time show how to actuallyapply them in practice in a realistic analysis, as they are e.g. common inHiggs analyses at CMS.
My talk has three parts
1 Hypothesis Tests, Monte-Carlo Methods, Model Building
2 Limits and Intervals (Frequentist only)
3 Brief Introduction to statistical tools, esp. “theta”
Jochen Ott TOTW Statistics Introduction and theta 2 / 52
Part I
Hypothesis Tests
Jochen Ott TOTW Statistics Introduction and theta 3 / 52
1 Introduction; Counting Experiment
2 Generalization; Shape Model
3 Test Statistic; MC method
4 Handling of Systematic Uncertainties
5 Test Statistic Definition with Nuisances
Jochen Ott TOTW Statistics Introduction and theta 4 / 52
Introduction; Counting Experiment
Contents
1 Introduction; Counting Experiment
2 Generalization; Shape Model
3 Test Statistic; MC method
4 Handling of Systematic Uncertainties
5 Test Statistic Definition with Nuisances
Jochen Ott TOTW Statistics Introduction and theta 5 / 52
Introduction; Counting Experiment
Introduction
Statistical hypothesis testing is a formal method for decision making usingdata from a random process.
It is an attempt to disprove a null hypothesis H0, which is rejected if theprobability to observe the data that have actually been observed – or evenmore extreme data – is very low for H0.
This probability is called the p-value. If it is below some (small)pre-defined threshold α, the hypothesis H0 is rejected in favor of thealternative H1.
I’ll use two examples: A counting experiment and a (more realistic) shapeanalysis. Complications such as systematic uncertainties are added later.
Jochen Ott TOTW Statistics Introduction and theta 6 / 52
Introduction; Counting Experiment
Statistical Model
A statistical model specifies the probability to observe certain data d as afunction of the (real-valued) model parameters θ.
A simple statistical model is a counting experiment with knownbackground mean b = 5.2 and unknown signal s ≥ 0 we want to“discover”. The data comprises only the number of observed events n,which has a Poisson distribution around λ = b + s. b = 5.2 is constantand s ≥ 0 is the (only) model parameter.
This statistical model can be summarized as:
p(n|s) = Poisson(n|λ = b + s) =e−b−s(b + s)n
n!
Jochen Ott TOTW Statistics Introduction and theta 7 / 52
Introduction; Counting Experiment
Hypothesis Test
The null hypothesis – we would like to reject – is s = 0. The alternativehypothesis is a positive signal, s > 0.
Given the observed number of events nobs, the p-value is the probability toobserve as least as many events for the null hypothesis s = 0:
p(nobs) =∞∑
n=nobs
Poisson(n|λ = b = 5.2)
Remarks:
The p-value itself is a random variable.
Definition implies: p-value follows a uniform distribution on theinterval [0, 1] for H0 (or approximately if data is discrete)
Jochen Ott TOTW Statistics Introduction and theta 8 / 52
Introduction; Counting Experiment
Result
For the example of b = 5.2, the p-value is the probability to measure atleast as many events as observed for s = 0. This can be evaluated directlynumerically or by making toys.
0 2 4 6 8 10 12 14n
0.00
0.05
0.10
0.15
0.20p
λ = b = 5.2
nobs = 8
p = 0.155
nobs p
6 0.4198 0.15510 0.04012 0.0073
Jochen Ott TOTW Statistics Introduction and theta 9 / 52
Introduction; Counting Experiment
Possible Outcomes of a Hypothesis Test
There are two kinds of errors that can be made in the hypothesis test:
1 Rejecting H0 although it is true. This is the type-I error (or “error ofthe first kind”).
2 Not rejecting H0 although it is false; type-II error (or “error of thesecond kind”).
The first is usually regarded more severe. The probability for a type-I erroris the (“small”) threshold α used in the hypothesis test.
The type-II error is denoted β.The probability to (correctly) reject H0 if the alternative H1 is true is thepower (1− β).
Jochen Ott TOTW Statistics Introduction and theta 10 / 52
Introduction; Counting Experiment
Remarks
For a given fixed type-I error rate α, one would prefer the test withhigh power (1− β); this can be used to choose “optimal” teststatistic.
The p-value is not simply “the probability to observe the data asobserved”, but to “observe such data or more extreme data”. Thisdistinction is important; also, it implies that a comparison is neededbetween 2 datasets (real-valued test statistic).
The p-value is not the probability that the null hypothesis is true –such a statement carries no meaning in frequentist statistics, whereprobability always refers to (random) data and derived quantities,never to model parameters (but: Bayesian view differs, see below).
Rejecting the null hypothesis does not proof that the alternativehypothesis is true: Usually, many alternatives to the null hypothesisexist which are compatible with the observed observed data.
Jochen Ott TOTW Statistics Introduction and theta 11 / 52
Generalization; Shape Model
Contents
1 Introduction; Counting Experiment
2 Generalization; Shape Model
3 Test Statistic; MC method
4 Handling of Systematic Uncertainties
5 Test Statistic Definition with Nuisances
Jochen Ott TOTW Statistics Introduction and theta 12 / 52
Generalization; Shape Model
Introduction
So far: simplistic Poisson model with only one event count (“countingexperiment”).
Now: Generalize to a more realistic analysis in which the (binned) shape ofsome reconstructed mass distribution (Mrec) is analyzed to search for aresonance of unknown mass over some falling background.
Versions of such models are used in many channels of the Higgs bosonsearch at the LHC, but also many other searches.
Jochen Ott TOTW Statistics Introduction and theta 13 / 52
Generalization; Shape Model
Shape Model: Plot
For example, the expected Poisson means for background and data mightlook like this:
0 200 400 600 800 1000
Mrec
0
10
20
30
40
50
60
70
Eve
nts
per
bin
Data
Background
Signal, M = 500
Can this data be seen asevidence against thebackground-only model?
Can we exclude H0 =background only in favorof H1 = background +(scaled) M = 500signal?
Next steps: statisticalmodel, hypothesis test!
Jochen Ott TOTW Statistics Introduction and theta 14 / 52
Generalization; Shape Model
Shape Model: Plot
For example, the expected Poisson means for background and data mightlook like this:
0 200 400 600 800 1000
Mrec
0
10
20
30
40
50
60
70
Eve
nts
per
bin
Data
Background
Signal, M = 500
Can this data be seen asevidence against thebackground-only model?
Can we exclude H0 =background only in favorof H1 = background +(scaled) M = 500signal?
Next steps: statisticalmodel, hypothesis test!
Jochen Ott TOTW Statistics Introduction and theta 14 / 52
Generalization; Shape Model
Shape Model: Plot
For example, the expected Poisson means for background and data mightlook like this:
0 200 400 600 800 1000
Mrec
0
10
20
30
40
50
60
70
Eve
nts
per
bin
Data
Background
Signal, M = 500
Can this data be seen asevidence against thebackground-only model?
Can we exclude H0 =background only in favorof H1 = background +(scaled) M = 500signal?
Next steps: statisticalmodel, hypothesis test!
Jochen Ott TOTW Statistics Introduction and theta 14 / 52
Generalization; Shape Model
Shape Model: Plot
For example, the expected Poisson means for background and data mightlook like this:
0 200 400 600 800 1000
Mrec
0
10
20
30
40
50
60
70
Eve
nts
per
bin
Data
Background
Signal, M = 500
Can this data be seen asevidence against thebackground-only model?
Can we exclude H0 =background only in favorof H1 = background +(scaled) M = 500signal?
Next steps: statisticalmodel, hypothesis test!
Jochen Ott TOTW Statistics Introduction and theta 14 / 52
Generalization; Shape Model
Shape Model: Statistical Model
Probability to observe event counts ~n = (n1, n2, . . .) is a product ofPoisson probabilities in each bin:
p(~n|µ) =∏i
Poisson(ni |λi (µ)) with
λi (µ) = µsi + bi
where i = 1, . . . ,Nbins denotes the bin index. si and bi are the signal andbackground templates, resp., which are typically derived from Monte-Carlosimulation or from a background-enriched sideband.
µ ≥ 0 scales the signal template; it is the signal strength parameter. Apartfrom a (known constant) factor, it is the signal cross section.
Jochen Ott TOTW Statistics Introduction and theta 15 / 52
Test Statistic; MC method
Contents
1 Introduction; Counting Experiment
2 Generalization; Shape Model
3 Test Statistic; MC method
4 Handling of Systematic Uncertainties
5 Test Statistic Definition with Nuisances
Jochen Ott TOTW Statistics Introduction and theta 16 / 52
Test Statistic; MC method
The role of the Test Statistic t
Reminder: The p-value is defined as the probability to observe data atleast as extreme (signal-like) as the one actually observed.
For a counting experiment, more events are more “extreme”, more“signal-like”. In general, one has to summarize the “signal-likeness” in onereal number, this is the test statistic.
A possible choice is the (profile) likelihood ratio:
t = logmaxθ∈H1 L(θ|d)
maxθ∈H0 L(θ|d)
where we assume that the hypotheses H0 and H1 correspond to parametersub-spaces of a common stat. model; for searches: H0 is µ = 0 and H1 isµ > 0.
Again: t measures the compatibility with H0; large values meanincompatibility with H0, favoring H1.
Jochen Ott TOTW Statistics Introduction and theta 17 / 52
Test Statistic; MC method
The role of the Test Statistic t
Reminder: The p-value is defined as the probability to observe data atleast as extreme (signal-like) as the one actually observed.
For a counting experiment, more events are more “extreme”, more“signal-like”. In general, one has to summarize the “signal-likeness” in onereal number, this is the test statistic.
A possible choice is the (profile) likelihood ratio:
t = logmaxθ∈H1 L(θ|d)
maxθ∈H0 L(θ|d)
where we assume that the hypotheses H0 and H1 correspond to parametersub-spaces of a common stat. model; for searches: H0 is µ = 0 and H1 isµ > 0.
Again: t measures the compatibility with H0; large values meanincompatibility with H0, favoring H1.
Jochen Ott TOTW Statistics Introduction and theta 17 / 52
Test Statistic; MC method
The role of the Test Statistic t
Reminder: The p-value is defined as the probability to observe data atleast as extreme (signal-like) as the one actually observed.
For a counting experiment, more events are more “extreme”, more“signal-like”. In general, one has to summarize the “signal-likeness” in onereal number, this is the test statistic.
A possible choice is the (profile) likelihood ratio:
t = logmaxθ∈H1 L(θ|d)
maxθ∈H0 L(θ|d)
where we assume that the hypotheses H0 and H1 correspond to parametersub-spaces of a common stat. model; for searches: H0 is µ = 0 and H1 isµ > 0.
Again: t measures the compatibility with H0; large values meanincompatibility with H0, favoring H1.
Jochen Ott TOTW Statistics Introduction and theta 17 / 52
Test Statistic; MC method
p-value definition via t
The p-value is the probability to observe t ≥ tobs if H0 is true:
p = Pr(t ≥ tobs|H0).
This suggests using a Monte-Carlo method for calculating the p-value:
1 Generate a large number of toy data distributed according to H0.
2 For each toy data, calculate test statistic t.
3 For the observed data, calculate test statistic tobs.
4 The p-value is given by the fraction of toys with t ≥ tobs; if p < α,reject H0.
The values of tobs for which H0 is rejected at level α is known as criticalregion in t.
Jochen Ott TOTW Statistics Introduction and theta 18 / 52
Test Statistic; MC method
p-value definition via t
The p-value is the probability to observe t ≥ tobs if H0 is true:
p = Pr(t ≥ tobs|H0).
This suggests using a Monte-Carlo method for calculating the p-value:
1 Generate a large number of toy data distributed according to H0.
2 For each toy data, calculate test statistic t.
3 For the observed data, calculate test statistic tobs.
4 The p-value is given by the fraction of toys with t ≥ tobs; if p < α,reject H0.
The values of tobs for which H0 is rejected at level α is known as criticalregion in t.
Jochen Ott TOTW Statistics Introduction and theta 18 / 52
Test Statistic; MC method
MC method for p: Example II
Result of 100,000 background-only toys: p = 86/105 Z = 3.13
0 2 4 6 8 10
t
10−1
100
101
102
103
104
105N
toys
per
bin
tobs = 4.92
N(t ≥ tobs) = 86
Jochen Ott TOTW Statistics Introduction and theta 19 / 52
Handling of Systematic Uncertainties
Contents
1 Introduction; Counting Experiment
2 Generalization; Shape Model
3 Test Statistic; MC method
4 Handling of Systematic Uncertainties
5 Test Statistic Definition with Nuisances
Jochen Ott TOTW Statistics Introduction and theta 20 / 52
Handling of Systematic Uncertainties
Introduction
In the statistical model, each uncertainty is included as an additionalparameter, called nuisance parameter.
Often, there is some external knowledge about the possible values of thosenuisance parameters.
Introducing systematic uncertainties requires:
Changes of the statistical model, i.e. how the nuisance parameterstypically affect the probability.
Changes of the significance calculation, i.e. how to include knowledgeabout nuisance parameters in the inference.
Those items will be discussed separately in the following slides.
Jochen Ott TOTW Statistics Introduction and theta 21 / 52
Handling of Systematic Uncertainties
Example
In the counting experiment, assume that the expected background b = 5.2has some uncertainty (e.g. from limited statistics in a sideband). This canbe included by changing the statistical model to:
p(n|s, b) = Poisson(n|λ = b + s)
where b now is a nuisance parameter, not a constant.
We assume that there is external knowledge about b (e.g. from a sidebandmeasurement) suggesting that b is around b0 = 5.2 with some uncertainty∆b = 2.6.
Jochen Ott TOTW Statistics Introduction and theta 22 / 52
Handling of Systematic Uncertainties
Changes to Significance Evaluation
We assume there is external knowledge about the nuisance parameters,which has to be incorporated in the procedure. Possible methods include:
1 Make it internal to the model, i.e., fit the nuisance parametersimultaneously with the parameter of interest (e.g. include sidebandin likelihood model).
2 Use Bayesian priors for the nuisance parameters and take prior-average
3 Include auxiliary measurements in the statistical model in anapproximate way and use bootstrapping.
Here, only item 2. is covered (see Backup for 3.).
Jochen Ott TOTW Statistics Introduction and theta 23 / 52
Handling of Systematic Uncertainties
Changes to Significance Evaluation
We assume there is external knowledge about the nuisance parameters,which has to be incorporated in the procedure. Possible methods include:
1 Make it internal to the model, i.e., fit the nuisance parametersimultaneously with the parameter of interest (e.g. include sidebandin likelihood model).
2 Use Bayesian priors for the nuisance parameters and take prior-average
3 Include auxiliary measurements in the statistical model in anapproximate way and use bootstrapping.
Here, only item 2. is covered (see Backup for 3.).
Jochen Ott TOTW Statistics Introduction and theta 23 / 52
Handling of Systematic Uncertainties
Changes to Significance Evaluation
We assume there is external knowledge about the nuisance parameters,which has to be incorporated in the procedure. Possible methods include:
1 Make it internal to the model, i.e., fit the nuisance parametersimultaneously with the parameter of interest (e.g. include sidebandin likelihood model).
2 Use Bayesian priors for the nuisance parameters and take prior-average
3 Include auxiliary measurements in the statistical model in anapproximate way and use bootstrapping.
Here, only item 2. is covered (see Backup for 3.).
Jochen Ott TOTW Statistics Introduction and theta 23 / 52
Handling of Systematic Uncertainties
Bayesian vs. Frequentist “Probability”
Frequentist: “Probability is the relative frequency of a certainoutcome in an ensemble of (imaginary or real) repetitions of arandom process.”Probability is only assigned to “data” and derived quantities (teststatistic, p-value, etc.), but not to model parameters, which have onlyone true (unknown) value; the concept of probability does not apply.
Bayesian: “Probability can also be used to express the current stateof knowledge.”In particular, it is valid to talk about the probability that a modelparameter takes certain values.
Here, we discuss a “mixed”/“hybrid” method: Significance and p-valuesare a frequentist concept, but the use of nuisance parameter priors isBayesian.
Jochen Ott TOTW Statistics Introduction and theta 24 / 52
Handling of Systematic Uncertainties
Bayesian vs. Frequentist “Probability”
Frequentist: “Probability is the relative frequency of a certainoutcome in an ensemble of (imaginary or real) repetitions of arandom process.”Probability is only assigned to “data” and derived quantities (teststatistic, p-value, etc.), but not to model parameters, which have onlyone true (unknown) value; the concept of probability does not apply.
Bayesian: “Probability can also be used to express the current stateof knowledge.”In particular, it is valid to talk about the probability that a modelparameter takes certain values.
Here, we discuss a “mixed”/“hybrid” method: Significance and p-valuesare a frequentist concept, but the use of nuisance parameter priors isBayesian.
Jochen Ott TOTW Statistics Introduction and theta 24 / 52
Handling of Systematic Uncertainties
Bayesian vs. Frequentist “Probability”
Frequentist: “Probability is the relative frequency of a certainoutcome in an ensemble of (imaginary or real) repetitions of arandom process.”Probability is only assigned to “data” and derived quantities (teststatistic, p-value, etc.), but not to model parameters, which have onlyone true (unknown) value; the concept of probability does not apply.
Bayesian: “Probability can also be used to express the current stateof knowledge.”In particular, it is valid to talk about the probability that a modelparameter takes certain values.
Here, we discuss a “mixed”/“hybrid” method: Significance and p-valuesare a frequentist concept, but the use of nuisance parameter priors isBayesian.
Jochen Ott TOTW Statistics Introduction and theta 24 / 52
Handling of Systematic Uncertainties
Prior-Averaging
Modify the Monte-Carlo method for p-value calculation: To generate toydata,
1 Draw a random value for each nuisance parameter from its prior.
2 Draw random data from the probability according to the stat. modelevaluated for those (random) parameter values.
Then, as usual: p-value is the fraction of toys in which t ≥ tobs.
Formally, the resulting p-value is:
pa =
∫θ
P(t > tobs|θ)π(θ)dθ
where π(θ) is the prior for the nuisance parameters θ.
Jochen Ott TOTW Statistics Introduction and theta 25 / 52
Handling of Systematic Uncertainties
Example I: Counting Experiment
For a normal prior on b with mean b0 = 5.2 distribution for n changes:
0 2 4 6 8 10 12 14n
0.00
0.05
0.10
0.15
0.20
0.25p
b0 = 5.2
nobs = 8
p = 0.155
no uncertainty
In general: addingsystematic uncertainties“broadens” the teststatistic distribution,thus enlarging thep-value, reducing theZ -value.
Jochen Ott TOTW Statistics Introduction and theta 26 / 52
Handling of Systematic Uncertainties
Example I: Counting Experiment
For a normal prior on b with mean b0 = 5.2 distribution for n changes:
0 2 4 6 8 10 12 14n
0.00
0.05
0.10
0.15
0.20
0.25p
b0 = 5.2
nobs = 8
p = 0.162
no uncertainty
uncertainty: 10.0%
In general: addingsystematic uncertainties“broadens” the teststatistic distribution,thus enlarging thep-value, reducing theZ -value.
Jochen Ott TOTW Statistics Introduction and theta 26 / 52
Handling of Systematic Uncertainties
Example I: Counting Experiment
For a normal prior on b with mean b0 = 5.2 distribution for n changes:
0 2 4 6 8 10 12 14n
0.00
0.05
0.10
0.15
0.20
0.25p
b0 = 5.2
nobs = 8
p = 0.195
no uncertainty
uncertainty: 30.0%
In general: addingsystematic uncertainties“broadens” the teststatistic distribution,thus enlarging thep-value, reducing theZ -value.
Jochen Ott TOTW Statistics Introduction and theta 26 / 52
Test Statistic Definition with Nuisances
Contents
1 Introduction; Counting Experiment
2 Generalization; Shape Model
3 Test Statistic; MC method
4 Handling of Systematic Uncertainties
5 Test Statistic Definition with Nuisances
Jochen Ott TOTW Statistics Introduction and theta 27 / 52
Test Statistic Definition with Nuisances
Test Statistic
Test Statistic t defined as ratio of profile likelihoods for null andalternative:
t = logmaxθ∈H1 L(θ|d)
maxθ∈H0 L(θ|d)= log
maxµ≥0 L(µ|d)
L(µ = 0|d).
Now, model parameters θ include the parameter of interest (signalstrength µ) and nuisance parameters θn: θ = (µ, θn). Change t:
1 Fix nuisance parameters to most probable value θn,0 in maximization,i.e. only vary µ:
t ′ = logmaxµ≥0 L(µ, θn = 0|d)
L(µ = 0, θn = 0|d)
2 Replace L with the posterior, i.e. multiply by the nuisance parameterprior π:
t = logmaxµ≥0,θn L(µ, θn|d)× π(θn)
maxθn L(µ = 0, θn|d)× π(θn)
Jochen Ott TOTW Statistics Introduction and theta 28 / 52
Test Statistic Definition with Nuisances
Test Statistic
Test Statistic t defined as ratio of profile likelihoods for null andalternative:
t = logmaxθ∈H1 L(θ|d)
maxθ∈H0 L(θ|d)= log
maxµ≥0 L(µ|d)
L(µ = 0|d).
Now, model parameters θ include the parameter of interest (signalstrength µ) and nuisance parameters θn: θ = (µ, θn). Change t:
1 Fix nuisance parameters to most probable value θn,0 in maximization,i.e. only vary µ:
t ′ = logmaxµ≥0 L(µ, θn = 0|d)
L(µ = 0, θn = 0|d)
2 Replace L with the posterior, i.e. multiply by the nuisance parameterprior π:
t = logmaxµ≥0,θn L(µ, θn|d)× π(θn)
maxθn L(µ = 0, θn|d)× π(θn)
Jochen Ott TOTW Statistics Introduction and theta 28 / 52
Test Statistic Definition with Nuisances
Test Statistic
Test Statistic t defined as ratio of profile likelihoods for null andalternative:
t = logmaxθ∈H1 L(θ|d)
maxθ∈H0 L(θ|d)= log
maxµ≥0 L(µ|d)
L(µ = 0|d).
Now, model parameters θ include the parameter of interest (signalstrength µ) and nuisance parameters θn: θ = (µ, θn). Change t:
1 Fix nuisance parameters to most probable value θn,0 in maximization,i.e. only vary µ:
t ′ = logmaxµ≥0 L(µ, θn = 0|d)
L(µ = 0, θn = 0|d)
2 Replace L with the posterior, i.e. multiply by the nuisance parameterprior π:
t = logmaxµ≥0,θn L(µ, θn|d)× π(θn)
maxθn L(µ = 0, θn|d)× π(θn)
Jochen Ott TOTW Statistics Introduction and theta 28 / 52
Test Statistic Definition with Nuisances
Test Statistic and p-value
Both t ′ and t have been used in HEP analyses.
The definition of the test statistic is orthogonal to the definition of theensemble H0 used to define the p-value:For a MC method, this means it is crucial to vary the nuisance parametersin the toy data generation, while it’s not necessary to vary them in thedefinition of t.
Jochen Ott TOTW Statistics Introduction and theta 29 / 52
Test Statistic Definition with Nuisances
Applying uncertainties
Using the prior averaging method means that toy data is drawn forrandom values for θu test statistic distribution is broadened:
0 2 4 6 8 10
t
10−1
100
101
102
103
104
105
Nto
ys
per
bin
tobs = 4.92
N(t ≥ tobs) = 86 No uncertainties:p = 0.00086, Z = 3.1.
With 10% rateuncertainty: p = 0.096,Z = 1.3.
Jochen Ott TOTW Statistics Introduction and theta 30 / 52
Test Statistic Definition with Nuisances
Applying uncertainties
Using the prior averaging method means that toy data is drawn forrandom values for θu test statistic distribution is broadened:
0 2 4 6 8 10
t
10−1
100
101
102
103
104
105
Nto
ys
per
bin
tobs = 4.92
N(t ≥ tobs) = 9598 No uncertainties:p = 0.00086, Z = 3.1.
With 10% rateuncertainty: p = 0.096,Z = 1.3.
Jochen Ott TOTW Statistics Introduction and theta 30 / 52
Test Statistic Definition with Nuisances
Applying uncertainties
Using the prior averaging method means that toy data is drawn forrandom values for θu test statistic distribution is broadened:
0 2 4 6 8 10
t
10−1
100
101
102
103
104
105
Nto
ys
per
bin
tobs = 4.92
N(t ≥ tobs) = 5173 No uncertainties:p = 0.00086, Z = 3.1.
With 10% rateuncertainty: p = 0.096,Z = 1.3.
Jochen Ott TOTW Statistics Introduction and theta 30 / 52
Part II
Confidence Intervals
Jochen Ott TOTW Statistics Introduction and theta 31 / 52
Introduction and Definitions
Confidence intervals are probabilistic statement about the value ofparameters of a statistical model. They are calculated at a givenconfidence level which specifies the (claimed/desired) coverage.
The coverage is the probability that the interval contains the true value. Ingeneral, the coverage is a function of the true parameter values. (Notethat this does not assign probabilities to the true value but only to theinterval construction!)
A method is said to over-cover (and conservative) if the coverage is abovethe confidence level; the opposite is under-coverage. Sometimes, exactcoverage cannot be reached (e.g. due to discrete data); in this case oneusually chooses to construct the method to over-cover.
Jochen Ott TOTW Statistics Introduction and theta 32 / 52
Introduction and Definitions
Confidence intervals are probabilistic statement about the value ofparameters of a statistical model. They are calculated at a givenconfidence level which specifies the (claimed/desired) coverage.
The coverage is the probability that the interval contains the true value. Ingeneral, the coverage is a function of the true parameter values. (Notethat this does not assign probabilities to the true value but only to theinterval construction!)
A method is said to over-cover (and conservative) if the coverage is abovethe confidence level; the opposite is under-coverage. Sometimes, exactcoverage cannot be reached (e.g. due to discrete data); in this case oneusually chooses to construct the method to over-cover.
Jochen Ott TOTW Statistics Introduction and theta 32 / 52
Counting Experiment
Consider the simple counting model with b = 5.2 and unknown s ≥ 0 with
p(n|s) = Poisson(n|s + b).
For a given observation (e.g. nobs = 5), what statement can be madeabout s?For example, we would expect to rule out large values for s, e.g. s = 100.
This is a question for a hypothesis test.
Jochen Ott TOTW Statistics Introduction and theta 33 / 52
Counting Experiment
Consider the simple counting model with b = 5.2 and unknown s ≥ 0 with
p(n|s) = Poisson(n|s + b).
For a given observation (e.g. nobs = 5), what statement can be madeabout s?For example, we would expect to rule out large values for s, e.g. s = 100.
This is a question for a hypothesis test.
Jochen Ott TOTW Statistics Introduction and theta 33 / 52
Intervals and Hypothesis Tests
Upper limit construction by hypothesis test “inversion”:
1 For a given s = s0, make a hypothesis test with the null hypothesiss = s0 and the alternative s < s0 with type-I error α (e.g., α = 0.05).
2 Repeat step 1 for different values of s0.
3 The confidence interval for s comprises exactly those values s0 forwhich the hypothesis test could not reject the null hypothesis s = s0.
For this formulation of the hypothesis test (s = s0 vs. s < s0), we get anupper limit.
The confidence level is (1− α) (here: 95%).
This is known as the Neyman Construction. It can be visualized as “beltconstruction” on the (n–s) plane.
Jochen Ott TOTW Statistics Introduction and theta 34 / 52
Belt
The Neyman construction as “belt” in the s–nobs plane: For each s, the“belt” in nobs has probability ≥ 1− α. For given nobs, the interval is givenby intersecting the horizontal line with the belt.
0 2 4 6 8 10 12 14s
0
5
10
15
20
25
30
nobs
p = α
Jochen Ott TOTW Statistics Introduction and theta 35 / 52
Belt
The Neyman construction as “belt” in the s–nobs plane: For each s, the“belt” in nobs has probability ≥ 1− α. For given nobs, the interval is givenby intersecting the horizontal line with the belt.
0 2 4 6 8 10 12 14s
0
5
10
15
20
nobs
nobs = 8
Jochen Ott TOTW Statistics Introduction and theta 35 / 52
Comment
The close relationship with hypothesis testing means many aspects fromHT also apply to limits:
explicitly introduce test statistic for shape models; again: Teststatistic choice not unique, can use profile likelihood ratio t
use toy Monte-Carlo to get test statistic distribution for H0
handling of systematic uncertainties via Bayesian/hybrid method
use of asymptotic methods
concept of “expected limit” by running the method (imaginatively orwith MC) on an ensemble representing the expected data (usuallybackground-only)
. . .
Jochen Ott TOTW Statistics Introduction and theta 36 / 52
Extensions
So far, discussed Neyman construction for limits. Modifications:
Modify “belt” construction: for each s, include the “central” set ofnobs into belt “central intervals”, which are two-sided
Use likelihood ratio as ordering criterion to decide what to include inthe belt Feldman-Cousins “unified intervals” with smoothtransition between limit-like and central-like intervals
Purely frequentist methods might lead to empty intervals modifyfrequentist p-values by a “penalty factor” which is high ifexperimental sensitivity is low modified frequentist CLs limits
Jochen Ott TOTW Statistics Introduction and theta 37 / 52
Part III
theta
Jochen Ott TOTW Statistics Introduction and theta 38 / 52
6 Statistical Model
7 Using theta
Jochen Ott TOTW Statistics Introduction and theta 39 / 52
Statistical Model
Contents
6 Statistical Model
7 Using theta
Jochen Ott TOTW Statistics Introduction and theta 40 / 52
Statistical Model
Overview
theta is a program for statistical inference. It only handles binned modelsthat use histograms for the prediction such as the example:
0 200 400 600 800 1000
Mrec
0
10
20
30
40
50
60
70
Eve
nts
per
bin
Data
Background
Signal, M = 500Important limits: onebin = countingexperiment; manynarrow bins almostequivalent to unbinnedcase
Jochen Ott TOTW Statistics Introduction and theta 41 / 52
Statistical Model
Formal Shape Model
The statistical model is the product of Poisson in each bin:
p(n|θ) =∏i
Poisson(ni |λi (θ))
where i is the bin index and the expected number of events in bin i , λi , isgiven by the sum of (scaled) signal and background histograms:
λi (θ) = µsi +∑p
cp(θn)bpi (θn).
p denotes the different background processes which are expected tocontribute. The bin-independent coefficient cp(θn) encodes(process-specific) rate uncertainties, while bpi (θn) is the most generaldependence, a shape uncertainty.
Jochen Ott TOTW Statistics Introduction and theta 42 / 52
Statistical Model
Rate Uncertainties: Example
In the shape example model, we might estimate a 10% uncertainty on theoverall rate of the background:
0 200 400 600 800 1000
Mrec
0
10
20
30
40
50
60
70
Eve
nts
per
bin
Data
Background
Signal, M = 500
Have “plus” and“minus” templates byscaling the “nominaltemplate up and downby 10%. introduce nuisanceparameter which scalesthe template accordingly(see backup forimplementation).
Jochen Ott TOTW Statistics Introduction and theta 43 / 52
Statistical Model
Rate Uncertainties: Example
In the shape example model, we might estimate a 10% uncertainty on theoverall rate of the background:
0 200 400 600 800 1000
Mrec
10
20
30
40
50
60
70
Eve
nts
per
bin
nominal
minus
plus
Have “plus” and“minus” templates byscaling the “nominaltemplate up and downby 10%. introduce nuisanceparameter which scalesthe template accordingly(see backup forimplementation).
Jochen Ott TOTW Statistics Introduction and theta 43 / 52
Statistical Model
Rate Uncertainties: Example
In the shape example model, we might estimate a 10% uncertainty on theoverall rate of the background:
0 200 400 600 800 1000
Mrec
10
20
30
40
50
60
70
Eve
nts
per
bin
nominal
minus
plus
Have “plus” and“minus” templates byscaling the “nominaltemplate up and downby 10%. introduce nuisanceparameter which scalesthe template accordingly(see backup forimplementation).
Jochen Ott TOTW Statistics Introduction and theta 43 / 52
Statistical Model
Shape Uncertainty: Example
In the shape model, assume you have an uncertainty affecting thebackground shape like this:
0 200 400 600 800 1000
Mrec
10
20
30
40
50
60
70
80
Eve
nts
per
bin
nominal
minus
plus
Can introduce thenuisance parameter anduse it in the model tointerpolate smoothlybetween the threetemplates, if assumingthe “plus” and “minus”correspond to a ±1σuncertainty (see backupfor details)
Jochen Ott TOTW Statistics Introduction and theta 44 / 52
Using theta
Contents
6 Statistical Model
7 Using theta
Jochen Ott TOTW Statistics Introduction and theta 45 / 52
Using theta
theta Overview
theta is a software package for making statistical inference usingtemplate-based models.
The “core” of theta is a (heavily optimized) C++ program, controlledby a text-based cfg-file, writing the result to a sqlite database or aroot tree.
A python interface “theta-auto” can be used for many cases whichmakes usage easier than writing the cfg-file manually.
Here: concentrate on the python interface.Usually, the script has two separate steps: Building the statistical model(as class Model), and then applying different statistical methods.
Jochen Ott TOTW Statistics Introduction and theta 46 / 52
Using theta
Building Models I
Formally, the general statistical model in theta is
p(n|θ) =∏i
Poisson(ni |λi (θ))
where i is the bin index and the mean expected number of events in bin i ,λi , is given by the sum of (scaled) signal and background histograms:
λi (θ) = µsi (θ) +∑p
cp(θn)bpi (θn).
where p runs over the background processes.
The python interface supports log-normal coefficients for cp and thetemplate morphing for bpi (θn), as discussed earlier. theta also supportshandling MC statistical uncertainties with the “Barlow-Beeston light”method.
Jochen Ott TOTW Statistics Introduction and theta 47 / 52
Using theta
Building Models II
In theta-auto, models can be built e.g. from a text-based “datacard” (asused for “combine”), which specifies the model and the observed data:
imax 1 number of channelsjmax 3 number of backgroundskmax 2 number of nuisance parametersbin 1observation 0bin 1 1 1 1process ggH qqWW ggWW othersprocess 0 1 2 3rate 1.47 0.63 0.06 0.22lumi lnN 1.11 − 1.11 −
xs ggWW lnN − − 1.50 −
Note that it is possible to include template morphing by refering tohistograms in a root file. See SWGuideHiggsAnalysisCombinedLimit fordetails (CMS-internal).
Jochen Ott TOTW Statistics Introduction and theta 48 / 52
Using theta
Method Overview
Once the statistical model is built, one can use it to
perform a maximum likelihood fit to get estimates for all modelparameters
compute profile likelihood intervals
compute toy-based or asymptotic CLs limits (observed and expected)
evaluate the posterior using Markov-Chain Monte-Carlo (Bayesianmethod for intervals and limits)
. . .
Jochen Ott TOTW Statistics Introduction and theta 49 / 52
Using theta
Example: Maximum Likelihood Estimate
Most methods are implemented as a single python function, taking themodel, the “input” dataset and the number of evaluations as input. Tomake the fit on data, use
res = mle(model, ’data’ , 1)print res
To make the fit on 1000 toys with µ = 1.2, use:
res = mle(model, ’toys :1.2 ’ , 1000)# do some calculations with res
where the “calculations” could e.g. study the bias and pull of themaximum likelihood estimate.
More information on theta: http://theta-framework.org/.
Jochen Ott TOTW Statistics Introduction and theta 50 / 52
Using theta
Comments on Maximum Likelihood
The Maximum likelihood estimate has some nice asymptotic properties:
The bias goes to zero
Among the unbiased estimators, it has the smallest variance
Can use curvature of negative log-likelihood to estimate variance
Maximum Likelihood is the “default choice” for an estimator in HEP.
Jochen Ott TOTW Statistics Introduction and theta 51 / 52
Using theta
Summary
Introduced concepts in one sentence:
Hypothesis tests: The p-value is the probability to observe at least as“extreme” data.
Test Statistic is the measure of “extreme”.
Neyman Interval construction as an “inversion” of the hypothesistest: Include those points µ0 in the interval for which we can notreject the null hypothesis µ = µ0.
Handle rate (shape) uncertainties in the statistical model byintroducing a nuisance parameter scaling (interpolating) the template
Jochen Ott TOTW Statistics Introduction and theta 52 / 52
Part IV
Backup
Jochen Ott TOTW Statistics Introduction and theta 53
8 Misc
9 Frequentist Interpretation; Bootstrapping
10 Rate Uncertainty Implementation
11 Shape Uncertainties
12 BB light method
Jochen Ott TOTW Statistics Introduction and theta 54
Misc
Contents
8 Misc
9 Frequentist Interpretation; Bootstrapping
10 Rate Uncertainty Implementation
11 Shape Uncertainties
12 BB light method
Jochen Ott TOTW Statistics Introduction and theta 55
Misc
p-value Distribution for Discrete Data
The p-value is defined to observe at least as extreme data for H0.
If the data in the statistical model is discrete, the p-value can’t follow aproper uniform distribution on [0, 1].
In general (also for discrete data):
Pr(p ≤ p0|H0) ≤ p0 for 0 ≤ p0 ≤ 1.
In words: The probability to observe a p-value below some threshold p0 isat most p0 (and if equality holds, p-value is indeed uniform on [0, 1]).
Jochen Ott TOTW Statistics Introduction and theta 56
Misc
“Expected” vs. “Observed” Result
Consider two counting experiments A, B searching for the same signal,both expecting background b = 100, and signals sA = 20 and sB = 15,clearly indicating a better performance for A.
By chance, nA = 120, sB = 120, giving ZA ≈ 1 and ZB ≈ 1.3, so using the“observed” significance, experiment B is “better”, which is of coursenonsense.
Related issue in the statement: “Experiment A sees 3σ effect, experimentB sees 3.5σ effect, so in summary we have a 3.5σ effect” (or similarstatement for limits).However, this is wrong from the statistical point of view, as minimum oftwo p-values is not a proper p-value (refer to look-elsewhere effect). solution: Always use expected significance and decide whichanalysis/experiment to use without using the (random) data result.
Jochen Ott TOTW Statistics Introduction and theta 57
Misc
“Expected” vs. “Observed” Result
Consider two counting experiments A, B searching for the same signal,both expecting background b = 100, and signals sA = 20 and sB = 15,clearly indicating a better performance for A.
By chance, nA = 120, sB = 120, giving ZA ≈ 1 and ZB ≈ 1.3, so using the“observed” significance, experiment B is “better”, which is of coursenonsense.
Related issue in the statement: “Experiment A sees 3σ effect, experimentB sees 3.5σ effect, so in summary we have a 3.5σ effect” (or similarstatement for limits).However, this is wrong from the statistical point of view, as minimum oftwo p-values is not a proper p-value (refer to look-elsewhere effect). solution: Always use expected significance and decide whichanalysis/experiment to use without using the (random) data result.
Jochen Ott TOTW Statistics Introduction and theta 57
Frequentist Interpretation; Bootstrapping
Contents
8 Misc
9 Frequentist Interpretation; Bootstrapping
10 Rate Uncertainty Implementation
11 Shape Uncertainties
12 BB light method
Jochen Ott TOTW Statistics Introduction and theta 58
Frequentist Interpretation; Bootstrapping
Model Reminder
The observed data can be summarized as the number of observed eventsn. The probability to observe n events is given by a Poisson probability:
p(n|θ) = Poisson(n|λ(θ)).
The Poisson mean λ(θ) is given by the sum of (scaled) signal andbackground yields,
λi (θ) = µs + b(θn),
where the model parameters θ comprise the signal strength parameter µand the nuisance parameters θn: θ = (µ, θn).
External knowledge about the nuisance parameters is encoded in the priorπ(θn).
Jochen Ott TOTW Statistics Introduction and theta 59
Frequentist Interpretation; Bootstrapping
Frequentist Interpretation
The Bayesian posterior f is given by the likelihood times the prior(assumed to be normal here):
f (θ) = L(θ|d)×N (θn),
(apart from an unimportant normalization). f (θ) can be used in place ofplain L at many places (e.g. parameter estimation, definition of t).
Frequentist re-interpretation:
f (θ) ∝ L(θ|d)×∏u
e−
(θu−µu )2
2δ2u
where µu = 0 and δu = 1, u runs over all nuisance parameters.This can be interpreted as the likelihood function of a slightly differentmodel by swapping θu and µu. The µu now are random variables, part ofthe data. The data comprise the number of observed events and thevalues for µu (with µu = 0 for the observed data).
Jochen Ott TOTW Statistics Introduction and theta 60
Frequentist Interpretation; Bootstrapping
Frequentist Interpretation
The Bayesian posterior f is given by the likelihood times the prior(assumed to be normal here):
f (θ) = L(θ|d)×N (θn),
(apart from an unimportant normalization). f (θ) can be used in place ofplain L at many places (e.g. parameter estimation, definition of t).
Frequentist re-interpretation:
f (θ) ∝ L(θ|d)×∏u
e−
(θu−µu )2
2δ2u
where µu = 0 and δu = 1, u runs over all nuisance parameters.This can be interpreted as the likelihood function of a slightly differentmodel by swapping θu and µu. The µu now are random variables, part ofthe data. The data comprise the number of observed events and thevalues for µu (with µu = 0 for the observed data).
Jochen Ott TOTW Statistics Introduction and theta 60
Frequentist Interpretation; Bootstrapping
Frequentist Interpretation: Comments
No longer need (Bayesian) concept of prior for model parameters θu;instead, have extended the data by µu.
Allows to use purely frequentist concepts for defining ensembles oftoys data; but: requires to choose parameter values.
Choose parameter values by fitting to data: “(parametric)bootstrapping”.
If want to keep structure for f , have to use conjugate distribution forµu in the frequentist model. Normal distribution is self-conjugate use normal model for distribution of µu.
Jochen Ott TOTW Statistics Introduction and theta 61
Frequentist Interpretation; Bootstrapping
Updated Monte-Carlo Method for p-value
For the p-value calculation with Monte-Carlo, the steps are modified:
1 Make a maximum likelihood fit to data (with null hypothesis H0) toget estimates for nuisance parameters θu, θu.
2 Generate toy data by sampling from the model at the fitted values forθu; in particular, draw a Gaussian for µu around θu with width 1.
For each toy data, calculate the test statistic value, e.g. using the t ′ or tdefinitions. The fraction of toys for which t ≥ tobs is the p-value.
Jochen Ott TOTW Statistics Introduction and theta 62
Frequentist Interpretation; Bootstrapping
Summary; Comments
The expression for the posterior can be interpreted purely frequentist wayof a slightly different statistical model with an extended dataset. For thatmodel, can apply parametric bootstrapping and proceed with a purelyfrequentist framework.
Notes:
The frequentist approach allows the application of asymptoticformulas
This is the method used in the LHC Higgs combination.
Jochen Ott TOTW Statistics Introduction and theta 63
Rate Uncertainty Implementation
Contents
8 Misc
9 Frequentist Interpretation; Bootstrapping
10 Rate Uncertainty Implementation
11 Shape Uncertainties
12 BB light method
Jochen Ott TOTW Statistics Introduction and theta 64
Rate Uncertainty Implementation
Rate Uncertainties: Implementation
In the statistical model, had expression for Poisson mean
λi (θ) = µsi +∑p
cp(θn)bpi (θn).
Rate uncertainties can be included in the coefficient cp, e.g. by using
cp(θu) = θu
where θu has a log-normal prior around 1 (see Backup.). Equivalently, wecan also use:
cp(θu) = eθu∆b
where the nuisance parameter θu has a normal prior around 0 with standarddeviation 1, which corresponds to a scale factor with a log-normal prior.
Jochen Ott TOTW Statistics Introduction and theta 65
Rate Uncertainty Implementation
Rate Uncertainties: Implementation
In the statistical model, had expression for Poisson mean
λi (θ) = µsi +∑p
cp(θn)bpi (θn).
Rate uncertainties can be included in the coefficient cp, e.g. by using
cp(θu) = θu
where θu has a log-normal prior around 1 (see Backup.). Equivalently, wecan also use:
cp(θu) = eθu∆b
where the nuisance parameter θu has a normal prior around 0 with standarddeviation 1, which corresponds to a scale factor with a log-normal prior.
Jochen Ott TOTW Statistics Introduction and theta 65
Rate Uncertainty Implementation
Equivalent Parametrizations
We just saw two different, but equivalent methods to introduce alog-normal scale factor.
This is an example of a more general principle:The statistical model can be re-parametrized, which changes both the“model response” to the nuisance parameter and the parameter prior.
Using this freedom, one can use independent standard normal priors forthe nuisance parameters, which will be assumed from now on.
Jochen Ott TOTW Statistics Introduction and theta 66
Rate Uncertainty Implementation
Rate Uncertainties: Normal vs. log-normal I/II
The uncertainty on b was implemented by using the stat. model
p(n|s, b) = Poisson(λ = s + b)
with a normal prior for b around known b0 with known width ∆b.
But: λ can become negative with non-zero probability, for which a Poissonis not defined.
Instead, use a log-normal prior for a scale factor for b0:
λ(s, f ) = s + f · b0
where f has a log-normal prior, i.e., log f has a normal distribution.An equivalent formulation is
λ(s, θ) = s + eθ log(1+∆b)b0
where the n.p. θ has a normal prior with mean 0 and standard deviation 1.Jochen Ott TOTW Statistics Introduction and theta 67
Rate Uncertainty Implementation
Rate Uncertainties: Normal vs. log-normal II/II
Comparing the prior for the scale factor between normal and log-normal:∆b = 0.1:
0.0 0.5 1.0 1.5 2.0 2.5
f
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5p
normal
log-normal
Log-normal avoidsunphysical jump /truncation at f = 0,while normal priorrequires that for large∆b.
Jochen Ott TOTW Statistics Introduction and theta 68
Rate Uncertainty Implementation
Rate Uncertainties: Normal vs. log-normal II/II
Comparing the prior for the scale factor between normal and log-normal:∆b = 0.3:
0.0 0.5 1.0 1.5 2.0 2.5
f
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6p
normal
log-normal
Log-normal avoidsunphysical jump /truncation at f = 0,while normal priorrequires that for large∆b.
Jochen Ott TOTW Statistics Introduction and theta 68
Rate Uncertainty Implementation
Rate Uncertainties: Normal vs. log-normal II/II
Comparing the prior for the scale factor between normal and log-normal:∆b = 1.0:
0.0 0.5 1.0 1.5 2.0 2.5
f
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8p
normal
log-normal
Log-normal avoidsunphysical jump /truncation at f = 0,while normal priorrequires that for large∆b.
Jochen Ott TOTW Statistics Introduction and theta 68
Shape Uncertainties
Contents
8 Misc
9 Frequentist Interpretation; Bootstrapping
10 Rate Uncertainty Implementation
11 Shape Uncertainties
12 BB light method
Jochen Ott TOTW Statistics Introduction and theta 69
Shape Uncertainties
Shape Uncertainties: Introduction
Can have uncertainties also affecting shape in a general way (e.g. byenergy calibration, . . . ).
Typically, in an analysis, one would
Use MC sample (or sideband) to get a shape for a process “nominaltemplate”
Modify the MC (or sideband) to get “±1σ effects” of someuncertainty, e.g. by re-weighting events, modifying the energycalibration, using a different sideband, . . . “plus”/“minus” template
Jochen Ott TOTW Statistics Introduction and theta 70
Shape Uncertainties
Shape Uncertainties: Statistical Model
Follow general recipe: introduce nuisance parameter θu with standardnormal prior, and write the model prediction for the Poisson mean λi as afunction of the new parameter.
It should interpolate smoothly between the “minus” template for θu = −1,the “nominal” template at θu = 0 and the “plus” template at θu = +1.
There are many possibilities to achieve this; here: Use cubic interpolationfor |θu| < 1 and linear extrapolation for |θu| > 1.
Jochen Ott TOTW Statistics Introduction and theta 71
Shape Uncertainties
Shape Uncertainties: Single Bin Behavior
Example interpolation for the bin around Mrec = 835:
−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0
θu
18
19
20
21
22
23
24
bpi(θ
u)
bipu+ = 21.9
bipu− = 19.3
bip0 = 20.2
bipu± are the bincontents for the “plus”and “minus” templates;bip0 for the “nominal”.
Jochen Ott TOTW Statistics Introduction and theta 72
Shape Uncertainties
Shape Uncertainties: Shape Behavior
Applying template morphing for certain values θu, the backgroundtemplate looks like this:
0 200 400 600 800 1000
Mrec
10
20
30
40
50
60
70
80
90
Eve
nts
per
bin
nominal θu = 0.0
minus θu = −1.0
plus θu = 1.0
Interpolation agrees withintuitive expectation.
Jochen Ott TOTW Statistics Introduction and theta 73
Shape Uncertainties
Shape Uncertainties: Shape Behavior
Applying template morphing for certain values θu, the backgroundtemplate looks like this:
0 200 400 600 800 1000
Mrec
10
20
30
40
50
60
70
80
90
Eve
nts
per
bin
nominal
θu = −0.5
θu = 2.0
minus
plusInterpolation agrees withintuitive expectation.
Jochen Ott TOTW Statistics Introduction and theta 73
BB light method
Contents
8 Misc
9 Frequentist Interpretation; Bootstrapping
10 Rate Uncertainty Implementation
11 Shape Uncertainties
12 BB light method
Jochen Ott TOTW Statistics Introduction and theta 74
BB light method
Introduction
If searching for a small signal using Monte-Carlo (or the data in asideband) to define the statistical model, the limited size of theMonte-Carlo sample might introduce an additional uncertainty on thepredicted mean λci .
Example: counting experiment in which only NMC = 1 background eventsurvives, weighted to b = 0.2, and n = 3 events are observed. If b = 0.2was known exactly, this would correspond to a 3σ effect, but it is possiblethat the true mean corresponds to NMC = 2 (b = 0.4), which correspondsto a much smaller significance . . .
Jochen Ott TOTW Statistics Introduction and theta 75
BB light method
Barlow-Beeston Method
Idea of Barlow and Beeston (Comp. Phys. Comm. 77 (1993)):
Introduce the true mean in bin i as nuisance parameter ǫi in thestatistical model
Extend the statistical model to model the joint probability to observeni data events and NMC,i Monte-Carlo events, given ǫi (both arebasically Poisson)
For maximum likelihood methods, note that maximizing w.r.t. ǫileads to de-coupled equations that can be solved numerically
This is implemented in ROOT’s TFractionFitter.
Disadvantage: Need to numerically solve equations at each step in theminimization.
Jochen Ott TOTW Statistics Introduction and theta 76
BB light method
Modification: Barlow-Beeston light
Same idea: add one nuisance parameter δci per bin to model MC stat.uncertainty:
µci (θ, δ) = µci (θ) + δci
where δci here has a normal prior (with mean 0).
This can be implemented using template morphing: Add Nbins nuisanceparameters δci , each shifting exactly one bin w.r.t. the nominal template.But: can lead to hundreds of nuisance parameters numericalminimization slow, unstable.
Jochen Ott TOTW Statistics Introduction and theta 77
BB light method
Analytical Solution
To maximize the Poisson likelihood function L(θ, δ|n), set all derivatives tozero and solve for the parameters δci analytical solution δm(θ)!
Then, the actual maximization algorithm is run on the likelihood functionLm in which the δ parameters have been “maximized away”:
Lm(θ|d) := L(θ, δm(θ)|d)
Allows to handle MC stat. uncertainties without introducing anadditional “visible” nuisance parameter.
Jochen Ott TOTW Statistics Introduction and theta 78
BB light method
Comments
BB light uses normal approximation need enough MC events in eachbin.
Possible work-arounds in case of low MC statistics include:
re-bin to reach a minimum number of MC events
use a non-parametric smoothing procedure (neighbor bins)
fit a function (“parametric smoothing”)
Often, first option is the easiest one, but: could loose sensitivity w.r.t.others.
Jochen Ott TOTW Statistics Introduction and theta 79