
Page 1:

Practical Statistics

Jochen Ott

June 5, 2014

Jochen Ott TOTW Statistics Introduction and theta 1 / 52

Page 2:

Why “Practical Statistics”?

Statistics is often taught and discussed using simple cases such as a counting experiment with a Poisson distribution and no systematics.

In practice, however, we deal with shape analyses and systematics – beyond what can easily be extrapolated from a counting experiment.

Therefore, I'll try to give a statistics introduction covering both: the basic statistical concepts, and, at the same time, how to actually apply them in practice in a realistic analysis, as is common e.g. in Higgs analyses at CMS.

My talk has three parts:

1 Hypothesis Tests, Monte-Carlo Methods, Model Building

2 Limits and Intervals (Frequentist only)

3 Brief Introduction to statistical tools, esp. “theta”

Page 3:

Part I

Hypothesis Tests

Page 4:

1 Introduction; Counting Experiment

2 Generalization; Shape Model

3 Test Statistic; MC method

4 Handling of Systematic Uncertainties

5 Test Statistic Definition with Nuisances

Page 5:

Introduction; Counting Experiment

Contents

1 Introduction; Counting Experiment

2 Generalization; Shape Model

3 Test Statistic; MC method

4 Handling of Systematic Uncertainties

5 Test Statistic Definition with Nuisances

Page 6:

Introduction; Counting Experiment

Introduction

Statistical hypothesis testing is a formal method for decision making usingdata from a random process.

It is an attempt to disprove a null hypothesis H0, which is rejected if the probability to observe the data that have actually been observed – or even more extreme data – is very low for H0.

This probability is called the p-value. If it is below some (small) pre-defined threshold α, the hypothesis H0 is rejected in favor of the alternative H1.

I'll use two examples: a counting experiment and a (more realistic) shape analysis. Complications such as systematic uncertainties are added later.

Page 7:

Introduction; Counting Experiment

Statistical Model

A statistical model specifies the probability to observe certain data d as a function of the (real-valued) model parameters θ.

A simple statistical model is a counting experiment with known background mean b = 5.2 and unknown signal s ≥ 0 we want to “discover”. The data comprise only the number of observed events n, which has a Poisson distribution around λ = b + s. b = 5.2 is constant and s ≥ 0 is the (only) model parameter.

This statistical model can be summarized as:

p(n|s) = Poisson(n | λ = b + s) = e^(−(b+s)) (b + s)^n / n!

Page 8:

Introduction; Counting Experiment

Hypothesis Test

The null hypothesis – which we would like to reject – is s = 0. The alternative hypothesis is a positive signal, s > 0.

Given the observed number of events nobs, the p-value is the probability to observe at least as many events for the null hypothesis s = 0:

p(nobs) = Σ_{n = nobs}^{∞} Poisson(n | λ = b = 5.2)
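This sum can be evaluated directly from the Poisson pmf; a minimal sketch in Python (the function names are mine, not from the talk):

```python
import math

def poisson_pmf(n, lam):
    # Poisson probability e^(-lam) * lam^n / n!
    return math.exp(-lam) * lam ** n / math.factorial(n)

def p_value(n_obs, b):
    # P(N >= n_obs | lambda = b) = 1 - P(N <= n_obs - 1)
    return 1.0 - sum(poisson_pmf(n, b) for n in range(n_obs))

print(p_value(8, 5.2))   # ~0.155
print(p_value(12, 5.2))  # ~0.0073
```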

Remarks:

The p-value itself is a random variable.

Definition implies: the p-value follows a uniform distribution on the interval [0, 1] for H0 (or approximately so if the data are discrete).

Page 9:

Introduction; Counting Experiment

Result

For the example of b = 5.2, the p-value is the probability to measure at least as many events as observed for s = 0. This can be evaluated directly numerically or by making toys.

[Figure: Poisson distribution of n for λ = b = 5.2, with nobs = 8 marked; p = 0.155]

nobs    p
6       0.419
8       0.155
10      0.040
12      0.0073

Page 10:

Introduction; Counting Experiment

Possible Outcomes of a Hypothesis Test

There are two kinds of errors that can be made in the hypothesis test:

1 Rejecting H0 although it is true. This is the type-I error (or “error of the first kind”).

2 Not rejecting H0 although it is false; type-II error (or “error of the second kind”).

The first is usually regarded as more severe. The probability for a type-I error is the (“small”) threshold α used in the hypothesis test.

The probability for a type-II error is denoted β. The probability to (correctly) reject H0 if the alternative H1 is true is the power, 1 − β.
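For the counting experiment, α fixes a critical event count n_c, and the power then follows from the Poisson distribution under the alternative. A sketch (the signal value s = 5 is a hypothetical choice, only for illustration):

```python
import math

def p_ge(n, lam):
    # P(N >= n | lam)
    return 1.0 - sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n))

def critical_count(b, alpha):
    # smallest n_c such that P(N >= n_c | b) <= alpha, i.e. type-I error rate <= alpha
    n_c = 0
    while p_ge(n_c, b) > alpha:
        n_c += 1
    return n_c

b, alpha, s = 5.2, 0.05, 5.0     # s = 5 is a hypothetical signal
n_c = critical_count(b, alpha)   # reject H0 if n >= n_c
power = p_ge(n_c, b + s)         # probability to (correctly) reject H0 when s is true
print(n_c, power)
```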

Page 11:

Introduction; Counting Experiment

Remarks

For a given fixed type-I error rate α, one would prefer the test with high power (1 − β); this can be used to choose an “optimal” test statistic.

The p-value is not simply “the probability to observe the data as observed”, but to “observe such data or more extreme data”. This distinction is important; also, it implies that a comparison is needed between two datasets (real-valued test statistic).

The p-value is not the probability that the null hypothesis is true – such a statement carries no meaning in frequentist statistics, where probability always refers to (random) data and derived quantities, never to model parameters (but: the Bayesian view differs, see below).

Rejecting the null hypothesis does not prove that the alternative hypothesis is true: usually, many alternatives to the null hypothesis exist which are compatible with the observed data.

Page 12:

Generalization; Shape Model

Contents

1 Introduction; Counting Experiment

2 Generalization; Shape Model

3 Test Statistic; MC method

4 Handling of Systematic Uncertainties

5 Test Statistic Definition with Nuisances

Page 13:

Generalization; Shape Model

Introduction

So far: a simplistic Poisson model with only one event count (“counting experiment”).

Now: generalize to a more realistic analysis in which the (binned) shape of some reconstructed mass distribution (Mrec) is analyzed to search for a resonance of unknown mass over some falling background.

Versions of such models are used in many channels of the Higgs boson search at the LHC, and in many other searches.

Page 14:

Generalization; Shape Model

Shape Model: Plot

For example, the expected Poisson means for background and signal, and the data, might look like this:

[Figure: events per bin vs. Mrec (0–1000), showing Data, Background, and Signal at M = 500]

Can these data be seen as evidence against the background-only model?

Can we exclude H0 = background only in favor of H1 = background + (scaled) M = 500 signal?

Next steps: statistical model, hypothesis test!

Page 18:

Generalization; Shape Model

Shape Model: Statistical Model

The probability to observe event counts ~n = (n1, n2, . . .) is a product of Poisson probabilities in each bin:

p(~n | µ) = ∏_i Poisson(n_i | λ_i(µ))   with   λ_i(µ) = µ s_i + b_i

where i = 1, . . . , Nbins denotes the bin index. s_i and b_i are the signal and background templates, resp., which are typically derived from Monte-Carlo simulation or from a background-enriched sideband.

µ ≥ 0 scales the signal template; it is the signal strength parameter. Apart from a (known constant) factor, it is the signal cross section.
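A sketch of the corresponding binned Poisson log-likelihood (the templates and data below are hypothetical numbers, only for illustration):

```python
import math

def log_likelihood(n, s, b, mu):
    # log p(n | mu) = sum_i [ n_i * log(lam_i) - lam_i - log(n_i!) ],  lam_i = mu*s_i + b_i
    total = 0.0
    for n_i, s_i, b_i in zip(n, s, b):
        lam = mu * s_i + b_i
        total += n_i * math.log(lam) - lam - math.lgamma(n_i + 1)
    return total

# hypothetical templates and data
b = [50.0, 40.0, 30.0, 20.0, 10.0]   # background template
s = [0.0, 2.0, 10.0, 2.0, 0.0]       # signal template
n = [48, 45, 41, 22, 9]              # observed counts
print(log_likelihood(n, s, b, mu=1.0))
```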

Page 19:

Test Statistic; MC method

Contents

1 Introduction; Counting Experiment

2 Generalization; Shape Model

3 Test Statistic; MC method

4 Handling of Systematic Uncertainties

5 Test Statistic Definition with Nuisances

Page 20:

Test Statistic; MC method

The role of the Test Statistic t

Reminder: The p-value is defined as the probability to observe data at least as extreme (signal-like) as the one actually observed.

For a counting experiment, more events are more “extreme”, more “signal-like”. In general, one has to summarize the “signal-likeness” in one real number; this is the test statistic.

A possible choice is the (profile) likelihood ratio:

t = log [ max_{θ∈H1} L(θ|d) / max_{θ∈H0} L(θ|d) ]

where we assume that the hypotheses H0 and H1 correspond to parameter sub-spaces of a common statistical model; for searches: H0 is µ = 0 and H1 is µ > 0.

Again: t measures the compatibility with H0; large values mean incompatibility with H0, favoring H1.
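For the counting experiment with fixed b, the likelihood maximum under H1 is at ŝ = max(0, nobs − b), so t can be written down in closed form; a small sketch:

```python
import math

def log_pois(n, lam):
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def t_lr(n_obs, b):
    # t = log [ max_{s>=0} L(s) / L(s=0) ] with L(s) = Poisson(n_obs | b + s)
    s_hat = max(0.0, n_obs - b)   # best-fit signal under H1
    return log_pois(n_obs, b + s_hat) - log_pois(n_obs, b)

print(t_lr(8, 5.2))   # ~0.65
print(t_lr(5, 5.2))   # 0.0: a deficit gives no preference for s > 0
```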

Page 23:

Test Statistic; MC method

p-value definition via t

The p-value is the probability to observe t ≥ tobs if H0 is true:

p = Pr(t ≥ tobs|H0).

This suggests using a Monte-Carlo method for calculating the p-value:

1 Generate a large number of toy data distributed according to H0.

2 For each toy data, calculate test statistic t.

3 For the observed data, calculate test statistic tobs.

4 The p-value is given by the fraction of toys with t ≥ tobs; if p < α, reject H0.

The set of values of tobs for which H0 is rejected at level α is known as the critical region in t.
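A sketch of this toy procedure for the counting experiment, where the test statistic is simply the event count n (Poisson sampling via Knuth's product-of-uniforms method, since the Python stdlib has no Poisson generator):

```python
import math
import random

def poisson_toy(lam, rng):
    # Knuth: multiply uniforms until the product drops below e^(-lam)
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

rng = random.Random(1)
b, n_obs, n_toys = 5.2, 8, 100_000
toys = [poisson_toy(b, rng) for _ in range(n_toys)]   # steps 1+2: toys under H0
p = sum(t >= n_obs for t in toys) / n_toys            # step 4: fraction with t >= t_obs
print(p)   # close to the exact value 0.155
```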

Page 25:

Test Statistic; MC method

MC method for p: Example II

Result of 100,000 background-only toys: p = 86/10⁵ = 8.6 × 10⁻⁴, corresponding to Z = 3.13.

[Figure: distribution of the test statistic t for the background-only toys (log scale), with tobs = 4.92 marked; N(t ≥ tobs) = 86]

Page 26:

Handling of Systematic Uncertainties

Contents

1 Introduction; Counting Experiment

2 Generalization; Shape Model

3 Test Statistic; MC method

4 Handling of Systematic Uncertainties

5 Test Statistic Definition with Nuisances

Page 27:

Handling of Systematic Uncertainties

Introduction

In the statistical model, each uncertainty is included as an additional parameter, called a nuisance parameter.

Often, there is some external knowledge about the possible values of those nuisance parameters.

Introducing systematic uncertainties requires:

Changes of the statistical model, i.e. how the nuisance parameters typically affect the probability.

Changes of the significance calculation, i.e. how to include knowledge about nuisance parameters in the inference.

Those items will be discussed separately in the following slides.

Page 28:

Handling of Systematic Uncertainties

Example

In the counting experiment, assume that the expected background b = 5.2 has some uncertainty (e.g. from limited statistics in a sideband). This can be included by changing the statistical model to:

p(n | s, b) = Poisson(n | λ = b + s)

where b now is a nuisance parameter, not a constant.

We assume that there is external knowledge about b (e.g. from a sideband measurement) suggesting that b is around b0 = 5.2 with some uncertainty ∆b = 2.6.

Page 29:

Handling of Systematic Uncertainties

Changes to Significance Evaluation

We assume there is external knowledge about the nuisance parameters, which has to be incorporated in the procedure. Possible methods include:

1 Make it internal to the model, i.e., fit the nuisance parameters simultaneously with the parameter of interest (e.g. include the sideband in the likelihood model).

2 Use Bayesian priors for the nuisance parameters and take the prior average.

3 Include auxiliary measurements in the statistical model in an approximate way and use bootstrapping.

Here, only item 2 is covered (see backup for item 3).

Page 32:

Handling of Systematic Uncertainties

Bayesian vs. Frequentist “Probability”

Frequentist: “Probability is the relative frequency of a certain outcome in an ensemble of (imaginary or real) repetitions of a random process.” Probability is only assigned to “data” and derived quantities (test statistic, p-value, etc.), but not to model parameters, which have only one true (unknown) value; the concept of probability does not apply.

Bayesian: “Probability can also be used to express the current state of knowledge.” In particular, it is valid to talk about the probability that a model parameter takes certain values.

Here, we discuss a “mixed”/“hybrid” method: significance and p-values are a frequentist concept, but the use of nuisance parameter priors is Bayesian.

Page 35:

Handling of Systematic Uncertainties

Prior-Averaging

Modify the Monte-Carlo method for p-value calculation: to generate toy data,

1 Draw a random value for each nuisance parameter from its prior.

2 Draw random data from the probability according to the stat. model evaluated for those (random) parameter values.

Then, as usual: the p-value is the fraction of toys in which t ≥ tobs.

Formally, the resulting p-value is:

p_a = ∫ P(t > tobs | θ) π(θ) dθ

where π(θ) is the prior for the nuisance parameters θ.
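For the counting experiment the inner probability P(t > tobs | θ) is known exactly, so the prior average can be sketched by sampling b from a normal prior and averaging the exact tail probability (the truncation at b > 0 is my choice for illustration):

```python
import math
import random

def p_tail(n_obs, lam):
    # exact P(N >= n_obs | lam)
    return 1.0 - sum(math.exp(-lam) * lam ** n / math.factorial(n) for n in range(n_obs))

def prior_averaged_p(n_obs, b0, rel_unc, n_samples=20_000, seed=2):
    # average P(N >= n_obs | b) over the prior b ~ Normal(b0, rel_unc * b0)
    rng = random.Random(seed)
    total = count = 0
    while count < n_samples:
        b = rng.gauss(b0, rel_unc * b0)
        if b <= 0:
            continue   # truncate the prior at zero background (an assumption)
        total += p_tail(n_obs, b)
        count += 1
    return total / count

for unc in (0.0, 0.1, 0.3):
    print(unc, prior_averaged_p(8, 5.2, unc))   # p grows with the uncertainty
```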

Page 36:

Handling of Systematic Uncertainties

Example I: Counting Experiment

For a normal prior on b with mean b0 = 5.2, the distribution for n changes:

[Figure: distribution of n for b0 = 5.2 with and without background uncertainty; nobs = 8 marked. p = 0.155 with no uncertainty, p = 0.162 with 10% uncertainty, p = 0.195 with 30% uncertainty.]

In general: adding systematic uncertainties “broadens” the test statistic distribution, thus enlarging the p-value and reducing the Z-value.

Page 39:

Test Statistic Definition with Nuisances

Contents

1 Introduction; Counting Experiment

2 Generalization; Shape Model

3 Test Statistic; MC method

4 Handling of Systematic Uncertainties

5 Test Statistic Definition with Nuisances

Page 40:

Test Statistic Definition with Nuisances

Test Statistic

The test statistic t is defined as the ratio of profile likelihoods for null and alternative:

t = log [ max_{θ∈H1} L(θ|d) / max_{θ∈H0} L(θ|d) ] = log [ max_{µ≥0} L(µ|d) / L(µ = 0|d) ].

Now, the model parameters θ include the parameter of interest (signal strength µ) and nuisance parameters θn: θ = (µ, θn). Change t:

1 Fix the nuisance parameters to their most probable value θn,0 in the maximization, i.e. only vary µ:

t′ = log [ max_{µ≥0} L(µ, θn = θn,0 | d) / L(µ = 0, θn = θn,0 | d) ]

2 Replace L with the posterior, i.e. multiply by the nuisance parameter prior π:

t = log [ max_{µ≥0,θn} L(µ, θn | d) π(θn) / max_{θn} L(µ = 0, θn | d) π(θn) ]
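A grid-search sketch of the second variant for the counting experiment with a Gaussian prior/penalty on b (the grid ranges and step sizes are my choices, for illustration only):

```python
import math

def log_pois(n, lam):
    lam = max(lam, 1e-9)   # guard against log(0) at the grid edge
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def profile_t(n_obs, b0, db):
    # t = log [ max_{mu>=0,b} L(mu,b) pi(b) / max_b L(0,b) pi(b) ] for the counting model
    def log_post(mu, b):
        return log_pois(n_obs, mu + b) - 0.5 * ((b - b0) / db) ** 2
    b_grid = [b0 + db * (i / 50.0 - 2.0) for i in range(201)]   # b0 +- 2*db
    mu_grid = [0.2 * i for i in range(101)]                     # mu in [0, 20]
    num = max(log_post(mu, b) for mu in mu_grid for b in b_grid)
    den = max(log_post(0.0, b) for b in b_grid)
    return num - den

print(profile_t(8, 5.2, 2.6))
print(profile_t(5, 5.2, 2.6))   # 0.0: deficit, so mu_hat = 0
```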

Page 43:

Test Statistic Definition with Nuisances

Test Statistic and p-value

Both t′ and t have been used in HEP analyses.

The definition of the test statistic is orthogonal to the definition of the ensemble H0 used to define the p-value: for a MC method, this means it is crucial to vary the nuisance parameters in the toy data generation, while it is not necessary to vary them in the definition of t.

Page 44:

Test Statistic Definition with Nuisances

Applying uncertainties

Using the prior-averaging method means that toy data are drawn for random values of the nuisance parameters θn ⇒ the test statistic distribution is broadened:

[Figure: distribution of t for background-only toys with and without the rate uncertainty (log scale); tobs = 4.92 marked]

No uncertainties: N(t ≥ tobs) = 86, p = 0.00086, Z = 3.1.

With 10% rate uncertainty: N(t ≥ tobs) = 9598, p = 0.096, Z = 1.3.
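The conversion between p and Z used here is the one-sided Gaussian tail, Z = Φ⁻¹(1 − p); in Python:

```python
from statistics import NormalDist

def z_value(p):
    # one-sided significance: Z such that 1 - Phi(Z) = p
    return NormalDist().inv_cdf(1.0 - p)

print(z_value(0.00086))   # ~3.13
print(z_value(0.096))     # ~1.30
```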

Page 47:

Part II

Confidence Intervals


Introduction and Definitions

Confidence intervals are probabilistic statements about the values of parameters of a statistical model. They are calculated at a given confidence level, which specifies the (claimed/desired) coverage.

The coverage is the probability that the interval contains the true value. In general, the coverage is a function of the true parameter values. (Note that this does not assign probabilities to the true value but only to the interval construction!)

A method is said to over-cover (and to be conservative) if the coverage is above the confidence level; the opposite is under-coverage. Sometimes, exact coverage cannot be reached (e.g. due to discrete data); in this case one usually chooses to construct the method to over-cover.
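As an illustration of coverage with discrete data (my own sketch, not from the slides), the following computes the exact coverage of a standard Poisson upper limit as a function of the true mean; by construction it over-covers:

```python
import math

def pois_pmf(n, lam):
    # Poisson pmf, evaluated in log space for numerical stability
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def pois_cdf(n, lam):
    return sum(pois_pmf(k, lam) for k in range(n + 1))

def upper_limit(n_obs, alpha=0.05):
    """Exact upper limit on a Poisson mean: the lam with
    P(N <= n_obs | lam) = alpha, found by bisection."""
    lo, hi = 1e-12, 10.0 * (n_obs + 1) + 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if pois_cdf(n_obs, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(lam_true, alpha=0.05):
    """Probability that [0, upper_limit(N)] contains lam_true;
    for this discrete construction it is always >= 1 - alpha."""
    cov, total, n = 0.0, 0.0, 0
    while total < 1.0 - 1e-12 and n < 500:
        p = pois_pmf(n, lam_true)
        total += p
        if upper_limit(n, alpha) >= lam_true:
            cov += p
        n += 1
    return cov
```

For example, for n_obs = 0 the 95% upper limit is −ln 0.05 ≈ 3.0, and coverage(3.7) sits above 0.95, never exactly at it.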


Counting Experiment

Consider the simple counting model with b = 5.2 and unknown s ≥ 0 with

p(n|s) = Poisson(n|s + b).

For a given observation (e.g. nobs = 5), what statement can be made about s? For example, we would expect to rule out large values for s, e.g. s = 100.

This is a question for a hypothesis test.


Intervals and Hypothesis Tests

Upper limit construction by hypothesis test “inversion”:

1 For a given s = s0, make a hypothesis test with the null hypothesis s = s0 and the alternative s < s0 with type-I error α (e.g., α = 0.05).

2 Repeat step 1 for different values of s0.

3 The confidence interval for s comprises exactly those values s0 for which the hypothesis test could not reject the null hypothesis s = s0.

For this formulation of the hypothesis test (s = s0 vs. s < s0), we get an upper limit.

The confidence level is (1− α) (here: 95%).

This is known as the Neyman Construction. It can be visualized as a “belt construction” in the (n–s) plane.
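The three steps can be sketched for the counting model (b = 5.2) with a simple scan over s0; this is an illustration of the inversion, not any tool's actual implementation:

```python
import math

def pois_cdf(n, lam):
    # P(N <= n | lam), summed in log space for numerical stability
    return sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
               for k in range(n + 1))

def upper_limit_by_inversion(n_obs, b, alpha=0.05, s_max=50.0, step=0.001):
    """Scan s0 and keep the largest value NOT rejected by the test of
    H0: s = s0 against the alternative s < s0; H0 is rejected when
    P(n <= n_obs | s0 + b) <= alpha."""
    s_up = 0.0
    for i in range(int(s_max / step) + 1):
        s0 = i * step
        if pois_cdf(n_obs, s0 + b) > alpha:  # s0 survives -> in the interval
            s_up = s0
    return s_up
```

With nobs = 5 and b = 5.2 (the example above) this gives a 95% CL upper limit of about 5.3 signal events.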


Belt

The Neyman construction as “belt” in the s–nobs plane: for each s, the “belt” in nobs has probability ≥ 1 − α. For a given nobs, the interval is given by intersecting the horizontal line with the belt.

[Figure: acceptance belt in the (s, nobs) plane, containing probability ≥ 1 − α (p = α outside) for each s; a second frame shows the horizontal line at nobs = 8 intersecting the belt.]


Comment

The close relationship with hypothesis testing means many aspects from hypothesis tests also apply to limits:

explicitly introduce a test statistic for shape models; again: the test statistic choice is not unique, can use the profile likelihood ratio t

use toy Monte-Carlo to get test statistic distribution for H0

handling of systematic uncertainties via Bayesian/hybrid method

use of asymptotic methods

concept of “expected limit” by running the method (imaginatively or with MC) on an ensemble representing the expected data (usually background-only)

. . .


Extensions

So far, discussed Neyman construction for limits. Modifications:

Modify the “belt” construction: for each s, include the “central” set of nobs in the belt ⇒ “central intervals”, which are two-sided

Use the likelihood ratio as ordering criterion to decide what to include in the belt ⇒ Feldman-Cousins “unified intervals” with a smooth transition between limit-like and central-like intervals

Purely frequentist methods might lead to empty intervals ⇒ modify frequentist p-values by a “penalty factor” which is high if the experimental sensitivity is low ⇒ modified frequentist CLs limits


Part III

theta


6 Statistical Model

7 Using theta


Statistical Model

Overview

theta is a program for statistical inference. It only handles binned models that use histograms for the prediction, such as this example:

[Figure: example binned model in Mrec (events per bin) with data points, a background template, and a signal template for M = 500.]

Important limits: one bin = counting experiment; many narrow bins are almost equivalent to the unbinned case.


Statistical Model

Formal Shape Model

The statistical model is the product of a Poisson term for each bin:

p(n|θ) = ∏i Poisson(ni | λi(θ))

where i is the bin index and the expected number of events in bin i, λi, is given by the sum of (scaled) signal and background histograms:

λi(θ) = µ si + ∑p cp(θn) bpi(θn).

p denotes the different background processes which are expected to contribute. The bin-independent coefficient cp(θn) encodes (process-specific) rate uncertainties, while bpi(θn) is the most general dependence, a shape uncertainty.
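A minimal sketch of evaluating this likelihood, assuming a single background process with a log-normal rate coefficient (an illustration, not theta code; constant ln(ni!) terms are dropped):

```python
import math

def neg_log_likelihood(n, s, b, mu, theta, delta_b=0.1):
    """-ln L for the binned shape model lambda_i = mu*s_i + c(theta)*b_i,
    with a log-normal rate coefficient c(theta) = (1 + delta_b)**theta
    for the single background process."""
    c = (1.0 + delta_b) ** theta
    nll = 0.0
    for ni, si, bi in zip(n, s, b):
        lam = mu * si + c * bi
        nll += lam - ni * math.log(lam)
    return nll
```

For data that equals the background prediction, the fit prefers µ = 0 and θ = 0, as expected.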


Statistical Model

Rate Uncertainties: Example

In the shape example model, we might estimate a 10% uncertainty on the overall rate of the background:

[Figure: the example model in Mrec (data, background, signal for M = 500); a second frame shows the nominal background template together with its “plus” and “minus” variants.]

Obtain “plus” and “minus” templates by scaling the “nominal” template up and down by 10% ⇒ introduce a nuisance parameter which scales the template accordingly (see backup for implementation).


Statistical Model

Shape Uncertainty: Example

In the shape model, assume you have an uncertainty affecting the background shape like this:

[Figure: nominal background template in Mrec together with the “plus” and “minus” templates of a shape uncertainty.]

Can introduce a nuisance parameter and use it in the model to interpolate smoothly between the three templates, if assuming the “plus” and “minus” correspond to a ±1σ uncertainty (see backup for details).


Using theta

theta Overview

theta is a software package for statistical inference using template-based models.

The “core” of theta is a (heavily optimized) C++ program, controlled by a text-based cfg-file, writing the result to a sqlite database or a root tree.

A python interface, “theta-auto”, can be used for many cases and makes usage easier than writing the cfg-file manually.

Here: concentrate on the python interface. Usually, the script has two separate steps: building the statistical model (as class Model), and then applying different statistical methods.


Using theta

Building Models I

Formally, the general statistical model in theta is

p(n|θ) = ∏i Poisson(ni | λi(θ))

where i is the bin index and the mean expected number of events in bin i, λi, is given by the sum of (scaled) signal and background histograms:

λi(θ) = µ si(θ) + ∑p cp(θn) bpi(θn),

where p runs over the background processes.

The python interface supports log-normal coefficients for cp and the template morphing for bpi(θn), as discussed earlier. theta also supports handling MC statistical uncertainties with the “Barlow-Beeston light” method.


Using theta

Building Models II

In theta-auto, models can be built e.g. from a text-based “datacard” (as used for “combine”), which specifies the model and the observed data:

imax 1  number of channels
jmax 3  number of backgrounds
kmax 2  number of nuisance parameters
bin 1
observation 0
bin              1      1      1      1
process          ggH    qqWW   ggWW   others
process          0      1      2      3
rate             1.47   0.63   0.06   0.22
lumi     lnN     1.11   -      1.11   -
xs_ggWW  lnN     -      -      1.50   -

Note that it is possible to include template morphing by referring to histograms in a root file. See SWGuideHiggsAnalysisCombinedLimit for details (CMS-internal).


Using theta

Method Overview

Once the statistical model is built, one can use it to

perform a maximum likelihood fit to get estimates for all model parameters

compute profile likelihood intervals

compute toy-based or asymptotic CLs limits (observed and expected)

evaluate the posterior using Markov-Chain Monte-Carlo (Bayesian method for intervals and limits)

. . .


Using theta

Example: Maximum Likelihood Estimate

Most methods are implemented as a single python function, taking the model, the “input” dataset and the number of evaluations as input. To make the fit on data, use

res = mle(model, 'data', 1)
print res

To make the fit on 1000 toys with µ = 1.2, use:

res = mle(model, 'toys:1.2', 1000)
# do some calculations with res

where the “calculations” could e.g. study the bias and pull of the maximum likelihood estimate.

More information on theta: http://theta-framework.org/.


Using theta

Comments on Maximum Likelihood

The maximum likelihood estimate has some nice asymptotic properties:

The bias goes to zero

Among the unbiased estimators, it has the smallest variance

Can use curvature of negative log-likelihood to estimate variance

Maximum Likelihood is the “default choice” for an estimator in HEP.


Using theta

Summary

Introduced concepts in one sentence:

Hypothesis tests: the p-value is the probability to observe at least as “extreme” data.

The test statistic is the measure of “extremeness”.

Neyman interval construction as an “inversion” of the hypothesis test: include those points µ0 in the interval for which we cannot reject the null hypothesis µ = µ0.

Handle rate (shape) uncertainties in the statistical model by introducing a nuisance parameter scaling (interpolating) the template.


Part IV

Backup


8 Misc

9 Frequentist Interpretation; Bootstrapping

10 Rate Uncertainty Implementation

11 Shape Uncertainties

12 BB light method


Misc

p-value Distribution for Discrete Data

The p-value is defined as the probability to observe at least as extreme data under H0.

If the data in the statistical model is discrete, the p-value cannot follow a proper uniform distribution on [0, 1].

In general (also for discrete data):

Pr(p ≤ p0|H0) ≤ p0 for 0 ≤ p0 ≤ 1.

In words: the probability to observe a p-value below some threshold p0 is at most p0 (and if equality holds, the p-value is indeed uniform on [0, 1]).
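This property can be checked numerically for a Poisson counting experiment (an illustration; the function names are mine):

```python
import math

def pmf(n, lam):
    # Poisson pmf in log space for numerical stability
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def p_value(n_obs, lam):
    # P(N >= n_obs | lam): probability of at least as extreme data
    return 1.0 - sum(pmf(k, lam) for k in range(n_obs))

def prob_p_below(p0, lam, n_max=200):
    # Pr(p <= p0 | H0): sum the pmf over outcomes whose p-value is <= p0
    return sum(pmf(n, lam) for n in range(n_max) if p_value(n, lam) <= p0)
```

For arbitrary thresholds p0 the probability stays below p0; exactly at an achievable p-value the two sides agree.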


Misc

“Expected” vs. “Observed” Result

Consider two counting experiments A, B searching for the same signal, both expecting background b = 100, and signals sA = 20 and sB = 15, clearly indicating a better performance for A.

By chance, nA = 110 and nB = 113, giving ZA ≈ 1 and ZB ≈ 1.3, so using the “observed” significance, experiment B is “better”, which is of course nonsense.

A related issue is the statement: “Experiment A sees a 3σ effect, experiment B sees a 3.5σ effect, so in summary we have a 3.5σ effect” (or a similar statement for limits). However, this is wrong from the statistical point of view, as the minimum of two p-values is not a proper p-value (refer to the look-elsewhere effect). ⇒ solution: always use the expected significance and decide which analysis/experiment to use without using the (random) data result.



Frequentist Interpretation; Bootstrapping

Model Reminder

The observed data can be summarized as the number of observed events n. The probability to observe n events is given by a Poisson probability:

p(n|θ) = Poisson(n|λ(θ)).

The Poisson mean λ(θ) is given by the sum of (scaled) signal and background yields,

λ(θ) = µ s + b(θn),

where the model parameters θ comprise the signal strength parameter µ and the nuisance parameters θn: θ = (µ, θn).

External knowledge about the nuisance parameters is encoded in the prior π(θn).


Frequentist Interpretation; Bootstrapping

Frequentist Interpretation

The Bayesian posterior f is given by the likelihood times the prior (assumed to be normal here):

f(θ) = L(θ|d) × N(θn),

(apart from an unimportant normalization). f(θ) can be used in place of the plain L in many places (e.g. parameter estimation, definition of t).

Frequentist re-interpretation:

f(θ) ∝ L(θ|d) × ∏u exp(−(θu − µu)² / (2δu²))

where µu = 0 and δu = 1, and u runs over all nuisance parameters. This can be interpreted as the likelihood function of a slightly different model by swapping the roles of θu and µu: the µu now are random variables, part of the data. The data comprise the number of observed events and the values for µu (with µu = 0 for the observed data).


Frequentist Interpretation; Bootstrapping

Frequentist Interpretation: Comments

No longer need the (Bayesian) concept of a prior for the model parameters θu; instead, have extended the data by µu.

Allows using purely frequentist concepts for defining ensembles of toy data; but: requires choosing parameter values.

Choose parameter values by fitting to data: “(parametric)bootstrapping”.

If we want to keep the structure of f, we have to use a conjugate distribution for µu in the frequentist model. The normal distribution is self-conjugate ⇒ use a normal model for the distribution of µu.


Frequentist Interpretation; Bootstrapping

Updated Monte-Carlo Method for p-value

For the p-value calculation with Monte-Carlo, the steps are modified:

1 Make a maximum likelihood fit to data (with null hypothesis H0) to get estimates θ̂u for the nuisance parameters θu.

2 Generate toy data by sampling from the model at the fitted values for θu; in particular, draw a Gaussian for µu around θ̂u with width 1.

For each toy dataset, calculate the test statistic value, e.g. using the t′ or t definitions. The fraction of toys for which t ≥ tobs is the p-value.
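A compact sketch of the two steps for a counting model with one log-normal nuisance parameter (the grid-scan fit and the constants B0, DELTA are illustrative assumptions, not the actual tool's code):

```python
import math
import random

B0, DELTA = 5.2, 0.1  # assumed nominal background and relative uncertainty

def lam(theta):
    # background expectation as a function of the nuisance parameter
    return B0 * math.exp(theta * math.log(1.0 + DELTA))

def nll(theta, n, mu):
    # Poisson term plus the Gaussian constraint from the global observable mu
    l = lam(theta)
    return l - n * math.log(l) + 0.5 * (theta - mu) ** 2

def fit_theta(n, mu):
    # step 1: 1-D maximum likelihood fit by a simple grid scan
    grid = [-5.0 + i * 0.001 for i in range(10001)]
    return min(grid, key=lambda t: nll(t, n, mu))

def bootstrap_toys(n_obs, n_toys=1000, seed=2):
    # step 2: generate toys at the fitted value theta_hat, drawing
    # both the Poisson count and the global observable mu
    rng = random.Random(seed)
    theta_hat = fit_theta(n_obs, 0.0)
    l = lam(theta_hat)
    toys = []
    for _ in range(n_toys):
        L, k, p = math.exp(-l), 0, 1.0
        while True:  # Knuth Poisson draw
            p *= rng.random()
            if p <= L:
                break
            k += 1
        toys.append((k, rng.gauss(theta_hat, 1.0)))
    return theta_hat, toys
```

For an upward fluctuation (n_obs = 8 above b = 5.2), the fit pulls the background up, and the toys are generated around that fitted point.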


Frequentist Interpretation; Bootstrapping

Summary; Comments

The expression for the posterior can be interpreted in a purely frequentist way as the likelihood of a slightly different statistical model with an extended dataset. For that model, one can apply parametric bootstrapping and proceed within a purely frequentist framework.

Notes:

The frequentist approach allows the application of asymptotic formulas

This is the method used in the LHC Higgs combination.


Rate Uncertainty Implementation

Rate Uncertainties: Implementation

In the statistical model, we had this expression for the Poisson mean:

λi(θ) = µ si + ∑p cp(θn) bpi(θn).

Rate uncertainties can be included in the coefficient cp, e.g. by using

cp(θu) = θu

where θu has a log-normal prior around 1. Equivalently, we can also use:

cp(θu) = e^(θu ∆b)

where the nuisance parameter θu has a normal prior around 0 with standard deviation 1, which corresponds to a scale factor with a log-normal prior.
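The equivalence of the two parametrizations can be checked by sampling (my own illustration; both draws produce the same log-normal distribution for the scale factor):

```python
import math
import random

def coeff_direct_lognormal(rng, width):
    # parametrization 1: c_p = theta_u, with theta_u log-normally
    # distributed around 1 (log-width = width)
    return math.exp(rng.gauss(0.0, width))

def coeff_exponential_normal(rng, width):
    # parametrization 2: c_p = exp(theta_u * width), theta_u ~ N(0, 1)
    return math.exp(rng.gauss(0.0, 1.0) * width)
```

Sampled quantiles of the two coefficients agree, and both are strictly positive by construction.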


Rate Uncertainty Implementation

Equivalent Parametrizations

We just saw two different, but equivalent, methods to introduce a log-normal scale factor.

This is an example of a more general principle: the statistical model can be re-parametrized, which changes both the “model response” to the nuisance parameter and the parameter prior.

Using this freedom, one can use independent standard normal priors for the nuisance parameters, which will be assumed from now on.


Rate Uncertainty Implementation

Rate Uncertainties: Normal vs. log-normal I/II

The uncertainty on b was implemented by using the stat. model

p(n|s, b) = Poisson(n | λ = s + b)

with a normal prior for b around known b0 with known width ∆b.

But: λ can become negative with non-zero probability, for which a Poisson is not defined.

Instead, use a log-normal prior for a scale factor for b0:

λ(s, f ) = s + f · b0

where f has a log-normal prior, i.e., log f has a normal distribution. An equivalent formulation is

λ(s, θ) = s + e^(θ log(1+∆b)) · b0

where the nuisance parameter θ has a normal prior with mean 0 and standard deviation 1.


Rate Uncertainty Implementation

Rate Uncertainties: Normal vs. log-normal II/II

Comparing the prior for the scale factor f between normal and log-normal, for ∆b = 0.1, 0.3 and 1.0:

[Figure: prior density p(f) for the scale factor f under a normal and a log-normal prior.]

The log-normal avoids an unphysical jump / truncation at f = 0, while the normal prior requires that for large ∆b.



Shape Uncertainties

Shape Uncertainties: Introduction

Can have uncertainties also affecting the shape in a general way (e.g. via the energy calibration, . . . ).

Typically, in an analysis, one would

Use a MC sample (or sideband) to get a shape for a process ⇒ “nominal template”

Modify the MC (or sideband) to get the “±1σ effects” of some uncertainty, e.g. by re-weighting events, modifying the energy calibration, using a different sideband, . . . ⇒ “plus”/“minus” templates


Shape Uncertainties

Shape Uncertainties: Statistical Model

Follow the general recipe: introduce a nuisance parameter θu with a standard normal prior, and write the model prediction for the Poisson mean λi as a function of the new parameter.

It should interpolate smoothly between the “minus” template at θu = −1, the “nominal” template at θu = 0 and the “plus” template at θu = +1.

There are many possibilities to achieve this; here: Use cubic interpolationfor |θu| < 1 and linear extrapolation for |θu| > 1.

Page 98

Shape Uncertainties: Single Bin Behavior

Example interpolation for the bin around Mrec = 835:

[Figure: interpolated bin content b_i^p(θu) as a function of θu from −2 to +2; bin content between 18 and 24 events.]

b_i^{p,u±} are the bin contents for the "plus" and "minus" templates (b_i^{p,u+} = 21.9, b_i^{p,u−} = 19.3); b_i^{p,0} = 20.2 for the "nominal".
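Such a per-bin interpolation can be sketched in code. The previous slide notes there are many possibilities; this cubic Hermite variant (with the mean template difference as central slope) is an illustrative assumption, not necessarily theta's exact formula:

```python
def morph_bin(nominal, plus, minus, theta):
    """Interpolate one bin content between templates: cubic for |theta| < 1,
    linear extrapolation beyond (one of many possible schemes)."""
    d_plus = plus - nominal    # shift of the "plus" template
    d_minus = minus - nominal  # shift of the "minus" template
    if theta >= 1.0:
        # straight line through the nominal and "plus" values
        return nominal + theta * d_plus
    if theta <= -1.0:
        # straight line through the nominal and "minus" values
        return nominal - theta * d_minus
    m0 = 0.5 * (d_plus - d_minus)  # central slope at theta = 0
    if theta >= 0.0:
        t = theta
        # Hermite cubic with f(0)=0, f'(0)=m0, f(1)=d_plus, f'(1)=d_plus,
        # so the join to the linear extrapolation is smooth at theta = 1
        return nominal + m0 * (t**3 - 2*t**2 + t) + d_plus * (-t**3 + 2*t**2)
    # mirrored Hermite cubic on the minus side
    s = -theta
    return nominal - m0 * (s**3 - 2*s**2 + s) + d_minus * (-s**3 + 2*s**2)
```

With the single-bin example above (nominal 20.2, plus 21.9, minus 19.3), this reproduces the template values at θu = 0 and ±1 and extrapolates linearly beyond.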

Page 99

Shape Uncertainties: Shape Behavior

Applying template morphing for certain values of θu, the background template looks like this:

[Figure: events per bin versus Mrec (0 to 1000) for the nominal (θu = 0.0), minus (θu = −1.0) and plus (θu = 1.0) templates.]

The interpolation agrees with the intuitive expectation.

Page 100

Shape Uncertainties: Shape Behavior

Applying template morphing for certain values of θu, the background template looks like this:

[Figure: events per bin versus Mrec (0 to 1000) for the nominal template, morphed templates at θu = −0.5 and θu = 2.0, and the minus and plus templates.]

The interpolation agrees with the intuitive expectation.

Page 101

BB light method


Page 102

Introduction

If searching for a small signal using Monte-Carlo (or the data in a sideband) to define the statistical model, the limited size of the Monte-Carlo sample might introduce an additional uncertainty on the predicted mean λc,i.

Example: a counting experiment in which only NMC = 1 background event survives, weighted to b = 0.2, and n = 3 events are observed. If b = 0.2 were known exactly, this would correspond to a 3σ effect, but it is possible that the true mean corresponds to NMC = 2 (b = 0.4), which implies a much smaller significance . . .
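This example can be checked with a few lines using only the Python standard library:

```python
import math
from statistics import NormalDist

def poisson_pvalue(n_obs, b):
    """One-sided tail probability P(N >= n_obs) for N ~ Poisson(b)."""
    cdf = sum(math.exp(-b) * b**k / math.factorial(k) for k in range(n_obs))
    return 1.0 - cdf

def significance(p):
    """Convert a one-sided p-value into a Gaussian z-value."""
    return NormalDist().inv_cdf(1.0 - p)

z_known = significance(poisson_pvalue(3, 0.2))    # b = 0.2 known exactly: about 3 sigma
z_doubled = significance(poisson_pvalue(3, 0.4))  # true mean twice as large: notably less
```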

Page 103

Barlow-Beeston Method

Idea of Barlow and Beeston (Comp. Phys. Comm. 77 (1993)):

Introduce the true mean in bin i as a nuisance parameter εi in the statistical model

Extend the statistical model to model the joint probability to observe ni data events and NMC,i Monte-Carlo events, given εi (both are basically Poisson)

For maximum likelihood methods, note that maximizing w.r.t. εi leads to decoupled equations that can be solved numerically

This is implemented in ROOT’s TFractionFitter.

Disadvantage: need to numerically solve these equations at each step of the minimization.

Page 104

Modification: Barlow-Beeston light

Same idea: add one nuisance parameter δc,i per bin to model the MC stat. uncertainty:

µc,i(θ, δ) = µc,i(θ) + δc,i

where δc,i here has a normal prior (with mean 0).

This can be implemented using template morphing: add Nbins nuisance parameters δc,i, each shifting exactly one bin w.r.t. the nominal template. But this can lead to hundreds of nuisance parameters → numerical minimization becomes slow and unstable.

Page 105

Analytical Solution

To maximize the Poisson likelihood function L(θ, δ|n), set all derivatives w.r.t. δc,i to zero and solve → analytical solution δm(θ)!

Then, the actual maximization algorithm is run on the likelihood function Lm in which the δ parameters have been "maximized away":

Lm(θ|n) := L(θ, δm(θ)|n)

This allows handling MC stat. uncertainties without introducing additional "visible" nuisance parameters.
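The per-bin profiling can be sketched as follows, assuming a single bin with data count n, prediction µ and a normal prior of width σ on δ (a sketch of the idea; theta's internal implementation may differ in details):

```python
import math

def profiled_delta(n, mu, sigma):
    """Analytic per-bin profile for BB light: maximizes
        n*log(mu + delta) - (mu + delta) - delta**2 / (2*sigma**2)
    over delta. Setting the derivative to zero gives a quadratic,
        delta**2 + (mu + sigma**2)*delta - sigma**2*(n - mu) = 0,
    whose root keeping mu + delta > 0 is returned."""
    b = mu + sigma**2
    disc = b * b + 4.0 * sigma**2 * (n - mu)
    return 0.5 * (-b + math.sqrt(disc))
```

In the limit σ → 0 the profiled δ goes to 0 (no MC uncertainty); for a very wide prior it approaches n − µ, pulling the prediction to the data.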

Page 106

Comments

BB light uses the normal approximation → need enough MC events in each bin.

Possible work-arounds in case of low MC statistics include:

re-bin to reach a minimum number of MC events

use a non-parametric smoothing procedure (neighbor bins)

fit a function (“parametric smoothing”)

Often, the first option is the easiest one, but one could lose sensitivity compared to the others.
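The first work-around can be sketched as a greedy merge of neighboring bins; `rebin_min_events` is a hypothetical helper for illustration, not part of theta:

```python
def rebin_min_events(counts, min_events):
    """Greedily merge adjacent bins (left to right) until every merged bin
    holds at least min_events entries; a trailing remainder is merged into
    the last output bin so the total is preserved."""
    out, acc = [], 0
    for c in counts:
        acc += c
        if acc >= min_events:
            out.append(acc)
            acc = 0
    if acc > 0:
        if out:
            out[-1] += acc
        else:
            out.append(acc)
    return out
```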
