TRANSCRIPT (2006-08-21)
Detection Limits, Upper Limits,
and Confidence Intervals
in High-Energy Astrophysics
David van Dyk
Department of Statistics, University of California, Irvine
Joint work with Vinay Kashyap, The CHAS Collaboration, and the SAMSI SaFeDe Working Group
Outline of Presentation
This talk has two parts.
1. Soap Box
• What do short Confidence Intervals Mean?
• Is there an all-purpose statistical solution?
• Why compute Confidence Intervals at all?
2. Illustrations from High-Energy Astro-statistics
• Detection and Detection Limits
• Upper Limits for undetected sources
• Quantifying uncertainty for detected sources
• Nuisance parameters
What is a Frequency Confidence Interval?
Background contaminated Poisson
N ∼ Poisson(µ + b),
with b = 2.88 and µ = 1.25.
• Confidence Interval:
{µ : N ∈ I(µ)},
where Pr(N ∈ I(µ)|µ) ≥ 95%.
• Values of µ with a propensity to generate the observed N .
Sampling Distribution of 95% CI
[Figure: cumulative probability versus the confidence interval for µ (0 to 15), with one step curve for each of N = 0, 1, …, 8 and N > 8.]
The CI gives values of µ that are plausible given the observed data.
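The inversion in the definition above can be sketched numerically: scan a grid of µ and keep those whose acceptance interval contains the observed N. The choice of central acceptance intervals is an assumption (the slide leaves I(µ) unspecified), and the grid search is purely illustrative.

```python
# Sketch: invert a family of acceptance intervals to get a confidence set
# for mu when N ~ Poisson(mu + b).  Central acceptance intervals are an
# assumption; the slide leaves the choice of I(mu) open.
import numpy as np
from scipy.stats import poisson

b = 2.88          # known background rate (from the slide)

def in_acceptance(n, mu, level=0.95):
    """Is the count n inside the central `level` acceptance interval of Poisson(mu + b)?"""
    alpha = 1.0 - level
    lo = poisson.ppf(alpha / 2, mu + b)
    hi = poisson.ppf(1 - alpha / 2, mu + b)
    return lo <= n <= hi

def confidence_set(n, mu_grid):
    """All mu on the grid whose acceptance interval contains the observed n."""
    return np.array([mu for mu in mu_grid if in_acceptance(n, mu)])

mu_grid = np.linspace(0.0, 20.0, 2001)
cs = confidence_set(0, mu_grid)      # observed N = 0
print(cs.min(), cs.max())
```

With N = 0 the resulting set is quite short, which illustrates the slide's point: a short interval reflects few plausible µ, not small experimental uncertainty.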
Short or Empty Frequency Confidence Intervals
What do they mean?
There are few values of µ that are plausible given the observed data.
What do they not mean?
The experimental uncertainty is small. (The standard error or risk of µ̂?)
What if CI are repeatedly short or empty?
Short intervals should be an uncommon occurrence. If they are not, the statistical model is likely wrong, regardless of the strength of the subjective prior belief in the model.
Pre-Data and Post-Data Probability
• A Frequency Confidence Interval has (at least) a 95% chance of covering the true value of µ.
• What is the chance that ∅ contains µ?
There is a 95% chance that Confidence Intervals
computed in this way contain the true value of µ.
• Frequency-based probability says nothing about a particular interval.
• Bayesian methods are better suited to quantify this type of probability.
• Precise probability statements may not be the most relevant probability statements.
Our intuition leads us to condition on the observed data.
Leaning on the Science Rather than on the Statistics
Desirable Properties of Confidence Intervals Include (Mandelkern, 2002):
1. Based on well-defined principles that are neither arbitrary nor subjective
2. Do not depend on prior or outside knowledge about the parameter
3. Equivariant
4. Convey experimental uncertainty
5. Correspond to precise probability statements
But....
1. Mightn’t there be physical reasons to prefer one parameterization over another?
2. Why so much concern over using outside information regarding the values of the parameter, but none over the form of the statistical model?
Is it easier to make a final decision on the choice of CI than on the
choice of parameterization?
Say What You Mean, Mean What You Say
• Confidence Intervals are not designed to represent experimental uncertainty.
• Rather than trying to use Confidence Intervals for this purpose, an estimator that is designed to represent experimental uncertainty should be used.
• For example, we might estimate the risk of an estimator.
Risk(µ, µ̂) = E_µ[ Loss(µ, µ̂) ] = Σ_{N=0}^{∞} Loss(µ, µ̂(N)) f(N | µ)
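As a sketch of this idea, the risk can be computed by direct summation over the Poisson pmf. The estimator µ̂(N) = max(N − b, 0) and the squared-error loss below are illustrative assumptions, not choices made in the talk.

```python
# Sketch: the risk of an estimator as an expected loss, summing the
# Poisson pmf over N.  The estimator mu_hat(N) = max(N - b, 0) and
# squared-error loss are illustrative assumptions.
from scipy.stats import poisson

b = 2.88   # known background rate

def mu_hat(n):
    """Illustrative estimator: background-subtracted count, truncated at zero."""
    return max(n - b, 0.0)

def risk(mu, n_max=200):
    """Risk(mu, mu_hat) = sum_N Loss(mu, mu_hat(N)) f(N | mu), truncated at n_max."""
    return sum((mu_hat(n) - mu) ** 2 * poisson.pmf(n, mu + b)
               for n in range(n_max + 1))

print(risk(1.25))   # risk at the slide's mu = 1.25
```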
Why Use Confidence Intervals?
• The Gaussian distribution can be fully represented by two parameters, the mean and the standard deviation.
• No information is lost if we report a simple central 95% probability interval.
• The CLT ensures that for large samples many likelihoods and posterior distributions are approximately Gaussian.
[Figure: a Gaussian density annotated with its mean and SD.]
• The bell shape of the Gaussian distribution gives a simple interpretation to such intervals.
• Analogous reasoning yields a large-sample interpretation for Frequency Confidence Intervals.
The 95% Confidence Interval:
A story of Gaussian sampling and posterior distributions.
But what if....
• How do we summarize this distribution?
• Why do we want to? What is the benefit?
• With more complex distributions, simple summaries are not available.
Solution: Do Not Summarize!!
Multiple Modes
[Figure: six panels of posterior probability density functions versus Energy (keV), for obs_id 47, 62, 69, 70, 71, and 1269; each posterior has multiple modes.]
An Interval is not a helpful summary!
A False Sense of Confidence
[Figure: a multimodal posterior density] versus the interval [2.03, 7.87]
• Confidence Intervals appear to give a definite range of plausible values.
• Plots seem wishy-washy.
• But the likelihood (or posterior distribution) contains full information.
Confidence Intervals Give a False Sense of Confidence.
Outline of Presentation
This talk has two parts.
1. Soap Box
• What do short Confidence Intervals Mean?
• Is there an all-purpose statistical solution?
• Why compute Confidence Intervals at all?
2. Illustrations from High-Energy Astro-statistics
• Detection and Detection Limits
• Upper Limits for undetected sources
• Quantifying uncertainty for detected sources
• Nuisance Parameters
A Survey
• We observe a field of optical sources with the Chandra X-ray Observatory.
• There are a total of n = 1377 potential X-ray sources.
• Some correspond to multiple optical sources.
• All observations are background contaminated.
• An independent background-only observation is available.
Model:
Y_i^S ~ Poisson( κ_i^S (λ_i^S + λ_{j(i)}^B) ),
Y_j^B ~ Poisson( κ_j^B λ_j^B ),
λ_i^S ~ Gamma(α, β).
We specify non-informative prior distributions for (λ_1^B, …, λ_J^B, α, β).
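A forward simulation from this hierarchical model can be sketched as follows. The hyperparameter values, exposures, and common background intensity are illustrative assumptions; in the talk these quantities are inferred from the data.

```python
# Sketch: forward simulation from the slide's hierarchical model.
# The values of alpha, beta, kappa, and lam_B are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n = 1377                      # number of potential X-ray sources (from the slide)
alpha, beta = 2.0, 1.0        # assumed Gamma hyperparameters (shape, rate)
kappa_S = kappa_B = 1.0       # assumed exposures
lam_B = 2.0                   # assumed common background intensity

lam_S = rng.gamma(alpha, 1.0 / beta, size=n)     # lam_S_i ~ Gamma(alpha, beta)
y_S = rng.poisson(kappa_S * (lam_S + lam_B))     # background-contaminated source counts
y_B = rng.poisson(kappa_B * lam_B, size=n)       # independent background counts

print(y_S.mean(), y_B.mean())
```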
Detection Limits and P-values
How large must y_i^S be in order to conclude there is an X-ray source?
Procedure:
1. Suppose λ_i^S = 0.
2. Use Y_j^B to compute the prior (gamma) distribution of λ_{j(i)}^B.
3. Compute the 95th percentile of the marginal (negative-binomial) distribution of Y_i^S.
4. Set the detection limit, y_i^*, to this percentile.
5. If y_i^S > y_i^*, conclude there is an X-ray source.
A similar strategy can be used to compute a p-value.
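The key computation in step 3 uses the fact that a Gamma(a, r) distribution for λ^B (shape a, rate r), mixed with a Poisson likelihood, gives a negative-binomial marginal. A minimal sketch, with assumed values for a and r:

```python
# Sketch of the detection-limit recipe: with lam_S = 0 and
# lam_B ~ Gamma(shape=a, rate=r), the marginal of Y_S is negative
# binomial with size a and success probability p = r / (r + kappa).
# The values of a and r below are illustrative assumptions.
from scipy.stats import nbinom

def detection_limit(a, r, kappa=1.0, level=0.95):
    """95th percentile of the marginal (negative-binomial) dist'n of Y_S."""
    p = r / (r + kappa)              # NB success probability
    return int(nbinom.ppf(level, a, p))

# e.g. a background distribution Gamma(shape=6, rate=2), so E[lam_B] = 3
y_star = detection_limit(a=6.0, r=2.0)
print(y_star)        # conclude there is a source only if y_S exceeds this count
```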
Illustrating Detection Limits
[Figure: the detection limit y^* plotted as an increasing function of λ^B (0 to 10).]
Here κ^S = κ^B = 1, and λ^B is treated as known.
Upper Limits
If there is no detection, how large might λ_i^S be?
Procedure:
1. Suppose Y_i^S ~ Poisson( κ_i^S (λ_i^S + λ_i^B) ).
2. We detect a source if y_i^S > y_i^*.
3. Let the upper limit, U_i^S, be the smallest λ_i^S such that
Pr(Y_i^S > y_i^* | λ_i^S) > 95%.
4. Note: Y_i^S | λ_i^S is the convolution of a Poisson and a negative binomial.
Illustrating Upper Limits
[Figure: power, i.e. the probability of detection, as a function of λ^S; the 95% upper limit is the λ^S at which the power curve crosses 0.95.]
Here we assume λ^B = 2.
The 95% Upper Limit is computed with a 95% Detection Limit.
The Effect of Choice of Detection Limit on Upper Limits
[Figure, left panel: probability of detection versus λ^S for Detection Limits 3, 4, 5, 6, and 7; the probabilities of a Type I error, printed on the curves, are 0.143, 0.053, 0.017, 0.005, and 0.001.]
[Figure, right panel: 50% Upper Limits computed with these Detection Limits are 1.65, 2.65, 3.65, 4.65, and 5.65.]
The Effect of Upper Limit Probability on Upper Limits
[Figure, left panel: 50% Upper Limits computed with various Detection Limits: 1.65, 2.65, 3.65, 4.65, and 5.65.]
[Figure, right panel: 95% Upper Limits computed with the same Detection Limits: 5.75, 7.15, 8.53, 9.95, and 11.15.]
Interval Estimates
For detected or non-detected sources, we may wish to quantify uncertainty for λ_i^S.
Three typical posterior distributions, computed using an MCMC sampler:
[Figure: three posterior densities for λ^S — left: Detection = FALSE, Upper Limit = 1.03; middle and right: Detection = TRUE, Upper Limit = NA.]
Confidence Intervals Do Not Adequately Represent All Distributions.
Nuisance Parameters
Typical Strategy:
• Average (or perhaps optimize) over nuisance parameters.
• Check frequency properties via simulation studies.
• If necessary, adjust procedures and priors to improve frequency properties.
Sometimes (partially) ancillary statistics are available:
T = Y^S + Y^B ~ Poisson(λ^S + 2λ^B) = Poisson( (ρ + 1) λ^B )

Y^S | T ~ Binomial( T, (λ^S + λ^B) / (λ^S + 2λ^B) ) = Binomial( T, ρ / (1 + ρ) )

with ρ = (λ^S + λ^B)/λ^B and κ^S = κ^B = 1.
We use the Binomial Dist’n to compute Detection Limits and Upper Limits.
1. Both procedures appear to detect exactly the same sources.
2. The Binomial Procedure produces more conservative Upper Limits (for ρ).
• If the Detection Limit is T , the Upper Limit for ρ is +∞.
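The conditional test behind the Binomial Procedure can be sketched as follows: under the null λ^S = 0 we have ρ = 1, so Y^S | T is Binomial(T, 1/2), and a p-value is just a binomial tail probability. The example counts are illustrative assumptions.

```python
# Sketch: a conditional (binomial) detection test.  Given T = y_S + y_B,
# Y_S | T ~ Binomial(T, rho/(1+rho)); under the null lam_S = 0, rho = 1
# and the success probability is 1/2.  The counts are illustrative.
from scipy.stats import binom

def binomial_p_value(y_S, y_B):
    """Pr(Y_S >= observed y_S | T) under the null lam_S = 0 (p = 1/2)."""
    T = y_S + y_B
    return binom.sf(y_S - 1, T, 0.5)    # sf(k) = P(X > k), so this is P(X >= y_S)

print(binomial_p_value(9, 2))   # small p-value: evidence for a source
print(binomial_p_value(3, 2))   # large p-value: consistent with background only
```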
Concluding Recommendations
1. Plot the (profile) likelihood or (marginal) posterior distribution!
2. Don’t expect too much of confidence intervals.
3. Remember frequency confidence intervals & p-values use pre-data probability.
• Rejecting may not mean the alternative is more likely than the null.
4. Remember that subjective model choices can be just as influential as a subjective prior distribution.
5. Be willing to rethink subjective model choices based on prior information.
6. Be willing to use your data to assess the likely values of nuisance parameters. Use methods that take advantage of this information.