TRANSCRIPT (2006-08-21)
Detection Limits, Upper Limits,
and Confidence Intervals
in High-Energy Astrophysics
David van Dyk
Department of Statistics, University of California, Irvine
Joint work with Vinay Kashyap, The CHAS Collaboration, and the SAMSI SaFeDe Working Group
Outline of Presentation
This talk has two parts.
1. Soap Box
• What do short Confidence Intervals Mean?
• Is there an all-purpose statistical solution?
• Why compute Confidence Intervals at all?
2. Illustrations from High-Energy Astro-statistics
• Detection and Detection Limits
• Upper Limits for undetected sources
• Quantifying uncertainty for detected sources
• Nuisance parameters
What is a Frequency Confidence Interval?
Background contaminated Poisson
N ∼ Poisson(µ + b),
with b = 2.88 and µ = 1.25.
• Confidence Interval:
{µ : N ∈ I(µ)},
where Pr(N ∈ I(µ)|µ) ≥ 95%.
• Values of µ with a propensity to generate the observed N .
Sampling Distribution of 95% CI
[Figure: cumulative probability versus the confidence interval for µ (0 to 15), with one step curve for each of N = 0, 1, …, 8 and N > 8.]
The CI gives values of µ that are plausible given the observed data.
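The inversion in the definition above can be sketched numerically: scan a grid of µ and keep those whose acceptance interval contains the observed N. The choice of central acceptance intervals is an assumption (the slide leaves I(µ) unspecified), and the grid search is purely illustrative.

```python
# Sketch: invert a family of acceptance intervals to get a confidence set
# for mu when N ~ Poisson(mu + b).  Central acceptance intervals are an
# assumption; the slide leaves the choice of I(mu) open.
import numpy as np
from scipy.stats import poisson

b = 2.88          # known background rate (from the slide)

def in_acceptance(n, mu, level=0.95):
    """Is the count n inside the central `level` acceptance interval of Poisson(mu + b)?"""
    alpha = 1.0 - level
    lo = poisson.ppf(alpha / 2, mu + b)
    hi = poisson.ppf(1 - alpha / 2, mu + b)
    return lo <= n <= hi

def confidence_set(n, mu_grid):
    """All mu on the grid whose acceptance interval contains the observed n."""
    return np.array([mu for mu in mu_grid if in_acceptance(n, mu)])

mu_grid = np.linspace(0.0, 20.0, 2001)
cs = confidence_set(0, mu_grid)      # observed N = 0
print(cs.min(), cs.max())
```

With N = 0 the resulting set is quite short, which illustrates the slide's point: a short interval reflects few plausible µ, not small experimental uncertainty.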
Short or Empty Frequency Confidence Intervals
What do they mean?
There are few values of µ that are plausible given the observed data.
What do they not mean?
The experimental uncertainty is small. (The standard error or risk of µ̂?)
What if CI are repeatedly short or empty?
Short intervals should be an uncommon occurrence. If they are not, the statistical model is likely wrong, regardless of the strength of the subjective prior belief in the model.
Pre-Data and Post-Data Probability
• A Frequency Confidence Interval has (at least) a 95% chance of covering the true value of µ.
• What is the chance that ∅ contains µ?
There is a 95% chance that Confidence Intervals
computed in this way contain the true value of µ.
• Frequency-based probability says nothing about a particular interval.
• Bayesian methods are better suited to quantify this type of probability.
• Precise probability statements may not be the most relevant probability statements.
Our intuition leads us to condition on the observed data.
Leaning on the Science Rather than on the Statistics
Desirable Properties of Confidence Intervals Include (Mandelkern, 2002):
1. Based on well-defined principles that are neither arbitrary nor subjective
2. Do not depend on prior or outside knowledge about the parameter
3. Equivariant
4. Convey experimental uncertainty
5. Correspond to precise probability statements
But....
1. Mightn’t there be physical reasons to prefer one parameterization over another?
2. Why so much concern over using outside information regarding the values of the parameter, but none over the form of the statistical model?
Is it easier to make a final decision on the choice of CI than on the
choice of parameterization?
Say What You Mean, Mean What You Say
• Confidence Intervals are not designed to represent experimental uncertainty.
• Rather than trying to use Confidence Intervals for this purpose, an estimator that is designed to represent experimental uncertainty should be used.
• For example, we might estimate the risk of an estimator.
Risk(µ, µ̂) = E_µ[ Loss(µ, µ̂) ] = Σ_{N=0}^{∞} Loss(µ, µ̂(N)) f(N | µ)
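As a sketch of this idea, the risk can be computed by direct summation over the Poisson pmf. The estimator µ̂(N) = max(N − b, 0) and the squared-error loss below are illustrative assumptions, not choices made in the talk.

```python
# Sketch: the risk of an estimator as an expected loss, summing the
# Poisson pmf over N.  The estimator mu_hat(N) = max(N - b, 0) and
# squared-error loss are illustrative assumptions.
from scipy.stats import poisson

b = 2.88   # known background rate

def mu_hat(n):
    """Illustrative estimator: background-subtracted count, truncated at zero."""
    return max(n - b, 0.0)

def risk(mu, n_max=200):
    """Risk(mu, mu_hat) = sum_N Loss(mu, mu_hat(N)) f(N | mu), truncated at n_max."""
    return sum((mu_hat(n) - mu) ** 2 * poisson.pmf(n, mu + b)
               for n in range(n_max + 1))

print(risk(1.25))   # risk at the slide's mu = 1.25
```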
Why Use Confidence Intervals?
• The Gaussian distribution can be fully represented by two parameters, the mean and the standard deviation.
• No information is lost if we report a simple central 95% probability interval.
• The CLT ensures that for large samples many likelihoods and posterior distributions are approximately Gaussian.
[Figure: a Gaussian density annotated with its mean and SD.]
• The bell shape of the Gaussian distribution gives a simple interpretation to such intervals.
• Analogous reasoning yields a large-sample interpretation for Frequency Confidence Intervals.
The 95% Confidence Interval:
A story of Gaussian sampling and posterior distributions.
But what if....
• How do we summarize this distribution?
• Why do we want to? What is the benefit?
• With more complex distributions, simple summaries are not available.
Solution: Do Not Summarize!!
Multiple Modes
[Figure: six panels of posterior probability density functions versus Energy (keV), for obs_id 47, 62, 69, 70, 71, and 1269; each posterior has multiple modes.]
An Interval is not a helpful summary!
A False Sense of Confidence
[Figure: a multimodal posterior density] versus the interval [2.03, 7.87]
• Confidence Intervals appear to give a definite range of plausible values.
• Plots seem wishy-washy.
• But the likelihood (or posterior distribution) contains full information.
Confidence Intervals Give a False Sense of Confidence.
Outline of Presentation
This talk has two parts.
1. Soap Box
• What do short Confidence Intervals Mean?
• Is there an all-purpose statistical solution?
• Why compute Confidence Intervals at all?
2. Illustrations from High-Energy Astro-statistics
• Detection and Detection Limits
• Upper Limits for undetected sources
• Quantifying uncertainty for detected sources
• Nuisance Parameters
A Survey
• We observe a field of optical sources with the Chandra X-ray Observatory.
• There are a total of n = 1377 potential X-ray sources.
• Some correspond to multiple optical sources.
• All observations are background contaminated.
• An independent background-only observation is available.
Model:
Y_i^S ~ Poisson( κ_i^S (λ_i^S + λ_{j(i)}^B) ),
Y_j^B ~ Poisson( κ_j^B λ_j^B ),
λ_i^S ~ Gamma(α, β).
We specify non-informative prior distributions for (λ_1^B, …, λ_J^B, α, β).
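A forward simulation from this hierarchical model can be sketched as follows. The hyperparameter values, exposures, and common background intensity are illustrative assumptions; in the talk these quantities are inferred from the data.

```python
# Sketch: forward simulation from the slide's hierarchical model.
# The values of alpha, beta, kappa, and lam_B are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n = 1377                      # number of potential X-ray sources (from the slide)
alpha, beta = 2.0, 1.0        # assumed Gamma hyperparameters (shape, rate)
kappa_S = kappa_B = 1.0       # assumed exposures
lam_B = 2.0                   # assumed common background intensity

lam_S = rng.gamma(alpha, 1.0 / beta, size=n)     # lam_S_i ~ Gamma(alpha, beta)
y_S = rng.poisson(kappa_S * (lam_S + lam_B))     # background-contaminated source counts
y_B = rng.poisson(kappa_B * lam_B, size=n)       # independent background counts

print(y_S.mean(), y_B.mean())
```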
Detection Limits and P-values
How large must y_i^S be in order to conclude there is an X-ray source?
Procedure:
1. Suppose λ_i^S = 0.
2. Use Y_j^B to compute the prior (gamma) distribution of λ_{j(i)}^B.
3. Compute the 95th percentile of the marginal (negative-binomial) distribution of Y_i^S.
4. Set the detection limit, y_i^*, to this percentile.
5. If y_i^S > y_i^*, conclude there is an X-ray source.
A similar strategy can be used to compute a p-value.
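The key computation in step 3 uses the fact that a Gamma(a, r) distribution for λ^B (shape a, rate r), mixed with a Poisson likelihood, gives a negative-binomial marginal. A minimal sketch, with assumed values for a and r:

```python
# Sketch of the detection-limit recipe: with lam_S = 0 and
# lam_B ~ Gamma(shape=a, rate=r), the marginal of Y_S is negative
# binomial with size a and success probability p = r / (r + kappa).
# The values of a and r below are illustrative assumptions.
from scipy.stats import nbinom

def detection_limit(a, r, kappa=1.0, level=0.95):
    """95th percentile of the marginal (negative-binomial) dist'n of Y_S."""
    p = r / (r + kappa)              # NB success probability
    return int(nbinom.ppf(level, a, p))

# e.g. a background distribution Gamma(shape=6, rate=2), so E[lam_B] = 3
y_star = detection_limit(a=6.0, r=2.0)
print(y_star)        # conclude there is a source only if y_S exceeds this count
```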
Illustrating Detection Limits
[Figure: the detection limit y^* plotted as an increasing function of λ^B (0 to 10).]
Here κ^S = κ^B = 1, and λ^B is treated as known.
Upper Limits
If there is no detection, how large might λ_i^S be?
Procedure:
1. Suppose Y_i^S ~ Poisson( κ_i^S (λ_i^S + λ_i^B) ).
2. We detect a source if y_i^S > y_i^*.
3. Let the upper limit, U_i^S, be the smallest λ_i^S such that
Pr(Y_i^S > y_i^* | λ_i^S) > 95%.
4. Note: Y_i^S | λ_i^S is the convolution of a Poisson and a negative binomial.
Illustrating Upper Limits
[Figure: power, i.e. the probability of detection, as a function of λ^S; the 95% upper limit is the λ^S at which the power curve crosses 0.95.]
Here we assume λ^B = 2.
The 95% Upper Limit is computed with a 95% Detection Limit.
The Effect of Choice of Detection Limit on Upper Limits
[Figure, left panel: probability of detection versus λ^S for Detection Limits 3, 4, 5, 6, and 7; the probabilities of a Type I error, printed on the curves, are 0.143, 0.053, 0.017, 0.005, and 0.001.]
[Figure, right panel: 50% Upper Limits computed with these Detection Limits are 1.65, 2.65, 3.65, 4.65, and 5.65.]
The Effect of Upper Limit Probability on Upper Limits
[Figure, left panel: 50% Upper Limits computed with various Detection Limits: 1.65, 2.65, 3.65, 4.65, and 5.65.]
[Figure, right panel: 95% Upper Limits computed with the same Detection Limits: 5.75, 7.15, 8.53, 9.95, and 11.15.]
Interval Estimates
For detected or non-detected sources, we may wish to quantify uncertainty for λ_i^S.
Three typical posterior distributions, computed using an MCMC sampler:
[Figure: three posterior densities for λ^S — left: Detection = FALSE, Upper Limit = 1.03; middle and right: Detection = TRUE, Upper Limit = NA.]
Confidence Intervals Do Not Adequately Represent All Distributions.
Nuisance Parameters
Typical Strategy:
• Average (or perhaps optimize) over nuisance parameters.
• Check frequency properties via simulation studies.
• If necessary, adjust procedures and priors to improve frequency properties.
Sometimes (partially) ancillary statistics are available:
T = Y^S + Y^B ~ Poisson(λ^S + 2λ^B) = Poisson( (ρ + 1) λ^B )

Y^S | T ~ Binomial( T, (λ^S + λ^B) / (λ^S + 2λ^B) ) = Binomial( T, ρ / (1 + ρ) )

with ρ = (λ^S + λ^B)/λ^B and κ^S = κ^B = 1.
We use the Binomial Dist’n to compute Detection Limits and Upper Limits.
1. Both procedures appear to detect exactly the same sources.
2. The Binomial Procedure produces more conservative Upper Limits (for ρ).
• If the Detection Limit is T , the Upper Limit for ρ is +∞.
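The conditional test behind the Binomial Procedure can be sketched as follows: under the null λ^S = 0 we have ρ = 1, so Y^S | T is Binomial(T, 1/2), and a p-value is just a binomial tail probability. The example counts are illustrative assumptions.

```python
# Sketch: a conditional (binomial) detection test.  Given T = y_S + y_B,
# Y_S | T ~ Binomial(T, rho/(1+rho)); under the null lam_S = 0, rho = 1
# and the success probability is 1/2.  The counts are illustrative.
from scipy.stats import binom

def binomial_p_value(y_S, y_B):
    """Pr(Y_S >= observed y_S | T) under the null lam_S = 0 (p = 1/2)."""
    T = y_S + y_B
    return binom.sf(y_S - 1, T, 0.5)    # sf(k) = P(X > k), so this is P(X >= y_S)

print(binomial_p_value(9, 2))   # small p-value: evidence for a source
print(binomial_p_value(3, 2))   # large p-value: consistent with background only
```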
Concluding Recommendations
1. Plot the (profile) likelihood or (marginal) posterior distribution!
2. Don’t expect too much of confidence intervals.
3. Remember frequency confidence intervals & p-values use pre-data probability.
• Rejecting may not mean the alternative is more likely than the null.
4. Remember that subjective model choices can be just as influential as a subjective prior distribution.
5. Be willing to rethink subjective model choices based on prior information.
6. Be willing to use your data to assess the likely values of nuisance parameters. Use methods that take advantage of this information.