![Page 1: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/1.jpg)
Estimating the significance of a signal in a multi-dimensional search
Ofer Vitells, Eilam Gross
TAUP 2011, 5-9 September, Munich
O. Vitells & E. Gross, Astropart. Phys. (2011), doi:10.1016/j.astropartphys.2011.08.005
![Page 2: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/2.jpg)
Introduction
• Searching for a signal in some parameter space (mass, shape, location in the sky, etc.) involves a “look-elsewhere effect”: the significance calculation needs to account for the possibility that the signal appears anywhere within the range.
• Monte Carlo simulation is a straightforward way of estimating the p-value, but it can be computationally very expensive (it requires repeating the entire search procedure many times on background-only simulations).
[Figure: example spectrum, events / unit mass vs. mass]
• The mathematical theory of random fields provides useful analytic results.
![Page 3: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/3.jpg)
Random fields
• Usually one defines the test statistic
$$q_0(\theta) = -2\log\frac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat\mu,\theta)}, \qquad H_0: \mu = 0, \quad H_1: \mu > 0$$
where μ is the “signal strength” and θ is the parameterization of the search space.
• For any fixed θ, $q_0(\theta)$ follows (asymptotically) a χ² distribution with one degree of freedom, by Wilks’ theorem.
• $q_0(\theta)$ is a χ² random field over the space of θ (a random variable indexed by one or more continuous parameters). We are interested in
$$\hat q_0 \equiv q_0(\hat\theta) = -2\log\frac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat\mu,\hat\theta)} = \max_\theta[q_0(\theta)],$$
where $\hat\theta$ is the global maximum point, and we want to know its p-value:
$$\text{p-value} = P\big(\max_\theta[q_0(\theta)] \ge u\big)$$
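To make the definition concrete, here is a minimal sketch under a deliberately simple, hypothetical setup (not the talk's): a binned counting search where each bin index θ is a candidate signal location and n_θ ~ Poisson(μ s_θ + b_θ) with known background b_θ. For a single bin with μ unconstrained, the profile likelihood ratio has the closed form q_0 = 2[n ln(n/b) − (n − b)]; the global statistic is the maximum over θ, and `local_p_value` applies Wilks' χ²_1 tail, which on its own ignores the look-elsewhere effect. Function names and numbers are illustrative only.

```python
import math

def q0_bin(n, b):
    """q0 = -2 log[L(mu=0) / L(mu_hat)] for n ~ Poisson(mu*s + b) with b known.

    With mu unconstrained the fit reproduces n exactly (mu_hat*s + b = n),
    which gives the closed form 2 * (n*log(n/b) - (n - b)).
    """
    n_log = n * math.log(n / b) if n > 0 else 0.0
    return 2.0 * (n_log - (n - b))

def local_p_value(q0):
    """Wilks: for a fixed theta, P(chi2_1 > q0) = erfc(sqrt(q0 / 2))."""
    return math.erfc(math.sqrt(q0 / 2.0))

def scan(counts, backgrounds):
    """Evaluate q0(theta) in every bin; return (q0_hat, theta_hat, all q0 values)."""
    q = [q0_bin(n, b) for n, b in zip(counts, backgrounds)]
    theta_hat = max(range(len(q)), key=q.__getitem__)
    return q[theta_hat], theta_hat, q

# Hypothetical data: 10 expected background events in each of 8 bins
counts = [9, 12, 8, 11, 25, 10, 7, 13]
backgrounds = [10.0] * len(counts)
q0_hat, theta_hat, q_all = scan(counts, backgrounds)
print(theta_hat, round(q0_hat, 2), local_p_value(q0_hat))
```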
![Page 4: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/4.jpg)
Excursion sets & the Euler characteristic
• The set of points where the field takes values larger than some number u is called the excursion set $A_u$ above the level u:
$$A_u = \{\theta \in \mathcal{M} : q_0(\theta) > u\}$$
[Figure: example excursion sets with φ = 1, φ = 0 and φ = 2]
• The Euler characteristic φ is a topological property; in two dimensions it is the number of disconnected components minus the number of “holes”.
![Page 5: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/5.jpg)
Expectation of the Euler characteristic
• For random fields defined over any parameter space (Riemannian manifold) in D dimensions, the expected Euler characteristic of the excursion set, $\mathrm{E}[\varphi(A_u)]$, is given by:
$$\mathrm{E}[\varphi(A_u)] = \sum_{d=0}^{D} \mathcal{N}_d\, \rho_d(u)$$
[R.J. Adler and J.E. Taylor, Random Fields and Geometry (2007), Springer Monographs in Mathematics]
- $\rho_d$ are ‘universal’ functions (they depend only on the level u and the type of distribution).
- The geometrical shape of the space and the covariance structure of the field are completely encoded in the coefficients $\mathcal{N}_d$ (which do not depend on the level u).
![Page 6: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/6.jpg)
Expectation of the Euler characteristic
e.g. for a χ² field with s degrees of freedom:
$$\rho_0(u) = P(\chi^2_s > u)$$
$$\rho_1(u) = u^{(s-1)/2}\, e^{-u/2}$$
$$\rho_2(u) = u^{(s-2)/2}\, e^{-u/2}\, [u - (s-1)]$$
$$\ldots$$
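The χ² case can be written out directly. This is a sketch assuming an integer number of degrees of freedom s and, as in the formulas above, normalization constants absorbed into the coefficients N_d; `chi2_sf` evaluates ρ_0 through the standard closed-form series for P(χ²_s > u), so no special-function library is needed. Only d ≤ 2 is implemented, and all function names are hypothetical.

```python
import math

def chi2_sf(u, s):
    """rho_0(u) = P(chi2_s > u) for an integer number of degrees of freedom s >= 1."""
    if u <= 0:
        return 1.0
    x = u / 2.0
    if s % 2 == 0:
        # Even s: e^{-u/2} * sum_{k=0}^{s/2-1} (u/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, s // 2):
            term *= x / k
            total += term
        return math.exp(-x) * total
    # Odd s: erfc(sqrt(u/2)) + sqrt(2/pi) e^{-u/2} * sum_{k=1}^{(s-1)/2} u^{k-1/2} / (1*3*...*(2k-1))
    series, term = 0.0, math.sqrt(u)
    for k in range(1, (s - 1) // 2 + 1):
        series += term
        term *= u / (2 * k + 1)
    return math.erfc(math.sqrt(x)) + math.sqrt(2.0 / math.pi) * math.exp(-x) * series

def rho(d, u, s):
    """rho_d(u) for a chi2_s field, up to the constants absorbed into N_d."""
    if d == 0:
        return chi2_sf(u, s)
    if d == 1:
        return u ** ((s - 1) / 2) * math.exp(-u / 2)
    if d == 2:
        if s == 1:  # u^{-1/2} * (u - 0) simplifies to sqrt(u), finite at u = 0
            return math.sqrt(u) * math.exp(-u / 2)
        return u ** ((s - 2) / 2) * math.exp(-u / 2) * (u - (s - 1))
    raise NotImplementedError("this sketch only covers d <= 2")

def expected_ec(u, s, coeffs):
    """E[phi(A_u)] = sum_d N_d rho_d(u), with coeffs = [N_0, N_1, N_2, ...]."""
    return sum(n_d * rho(d, u, s) for d, n_d in enumerate(coeffs))
```

For example, `expected_ec(u, 1, [1.0, N1, N2])` gives the two-dimensional χ² formula with one degree of freedom used in the IceCube example.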
![Page 7: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/7.jpg)
Expectation of the Euler characteristic
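• Why is E[φ(A_u)] interesting?
Above high levels excursions are rare, so the excursion set is typically either empty or a single component without holes:
$$\varphi(A_u) \approx \begin{cases} 1 & \max_\theta[q_0(\theta)] > u \\ 0 & \text{otherwise} \end{cases}$$
and therefore
$$\mathrm{E}[\varphi(A_u)] \approx P\big(\max_\theta[q_0(\theta)] \ge u\big)$$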
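The approximation can be checked by brute force, which is essentially the Monte Carlo approach mentioned in the introduction. The sketch below assumes a hypothetical stationary 1-D Gaussian field z (white noise smoothed with a Gaussian kernel and scaled to unit variance), so that q = z² is a χ² field with one degree of freedom. On a 1-D grid the Euler characteristic of the excursion set is #points − #edges, i.e. the number of separate intervals, so every realization with max q > u contributes φ ≥ 1; the two estimates should approach each other as u grows. Kernel width, grid size and number of simulations are arbitrary choices.

```python
import numpy as np

def smooth_gaussian_field(rng, n_points, width):
    """Unit-variance stationary Gaussian field: white noise convolved with a Gaussian kernel."""
    x = np.arange(-4 * width, 4 * width + 1)
    kernel = np.exp(-0.5 * (x / width) ** 2)
    kernel /= np.sqrt(np.sum(kernel ** 2))  # keeps Var[z] = 1
    noise = rng.standard_normal(n_points + len(kernel) - 1)
    return np.convolve(noise, kernel, mode="valid")  # exactly n_points values

def euler_characteristic_1d(mask):
    """#points - #edges of the excursion set = number of connected intervals."""
    m = np.asarray(mask, dtype=bool)
    return int(m.sum() - (m[1:] & m[:-1]).sum())

def compare(u, n_sims=2000, n_points=500, width=10.0, seed=1):
    """Return (mean phi(A_u), fraction of simulations with max q > u)."""
    rng = np.random.default_rng(seed)
    ec_total, n_exceed = 0, 0
    for _ in range(n_sims):
        q = smooth_gaussian_field(rng, n_points, width) ** 2  # chi2_1 field
        above = q > u
        ec_total += euler_characteristic_1d(above)
        n_exceed += int(above.any())
    return ec_total / n_sims, n_exceed / n_sims

for u in (1.0, 4.0, 9.0):
    print(u, compare(u))
```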
![Page 8: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/8.jpg)
2-D example: IceCube search for astrophysical neutrino point sources
IceCube looks for neutrino sources:
• 2-D search over the sky (θ, φ)
• Unbinned likelihood:
$$\mathcal{L}(\vec{x}_s, n_s) = \prod_i \left[\frac{n_s}{N} f_s(\vec{x}_i \mid \vec{x}_s) + \left(1 - \frac{n_s}{N}\right) f_b(\vec{x}_i)\right]$$
• Assume a Gaussian distribution of signal events:
$$f_s(\vec{x}_i \mid \vec{x}_s) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{|\vec{x}_i - \vec{x}_s|^2}{2\sigma^2}}, \qquad \vec{x}_s = (\theta, \phi)$$
• Detector resolution σ = 0.7°
• Signal parameters can also include energy and time; these are not considered here.
[J. Braun, J. Dumm, F. De Palma, C. Finley, A. Karle, and T. Montaruli, Astropart. Phys. 29, 299 (2008); arXiv:0801.1604]
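A minimal sketch of evaluating q_0 at one candidate position with this likelihood, under simplifying assumptions that are not from the talk: a small flat patch of sky (so |x_i − x_s| is a Euclidean distance in degrees), a uniform background density f_b = 1/area, and n_s constrained to 0 ≤ n_s ≤ N. Writing the likelihood ratio as ∏_i[1 + (n_s/N)(f_s/f_b − 1)] shows that its logarithm is concave in n_s, so a golden-section search is enough. Repeating the evaluation over a grid of candidate positions gives a significance map q_0(θ, φ). The function name and toy numbers are hypothetical.

```python
import numpy as np

def q0_point_source(events, x_s, sigma, area):
    """q0 at candidate source position x_s; returns (q0, n_s_hat).

    events: (N, 2) array of event positions in degrees (flat-sky assumption)
    sigma:  angular resolution in degrees
    area:   patch area in deg^2, so that f_b = 1 / area (uniform background assumption)
    """
    events = np.asarray(events, dtype=float)
    n = len(events)
    d2 = np.sum((events - np.asarray(x_s, dtype=float)) ** 2, axis=1)
    f_s = np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    excess = f_s * area - 1.0  # f_s / f_b - 1

    def log_ratio(n_s):
        # log L(n_s) - log L(0) = sum_i log[1 + (n_s/N) (f_s/f_b - 1)]
        return float(np.sum(np.log1p(n_s / n * excess)))

    # Golden-section search for the maximum of the concave log-ratio on [0, N]
    lo, hi = 0.0, float(n)
    g = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if log_ratio(a) < log_ratio(b):
            lo = a
        else:
            hi = b
    n_hat = 0.5 * (lo + hi)
    return 2.0 * max(log_ratio(n_hat), 0.0), n_hat

# Hypothetical toy sky: 1000 uniform background events on a 20 x 20 deg patch
# plus 20 signal events at the origin smeared by the 0.7 deg resolution
rng = np.random.default_rng(0)
background = rng.uniform(-10.0, 10.0, size=(1000, 2))
signal = rng.normal(0.0, 0.7, size=(20, 2))
events = np.vstack([background, signal])
q_src, n_src = q0_point_source(events, (0.0, 0.0), 0.7, 400.0)
q_far, _ = q0_point_source(events, (7.0, -7.0), 0.7, 400.0)
```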
![Page 9: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/9.jpg)
2-D example: search for neutrino sources (IceCube)
• Properly covering the whole sky requires a grid of ~1000² points.
[Figure: significance map $q_0(\theta, \phi)$]
IceCube simulated background data (1 year): 67,000 events, provided by Jim Braun & Teresa Montaruli
![Page 10: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/10.jpg)
[Figure: significance map $q_0(\theta, \phi)$ and its excursion set above u = 1 (φ = 95)]
![Page 11: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/11.jpg)
Calculation of the Euler characteristic
• Usually we have q(θ) calculated on a grid of points.
• Calculation of the E.C. is then straightforward:
• φ = #points − #edges + #faces
• Generalizes to higher dimensions
[Figure: example excursion set on a grid: φ = 18 (points) − 23 (edges) + 7 (faces) = 2]
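The counting rule can be sketched with NumPy on a boolean mask of the excursion set (`mask = q > u`). The connectivity convention is an assumption of this sketch: edges join horizontally or vertically adjacent grid points inside the set, and faces are grid squares whose four corners are all inside, so points that touch only diagonally are not connected.

```python
import numpy as np

def euler_characteristic_2d(mask):
    """phi = #points - #edges + #faces of the excursion set on a 2-D grid."""
    m = np.asarray(mask, dtype=bool)
    points = m.sum()
    # Edges: pairs of neighbouring points along each grid axis, both inside the set
    edges = (m[:, 1:] & m[:, :-1]).sum() + (m[1:, :] & m[:-1, :]).sum()
    # Faces: unit squares with all four corners inside the set
    faces = (m[1:, 1:] & m[1:, :-1] & m[:-1, 1:] & m[:-1, :-1]).sum()
    return int(points - edges + faces)

# A 3 x 3 block with its centre removed: one component with one hole, so phi = 0
ring = np.ones((3, 3), dtype=bool)
ring[1, 1] = False
print(euler_characteristic_2d(ring))
```

Because each term is a vectorized count, the cost is linear in the number of grid points, which matters for maps with ~1000² points.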
![Page 12: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/12.jpg)
2-d example: search for neutrino sources (IceCube)
For a χ² field in 2 dimensions:
$$\mathrm{E}[\varphi(A_u)] = P(\chi^2 > u) + e^{-u/2}\left(\mathcal{N}_1 + \mathcal{N}_2\, u^{1/2}\right)$$
Estimate E[φ] at two levels, e.g. u = 0 and u = 1, and solve for $\mathcal{N}_1$ and $\mathcal{N}_2$. From 20 background simulations:
$$\varphi_0 = 33.5 \pm 2, \quad \varphi_1 = 94.6 \pm 1.3 \quad\Rightarrow\quad \mathcal{N}_1 = 33 \pm 2, \quad \mathcal{N}_2 = 123 \pm 3$$
[Figures: excursion sets at u = 0 (φ = 35) and at u = 1 (φ = 95)]
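Since the formula is linear in N_1 and N_2, the two levels give a 2 × 2 linear system. A minimal sketch (function names hypothetical), with an optional one-sided variant that replaces P(χ² > u) by ½P(χ²_1 > u), as in the 'half χ²' backup slide:

```python
import math

def chi2_1_sf(u):
    """P(chi2_1 > u)."""
    return math.erfc(math.sqrt(u / 2.0))

def solve_coefficients(u0, phi0, u1, phi1, one_sided=False):
    """Solve E[phi(A_u)] = c * P(chi2_1 > u) + e^{-u/2} (N1 + N2 sqrt(u))
    for (N1, N2), given the mean Euler characteristic at two levels u0 != u1.

    c = 1 reproduces the formula above; one_sided=True uses c = 1/2.
    """
    c = 0.5 if one_sided else 1.0
    # Move the rho_0 term to the left and undo the e^{-u/2} factor:
    #   y(u) = (phi - c * P(chi2_1 > u)) * e^{u/2} = N1 + N2 * sqrt(u)
    y0 = (phi0 - c * chi2_1_sf(u0)) * math.exp(u0 / 2.0)
    y1 = (phi1 - c * chi2_1_sf(u1)) * math.exp(u1 / 2.0)
    s0, s1 = math.sqrt(u0), math.sqrt(u1)
    n2 = (y1 - y0) / (s1 - s0)
    n1 = y0 - n2 * s0
    return n1, n2

n1, n2 = solve_coefficients(0.0, 33.5, 1.0, 94.6)
```

With the IceCube inputs this returns N_1 = 32.5 and N_2 ≈ 122.9, consistent with the quoted 33 ± 2 and 123 ± 3. For the resonance search with unknown width (φ_0 = 4.5, φ_1 = 3), our own arithmetic gives (4.0, ≈ 0.68) with `one_sided=True` and (3.5, ≈ 0.92) without; the quoted N_1 = 4 ± 0.2, N_2 = 0.7 ± 0.3 correspond to the one-sided version.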
![Page 13: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/13.jpg)
2-d example: search for neutrino sources (IceCube)
[Figure: p-value of $\hat q_0$]
e.g. $P(\max q_0 > 30) = (2.5 \pm 0.4) \times 10^{-4}$ (estimated from ~200,000 random background simulations)
E.C. formula: $(2.28 \pm 0.06) \times 10^{-4}$
![Page 14: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/14.jpg)
2-d example: search for neutrino sources (IceCube)
Note: this is NOT a simple “trial factor” correction; the effective trial factor increases with the local significance.
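The note on the effective trial factor can be made concrete by comparing the E.C. estimate with the local p-value P(χ²_1 > u). A sketch using the rounded IceCube coefficients N_1 = 33 and N_2 = 123; with these inputs the global p-value at u = 30 comes out near 2.2 × 10⁻⁴, close to but not identical with the quoted E.C. value, and the ratio global/local grows with u instead of staying fixed. The formula is only meaningful where its result is well below 1.

```python
import math

def p_local(u):
    """Local p-value P(chi2_1 > u) at a fixed theta."""
    return math.erfc(math.sqrt(u / 2.0))

def p_global(u, n1, n2):
    """E.C. estimate of P(max q0 > u) = P(chi2_1 > u) + e^{-u/2} (N1 + N2 sqrt(u))."""
    return p_local(u) + math.exp(-u / 2.0) * (n1 + n2 * math.sqrt(u))

for u in (16.0, 25.0, 30.0):
    pg = p_global(u, 33.0, 123.0)
    print(f"u={u:4.0f}  local={p_local(u):.2e}  global={pg:.2e}  "
          f"effective trial factor={pg / p_local(u):.0f}")
```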
![Page 15: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/15.jpg)
Slicing
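• Exploit the azimuthal-angle symmetry to reduce computations, using
$$\varphi(A \cup B) = \varphi(A) + \varphi(B) - \varphi(A \cap B)$$
[Figure: example, φ = 0 = 1 + 1 − 2]
• Divide into N slices (N = 18 here):
$$\varphi = \sum_i \left[\varphi(\text{slice}_i) - \varphi(\text{edge}_i)\right] + \varphi(0)$$
$$\mathrm{E}[\varphi] = N \times \left(\mathrm{E}[\varphi(\text{slice})] - \mathrm{E}[\varphi(\text{edge})]\right) + \varphi(0)$$
• From 40 “slice” simulations:
$$\varphi_1(\text{slice}) = 7.8 \pm 0.35, \qquad \varphi_1(\text{edge}) = 2.5 \pm 0.15$$
$$\varphi(\text{slice}) = \left((6 \pm 0.5) + (6.7 \pm 0.8)\, u^{1/2}\right) e^{-u/2}, \qquad \varphi(\text{edge}) = (4.4 \pm 0.2)\, e^{-u/2}$$
$$\Rightarrow\quad \mathcal{N}_1 = 28 \pm 9, \qquad \mathcal{N}_2 = 120 \pm 14$$
Consistent with the full-sky simulation.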
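Why slicing works can be checked on a plain rectangular grid with the #points − #edges + #faces rule: if consecutive column slices share their boundary column (the "edge"), inclusion-exclusion gives φ(full) = Σ φ(slice_i) − Σ φ(edge_j) exactly. This toy has no shared pole, so there is no φ(0) term. `combine_slice_fits` multiplies the per-slice fit coefficients by N, following the E[φ] relation above; adding the uncertainties in quadrature is an independence assumption of this sketch, and it lands close to the quoted 28 ± 9 and 120 ± 14. All names are hypothetical.

```python
import math
import numpy as np

def ec2d(mask):
    """#points - #edges + #faces (axis-aligned edges, full grid squares as faces)."""
    m = np.asarray(mask, dtype=bool)
    edges = (m[:, 1:] & m[:, :-1]).sum() + (m[1:, :] & m[:-1, :]).sum()
    faces = (m[1:, 1:] & m[1:, :-1] & m[:-1, 1:] & m[:-1, :-1]).sum()
    return int(m.sum() - edges + faces)

def ec_by_slices(mask, cuts):
    """Sum of phi over column slices minus phi of the shared boundary columns.

    cuts are interior column indices; slice i spans columns bounds[i]..bounds[i+1]
    inclusive, so neighbouring slices overlap in exactly one column (the edge).
    """
    m = np.asarray(mask, dtype=bool)
    bounds = [0, *cuts, m.shape[1] - 1]
    slices = sum(ec2d(m[:, a:b + 1]) for a, b in zip(bounds[:-1], bounds[1:]))
    edges = sum(ec2d(m[:, c:c + 1]) for c in cuts)
    return slices - edges

def combine_slice_fits(n_slices, a_slice, b_slice, a_edge):
    """N1 = N (a_slice - a_edge), N2 = N b_slice, from per-slice fits
    phi(slice) = (a + b sqrt(u)) e^{-u/2} and phi(edge) = a_edge e^{-u/2}.
    Each argument is a (value, error) pair; errors are added in quadrature."""
    n1 = n_slices * (a_slice[0] - a_edge[0])
    n1_err = n_slices * math.hypot(a_slice[1], a_edge[1])
    return (n1, n1_err), (n_slices * b_slice[0], n_slices * b_slice[1])

rng = np.random.default_rng(3)
mask = rng.random((30, 40)) > 0.45
print(ec2d(mask), ec_by_slices(mask, [10, 20, 30]))
print(combine_slice_fits(18, (6.0, 0.5), (6.7, 0.8), (4.4, 0.2)))
```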
![Page 16: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/16.jpg)
2-D example #2: resonance search with unknown width
• Gaussian signal on an exponential background
• Toy model: 0 < m < 100, 2 < σ < 6
• Unbinned likelihood:
$$\mathcal{L} = \mathrm{Poiss}(N \mid N_s + N_b) \times \prod_i \frac{N_s f_s(x_i) + N_b f_b(x_i)}{N_s + N_b}$$
$$f_s(x; m, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}, \qquad f_b(x) = c\, e^{-cx}$$
[Figures: example toy data set; $\hat q_0$ scan over the (m, σ) plane]
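A sketch of the (m, σ) scan for this toy. Two observations keep it short: writing f = N_s/(N_s + N_b), the likelihood becomes Poiss(N | ν) ∏_i[f f_s + (1 − f) f_b], whose Poisson factor is maximized at ν = N for any f, so only f has to be fitted; and the log-likelihood is concave in f. Assumptions of this sketch, not taken from the talk: the background slope c is known, f_b is normalized on 0 < x < 100, the Gaussian signal is not truncated, and f is constrained to [0, 1] (the one-sided convention of the backup slide). All toy numbers are hypothetical.

```python
import numpy as np

def q0_fraction(f_s, f_b, n_iter=60):
    """q0 = 2 max_{0<=f<=1} sum_i log[(f f_s + (1 - f) f_b) / f_b], via golden-section search."""
    excess = f_s / f_b - 1.0

    def log_ratio(f):
        return float(np.sum(np.log1p(f * excess)))

    lo, hi = 0.0, 1.0
    g = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_iter):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if log_ratio(a) < log_ratio(b):
            lo = a
        else:
            hi = b
    return 2.0 * max(log_ratio(0.5 * (lo + hi)), 0.0)

def scan_m_sigma(x, c, masses, widths, x_max=100.0):
    """q0(m, sigma) on a grid, for f_b(x) = c e^{-cx} normalized on 0 < x < x_max."""
    f_b = c * np.exp(-c * x) / (1.0 - np.exp(-c * x_max))
    q = np.zeros((len(masses), len(widths)))
    for i, m in enumerate(masses):
        for j, s in enumerate(widths):
            f_s = np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)
            q[i, j] = q0_fraction(f_s, f_b)
    return q

# Hypothetical toy: 500 background events (c = 0.03) plus 40 signal events at m = 60, sigma = 3
rng = np.random.default_rng(7)
c = 0.03
bkg = rng.exponential(1.0 / c, size=2000)
bkg = bkg[bkg < 100.0][:500]
x = np.concatenate([bkg, rng.normal(60.0, 3.0, size=40)])
masses = np.arange(5.0, 96.0, 2.0)
widths = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
q = scan_m_sigma(x, c, masses, widths)
i, j = np.unravel_index(np.argmax(q), q.shape)
print("q0_hat =", q[i, j], "at m =", masses[i], "sigma =", widths[j])
```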
![Page 17: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/17.jpg)
2-D example #2: resonance search with unknown width
[Figures: p-value of $\hat q_0$; excursion sets at u = 1 and u = 0]
• Excellent approximation above the ~2σ level
$$\mathrm{E}[\varphi(A_u)] = P(\chi^2 > u) + e^{-u/2}\left(\mathcal{N}_1 + \mathcal{N}_2\, u^{1/2}\right)$$
$$\varphi_0 = 4.5 \pm 0.2, \quad \varphi_1 = 3 \pm 0.16 \quad\Rightarrow\quad \mathcal{N}_1 = 4 \pm 0.2, \quad \mathcal{N}_2 = 0.7 \pm 0.3$$
![Page 18: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/18.jpg)
More dimensions …
• Potential additional search dimensions: time, temporal/angular scale, energy, …
• When possible, slicing can be useful to reduce the necessary computations.
• Calculating the E.C. on an N-D grid is slightly more complicated, but the formalism is well suited to it.
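One way to do the N-D calculation, sketched here as an assumption rather than the authors' implementation: count every unit cube of dimension k whose 2^k corners all lie in the excursion set with sign (−1)^k. With cells taken along the grid axes, this reduces to φ = #points − #edges + #faces in two dimensions. The function name is hypothetical.

```python
import itertools
import numpy as np

def euler_characteristic_nd(mask):
    """phi = sum over grid cells inside the set of (-1)**dim(cell).

    A k-dimensional cell spans a subset of k axes; it is inside the set when all
    2**k of its corner points are (ANDing shifted copies along each of those axes).
    """
    m = np.asarray(mask, dtype=bool)
    phi = 0
    for k in range(m.ndim + 1):
        for axes in itertools.combinations(range(m.ndim), k):
            cells = m
            for ax in axes:
                lo = [slice(None)] * m.ndim
                hi = [slice(None)] * m.ndim
                lo[ax] = slice(None, -1)
                hi[ax] = slice(1, None)
                cells = cells[tuple(lo)] & cells[tuple(hi)]
            phi += (-1) ** k * int(cells.sum())
    return phi

# A 3 x 3 x 3 block with its centre removed is a closed surface around a cavity
hollow = np.ones((3, 3, 3), dtype=bool)
hollow[1, 1, 1] = False
print(euler_characteristic_nd(hollow))
```

The loop visits 2^D axis subsets, which is cheap for the handful of search dimensions listed above.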
![Page 19: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/19.jpg)
Summary
• The Euler characteristic formula provides a practical way of estimating the look-elsewhere effect.
• It is applicable in a wide range of applications, such as astrophysical searches for neutrino sources or resonance searches with unknown width, and in any number of search dimensions.
• The procedure for estimating the p-value is simple and reliable:
$$\text{p-value} \approx \mathrm{E}[\varphi(A_u)] = P(\chi^2 > u) + e^{-u/2}\left(\mathcal{N}_1 + \mathcal{N}_2\, u^{1/2}\right) + \ldots$$
![Page 20: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/20.jpg)
Backup
![Page 21: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/21.jpg)
2-d example: search for neutrino sources (IceCube)
[Figure: significance map $q_0(\theta, \phi)$ and its excursion set above u = 1]
![Page 22: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/22.jpg)
A small modification
• Usually we only look for ‘positive’ signals:
$$q_0(\theta) = \begin{cases} -2\log\dfrac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat\mu,\theta)} & \hat\mu > 0 \\ 0 & \hat\mu \le 0 \end{cases}$$
$q_0(\theta)$ is then ‘half χ²’ [H. Chernoff, Ann. Math. Stat. 25, 573–578 (1954)], so the p-value is simply halved.
• Or, equivalently, consider $\hat\mu(\theta)$ as a Gaussian field (since, by Wald’s theorem, $q_0(\theta) = \left(\hat\mu(\theta)/\sigma\right)^2$ for $\hat\mu \ge 0$) [Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727]
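A quick numerical check of the two statements above, assuming q_0 is exactly 'half χ²' under the background hypothesis: halving the χ²_1 tail gives the same number as the one-sided Gaussian tail of μ̂/σ beyond √u. Standard library only; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def p_two_sided(u):
    """P(chi2_1 > u): the tail for q0 without the mu_hat > 0 restriction."""
    return math.erfc(math.sqrt(u / 2.0))

def p_one_sided(u):
    """'Half chi2': q0 = 0 whenever mu_hat <= 0, so P(q0 > u) = P(chi2_1 > u) / 2 for u > 0."""
    return 0.5 * p_two_sided(u)

def p_gaussian_field(u):
    """Gaussian picture: q0 = (mu_hat/sigma)^2 with mu_hat >= 0, so q0 > u <=> mu_hat/sigma > sqrt(u)."""
    return 1.0 - NormalDist().cdf(math.sqrt(u))

for u in (1.0, 9.0, 25.0):
    print(u, p_one_sided(u), p_gaussian_field(u))
```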
![Page 23: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/23.jpg)
The 1-dimensional case
For a χ² random field, the expected number of upcrossings of a level u is given by [Davies, 1987]:
$$\mathrm{E}[N_u] = \mathcal{N}_1\, e^{-u/2}$$
[Figure: example spectrum, events / unit mass vs. mass]
To have the global maximum above a level u, one must either have at least one upcrossing ($N_u > 0$) or have $q_0 > u$ at the origin ($q_0(0) > u$):
[Figure: $q_0(m)$ vs. m, with the level u and the global maximum $\hat q_0$]
$$P(\hat q_0 > u) \le P(N_u > 0) + P(q_0(0) > u) \le \mathrm{E}[N_u] + P(q_0(0) > u)$$
Note the inequality $\mathrm{E}[N_u] \ge P(N_u > 0)$, which follows from
$$1 \cdot P(1) + 2 \cdot P(2) + \ldots \ge P(1) + P(2) + \ldots$$
When $P(N_u > 1) \ll P(N_u = 1)$ (large u), then $\mathrm{E}[N_u] \simeq P(N_u = 1) \simeq P(N_u > 0)$, so the bound becomes an equality for large u.
[R.B. Davies, Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43 (1987)]
![Page 24: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/24.jpg)
The 1-dimensional case
The only unknown is $\mathcal{N}_1$, which can be estimated from the average number of upcrossings at some low reference level $u_0$:
$$\mathrm{E}[N_u] = \mathcal{N}_1\, e^{-u/2} \quad\Rightarrow\quad \mathcal{N}_1 \cong \langle N_{u_0} \rangle\, e^{u_0/2}$$
[Figures: example spectrum; $q_0(m)$ vs. m, with the level u and the global maximum $\hat q_0$]
The p-value can then be estimated by Davies’ formula:
$$P(\hat q_0 > u) \le \mathrm{E}[N_u] + P(q_0(0) > u) = \mathcal{N}_1\, e^{-u/2} + \frac{1}{2} P(\chi^2_1 > u)$$
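The recipe is short enough to sketch directly (function names hypothetical). An upcrossing is counted whenever consecutive grid values go from ≤ u to > u; a scan that starts above u has no upcrossing there, which is the case covered by the P(q_0(0) > u) term. With the number quoted in the 1-D resonance example, ⟨N(u = 0.5)⟩ = 4.34, the estimate gives N_1 ≈ 5.57, in line with the quoted 5.58 ± 0.14.

```python
import math

def count_upcrossings(q, u):
    """Number of consecutive grid pairs with q[i] <= u < q[i+1]."""
    return sum(1 for a, b in zip(q[:-1], q[1:]) if a <= u < b)

def estimate_n1(mean_upcrossings, u0):
    """N1 ~= <N_{u0}> e^{u0/2}, from the average upcrossing count at a low level u0."""
    return mean_upcrossings * math.exp(u0 / 2.0)

def davies_p_value(u, n1):
    """Upper bound on P(q0_hat > u): N1 e^{-u/2} + (1/2) P(chi2_1 > u)."""
    return n1 * math.exp(-u / 2.0) + 0.5 * math.erfc(math.sqrt(u / 2.0))

n1 = estimate_n1(4.34, 0.5)
for u in (9.0, 16.0, 25.0):
    print(u, davies_p_value(u, n1))
```

In practice `count_upcrossings` would be averaged over a modest number of background simulations of q_0(m) at the low level u_0.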
![Page 25: Estimating the significance of a signal in a multi ... · Estimating the significance of a signal in a multi-dimensional search Ofer Vitells , Eilam Gross 1 TAUP 2011 , 5-9 September](https://reader034.vdocuments.us/reader034/viewer/2022042416/5f32454520df34553a4a91bc/html5/thumbnails/25.jpg)
1-D example: resonance search
[Figure: example spectrum, events / unit mass vs. mass]
The model is a Gaussian signal (with unknown location m) on top of a continuous background (Rayleigh distribution):
$$\mathcal{L} = \prod_i \mathrm{Poiss}\big(n_i \mid \mu s_i(m) + \beta b_i\big)$$
In this example we find, from 100 random background simulations:
$$\langle N(u = 0.5) \rangle = 4.34 \pm 0.11 \quad\Rightarrow\quad \mathcal{N}_1 = 5.58 \pm 0.14$$
$$\text{p-value} \approx \mathcal{N}_1\, e^{-u/2} + \frac{1}{2} P(\chi^2_1 > u)$$
[Figure: p-value of $\max_m q_0(m)$]
[E. Gross and O. Vitells, Eur. Phys. J. C 70, 1-2 (2010), arXiv:1005.1891]
Excellent approximation already from ~2σ (p-value ≈ 5×10⁻²).