
Statistical methods in cosmology

André Tilquin (CPPM) [email protected]

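Magnitude model, with the luminosity distance $D_L$ in units of $c/H_0$:

$$m(z, \Omega_m, \Omega_X, w_0, w_1, m_s) = m_s + 5\log_{10} D_L(z)$$

$$D_L = \begin{cases} \dfrac{1+z}{\sqrt{|\Omega_k|}}\,\sinh\!\left(\sqrt{|\Omega_k|}\,J\right) & \Omega_k > 0 \\ (1+z)\,J & \Omega_k = 0 \\ \dfrac{1+z}{\sqrt{|\Omega_k|}}\,\sin\!\left(\sqrt{|\Omega_k|}\,J\right) & \Omega_k < 0 \end{cases}$$

$$J = \int_0^z \frac{H_0}{H(z')}\,dz', \qquad \Omega_k = 1 - \Omega_m - \Omega_X$$

$$\left(\frac{H(z)}{H_0}\right)^2 = \Omega_m (1+z)^3 + \Omega_X f(z) + \Omega_k (1+z)^2$$

$$f(z) = (1+z)^{3(1 + w_0 - w_1)}\, e^{3 w_1 z}, \qquad w(z) = w_0 + w_1 z$$

Parameters entering the $\chi^2$: $\Omega_m, \Omega_X, w_0, w_1$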

Outlook

• General problem

• The frequentist statistic

– Likelihood, log-likelihood and $\chi^2$

– Fisher analysis and limitation

• The Bayesian statistic

– The Bayes theorem

– Example and interpretation

• Summary

General problem (1)

• Let us assume we have N supernovae at different redshifts, $(m_i, \sigma_{m_i}, z_i),\ i = 1, \dots, N$, and a given model $m_{th} = m(\theta_k, z)$.

How to find the best curve?

1. We look for the closest curve with respect to the data points:

$$D^2_{min} = \min_{\theta_k} \sum_i \left(m_i - m(\theta_k, z_i)\right)^2$$

But the worst measured points should have less weight:

$$\chi^2_{min} = \min_{\theta_k} \sum_i \frac{\left(m_i - m(\theta_k, z_i)\right)^2}{\sigma_{m_i}^2}$$

2. The problem now is:

• Find the $\theta_k$ values such that $\chi^2$ is minimum

• Compute the errors on $\theta_k$ starting from the errors on $m_i$
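As a minimal sketch of the two distances above, assuming a hypothetical straight-line stand-in for $m(\theta_k, z)$ and invented data points (neither comes from the slides), the unweighted distance and the $\chi^2$ could be computed as:

```python
# Toy data, invented for illustration: (z_i, m_i, sigma_mi).
data = [(0.1, 1.2, 0.1), (0.5, 2.1, 0.2), (1.0, 2.9, 0.1)]

def model(theta, z):
    """Hypothetical stand-in for m(theta_k, z): a straight line a + b*z."""
    a, b = theta
    return a + b * z

def distance2(theta):
    """Unweighted squared distance D^2: every point counts equally."""
    return sum((m - model(theta, z)) ** 2 for z, m, _ in data)

def chi2(theta):
    """Weighted distance: poorly measured points (large sigma) count less."""
    return sum(((m - model(theta, z)) / s) ** 2 for z, m, s in data)

theta = (1.0, 2.0)
print(distance2(theta), chi2(theta))
```

The second and third points are equally far from the curve, but the third one has the smaller error and therefore dominates the $\chi^2$.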

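General problem (2)

We have to minimize $\chi^2$ with respect to $\theta_k$:

$$\chi^2 = \sum_i \frac{\left(m_i - m(\theta_k, z_i)\right)^2}{\sigma_{m_i}^2}, \qquad \frac{\partial\chi^2}{\partial\theta_k} = 0$$

and to compute the errors on the parameters from the errors on the data:

$$\sigma_{\theta_k} = f(\sigma_{m_i}, m_i, \theta_k)$$

Statistics is necessary.

Frequentist statistic

Definition: Probability is interpreted as the frequency of the outcome of a repeatable experiment.

Central limit theorem

If you repeat a measurement N times, then for N → ∞ the measurement distribution (pdf) will be a Gaussian distribution around the mean value, with a half-width equal to the error of your experiment.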

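Maximum likelihood ($\partial/\partial\theta_k = 0$)

What is the best curve?

Answer: The most probable curve!

The probability of the theoretical curve is the product of the probabilities for each individual point to lie around the curve:

$$L = \prod_{i=1}^{n} p\left(m_i, m(\theta_k, z_i)\right) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_{m_i}} \exp\left(-\frac{\left(m_i - m(\theta_k, z_i)\right)^2}{2\sigma_{m_i}^2}\right)$$

We have to maximize L with respect to $\theta_k$:

$$\frac{\partial L}{\partial\theta_k} = 0$$

Because it is simpler to work with sums:

$$\mathrm{Ln}(L) = -\sum_{i=1}^{n} \mathrm{Ln}\left(\sqrt{2\pi}\,\sigma_{m_i}\right) - \frac{1}{2}\sum_{i=1}^{n} \frac{\left(m_i - m(\theta_k, z_i)\right)^2}{\sigma_{m_i}^2} = \mathrm{const} - \frac{\chi^2}{2}$$

so maximizing L is the same as minimizing $\chi^2$:

$$\frac{\partial\chi^2}{\partial\theta_k} = 0$$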
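A small numerical check of the relation between the Gaussian likelihood and the $\chi^2$, sketched with an invented data set and a hypothetical straight-line model: the difference of $-2\,\mathrm{Ln}(L)$ between two parameter sets should equal the difference of their $\chi^2$, since the normalization term does not depend on the parameters.

```python
import math

# Toy data, invented for illustration: (z_i, m_i, sigma_mi).
data = [(0.1, 1.2, 0.1), (0.5, 2.1, 0.2), (1.0, 2.9, 0.1)]

def model(theta, z):
    """Hypothetical stand-in for m(theta_k, z)."""
    a, b = theta
    return a + b * z

def chi2(theta):
    return sum(((m - model(theta, z)) / s) ** 2 for z, m, s in data)

def log_likelihood(theta):
    """Ln(L) for independent Gaussian errors on each point."""
    return sum(-math.log(math.sqrt(2.0 * math.pi) * s)
               - 0.5 * ((m - model(theta, z)) / s) ** 2
               for z, m, s in data)

theta_a, theta_b = (1.0, 2.0), (1.1, 1.8)
delta_minus_2lnl = -2.0 * (log_likelihood(theta_a) - log_likelihood(theta_b))
delta_chi2 = chi2(theta_a) - chi2(theta_b)
print(delta_minus_2lnl, delta_chi2)
```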

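Some $\chi^2$ properties

Definition in matrix form: if $m = (m_1, \dots, m_n)$ then

$$\chi^2 = (m - m_0)^T\, V^{-1}\, (m - m_0)$$

Second derivative: for a single variable x,

$$\chi^2 = \frac{(x - x_0)^2}{\sigma_x^2}, \qquad \frac{\partial\chi^2}{\partial x} = \frac{2(x - x_0)}{\sigma_x^2}, \qquad \frac{\partial^2\chi^2}{\partial x^2} = \frac{2}{\sigma_x^2}$$

and in general

$$V^{-1}_{ij} = \frac{1}{2}\frac{\partial^2\chi^2}{\partial m_i\,\partial m_j}$$

• The first derivative of the $\chi^2$ gives the minimum.

• The second derivative of the $\chi^2$ gives the weight matrix = inverse of the error matrix, independent of the measured data points.

Probability and $\chi^2$: by definition

$$-2\,\mathrm{Ln}(L) = \chi^2 - 2\,\mathrm{Ln}(L_0) \;\Rightarrow\; p = L = L_0\, e^{-\chi^2/2}, \qquad L_0 = \prod_i \left[\frac{1}{\sqrt{2\pi V_{ii}}}\right] \text{ (diagonal } V)$$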

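Computing errors

When the $\chi^2$ is defined on measured variables (i.e. magnitudes), how do we compute the errors on the physical parameters $\theta_k$?

$$\chi^2 = (m - m_{th})^T\, V^{-1}\, (m - m_{th})$$

We perform a Taylor expansion of the $\chi^2$ around the minimum:

$$\chi^2(\theta) = \chi^2(\theta_{min}) + \left.\frac{\partial\chi^2}{\partial\theta}\right|_{\theta_{min}}^T (\theta - \theta_{min}) + \frac{1}{2}(\theta - \theta_{min})^T \left.\frac{\partial^2\chi^2}{\partial\theta_k\,\partial\theta_l}\right|_{\theta_{min}} (\theta - \theta_{min}) + \dots$$

where the first-derivative term is 0 at the minimum.

If the transformation $m(\theta_k)$ is linear, then:

The second derivative of the $\chi^2$ is a symmetric positive matrix.

The errors on $\theta_k$ are Gaussian:

$$\chi^2 = \chi^2_{min} + (\theta_k - \theta_{k0})^T\, U^{-1}\, (\theta_k - \theta_{k0}), \qquad U^{-1}_{kl} = \frac{1}{2}\frac{\partial^2\chi^2}{\partial\theta_k\,\partial\theta_l}$$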

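Computing errors (2)

Simple case:

$$\chi^2 = \sum_{i=1}^{n} \frac{\left(m_i - m(\theta_k, z_i)\right)^2}{\sigma_{m_i}^2}, \qquad V^{-1}_{ii} = \frac{1}{\sigma_{m_i}^2}$$

If $m(\theta_k, z_i)$ is linear, $\partial^2 m/\partial\theta_k\,\partial\theta_l = 0$ and, with the Jacobian $J_{ik} = \partial m(\theta_k, z_i)/\partial\theta_k$:

$$\frac{1}{2}\frac{\partial^2\chi^2}{\partial\theta_k\,\partial\theta_l} = \left(\frac{\partial m(\theta_k, z_i)}{\partial\theta_k}\right)^T V^{-1}\, \frac{\partial m(\theta_k, z_i)}{\partial\theta_l} = U^{-1}_{kl}$$

This is independent of the measured points.

Errors on the physical parameters are deduced by a simple projection onto the $\theta_k$ parameter space (linear approximation): Fisher analysis.

If $m(\theta_k, z_i)$ is not linear:

$$\frac{\partial\chi^2}{\partial\theta_k} = -2\sum_{i=1}^{n} \frac{m_i - m(\theta_k, z_i)}{\sigma_{m_i}^2}\, \frac{\partial m(\theta_k, z_i)}{\partial\theta_k}$$

$$\frac{1}{2}\frac{\partial^2\chi^2}{\partial\theta_k\,\partial\theta_l} = \sum_{i=1}^{n} \frac{1}{\sigma_{m_i}^2}\left[\frac{\partial m(\theta_k, z_i)}{\partial\theta_k}\frac{\partial m(\theta_k, z_i)}{\partial\theta_l} - \left(m_i - m(\theta_k, z_i)\right)\frac{\partial^2 m(\theta_k, z_i)}{\partial\theta_k\,\partial\theta_l}\right]$$

Fisher is a good approximation if:

$$\left|\sum_{i=1}^{n} \frac{m_i - m(\theta_k, z_i)}{\sigma_{m_i}^2}\, \frac{\partial^2 m(\theta_k, z_i)}{\partial\theta_k\,\partial\theta_l}\right| \ll \sum_{i=1}^{n} \frac{1}{\sigma_{m_i}^2}\, \frac{\partial m(\theta_k, z_i)}{\partial\theta_k}\frac{\partial m(\theta_k, z_i)}{\partial\theta_l}$$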
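The Fisher projection $U^{-1} = J^T V^{-1} J$ can be sketched in a few lines. This is only an illustration under stated assumptions: the model is a hypothetical straight line in z, the redshifts and errors are invented, the error matrix V is diagonal, and the Jacobian is taken by finite differences so that another model function could be swapped in.

```python
import math

# Toy survey, invented for illustration: redshifts and magnitude errors.
z_values = [0.1, 0.5, 1.0]
sigmas = [0.1, 0.2, 0.1]

def model(theta, z):
    """Hypothetical stand-in for m(theta_k, z): a + b*z."""
    a, b = theta
    return a + b * z

def jacobian(theta, h=1e-6):
    """J[i][k] = d m(theta, z_i) / d theta_k by central finite differences."""
    rows = []
    for z in z_values:
        row = []
        for k in range(len(theta)):
            up = list(theta); up[k] += h
            dn = list(theta); dn[k] -= h
            row.append((model(up, z) - model(dn, z)) / (2.0 * h))
        rows.append(row)
    return rows

def fisher(theta):
    """U^-1[k][l] = sum_i J[i][k] * J[i][l] / sigma_i^2 (diagonal V)."""
    J = jacobian(theta)
    p = len(theta)
    return [[sum(J[i][k] * J[i][l] / sigmas[i] ** 2 for i in range(len(z_values)))
             for l in range(p)] for k in range(p)]

def invert_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

F = fisher([1.0, 2.0])          # weight matrix of the parameters
U = invert_2x2(F)               # error (covariance) matrix of the parameters
sigma_a, sigma_b = math.sqrt(U[0][0]), math.sqrt(U[1][1])
print(F, sigma_a, sigma_b)
```

No measured magnitude enters the computation: only the redshifts, the errors and the model derivatives do, which is why such a forecast can be made before taking data.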

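Example: change of variables

Assume we know the errors on $\Omega_m$ and $\Omega_\Lambda$ (no correlation). We would like to compute the errors on $S = \Omega_m + \Omega_\Lambda$ and $D = \Omega_m - \Omega_\Lambda$:

• We construct the covariance matrix (here U is the error matrix of the original variables):

$$U^{-1} = \begin{pmatrix} 1/\sigma_{\Omega_m}^2 & 0 \\ 0 & 1/\sigma_{\Omega_\Lambda}^2 \end{pmatrix}$$

• We construct the Jacobian, using $\Omega_m = (S + D)/2$ and $\Omega_\Lambda = (S - D)/2$:

$$J = \begin{pmatrix} \partial\Omega_m/\partial S & \partial\Omega_m/\partial D \\ \partial\Omega_\Lambda/\partial S & \partial\Omega_\Lambda/\partial D \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

• We project:

$$V^{-1} = J^T U^{-1} J = \frac{1}{4}\begin{pmatrix} 1/\sigma_{\Omega_m}^2 + 1/\sigma_{\Omega_\Lambda}^2 & 1/\sigma_{\Omega_m}^2 - 1/\sigma_{\Omega_\Lambda}^2 \\ 1/\sigma_{\Omega_m}^2 - 1/\sigma_{\Omega_\Lambda}^2 & 1/\sigma_{\Omega_m}^2 + 1/\sigma_{\Omega_\Lambda}^2 \end{pmatrix}$$

• We invert $V^{-1}$:

$$V = \begin{pmatrix} \sigma_S^2 & \sigma_{SD} \\ \sigma_{SD} & \sigma_D^2 \end{pmatrix} = \begin{pmatrix} \sigma_{\Omega_m}^2 + \sigma_{\Omega_\Lambda}^2 & \sigma_{\Omega_m}^2 - \sigma_{\Omega_\Lambda}^2 \\ \sigma_{\Omega_m}^2 - \sigma_{\Omega_\Lambda}^2 & \sigma_{\Omega_m}^2 + \sigma_{\Omega_\Lambda}^2 \end{pmatrix}$$

$$\sigma_S = \sigma_D = \sqrt{\sigma_{\Omega_m}^2 + \sigma_{\Omega_\Lambda}^2}\,; \qquad \rho_{SD} = \frac{\sigma_{\Omega_m}^2 - \sigma_{\Omega_\Lambda}^2}{\sigma_{\Omega_m}^2 + \sigma_{\Omega_\Lambda}^2}$$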
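The change of variables can be checked numerically. The sketch below assumes illustrative input errors of 0.3 on $\Omega_m$ and 0.4 on $\Omega_\Lambda$ (invented numbers) and uses small hand-written 2x2 helpers rather than a linear algebra library.

```python
# Assumed input errors on Omega_m and Omega_Lambda (illustrative numbers only).
sigma_m, sigma_l = 0.3, 0.4

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def invert_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

# Weight matrix of the original, uncorrelated variables.
U_inv = [[1.0 / sigma_m**2, 0.0], [0.0, 1.0 / sigma_l**2]]
# Jacobian d(Omega_m, Omega_Lambda)/d(S, D), from
# Omega_m = (S + D)/2 and Omega_Lambda = (S - D)/2.
J = [[0.5, 0.5], [0.5, -0.5]]

V_inv = matmul(transpose(J), matmul(U_inv, J))   # projection
V = invert_2x2(V_inv)                            # covariance of (S, D)
print(V)
```

The diagonal of V should come out as $\sigma_{\Omega_m}^2 + \sigma_{\Omega_\Lambda}^2 = 0.25$ and the off-diagonal term as $\sigma_{\Omega_m}^2 - \sigma_{\Omega_\Lambda}^2 = -0.07$: S and D are correlated even though the original variables are not.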

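External constraint or prior

• Problem: Using SN we would like to measure $(\Omega_m, \Omega_\Lambda)$ knowing that from the CMB we have: $\Omega_T = \Omega_m + \Omega_\Lambda = 1.01 \pm 0.02$.

• This measurement is independent of the SN measurement, so we can add it to the $\chi^2$:

$$\chi^2 = \sum_{i=1}^{n} \frac{\left(m_i - m(\Omega_m, \Omega_\Lambda, z_i)\right)^2}{\sigma_{m_i}^2} + \frac{\left(\Omega_{T0} - (\Omega_m + \Omega_\Lambda)\right)^2}{\sigma_{\Omega_T}^2}$$

All the previous equations are still correct by replacing the residuals and the weight matrix of the measurements by their $(n+1)$-dimensional versions:

$$\left(m_i - m(\theta_k, z_i)\right) \rightarrow \begin{pmatrix} m_1 - m(\Omega_m, \Omega_\Lambda, z_1) \\ \vdots \\ m_n - m(\Omega_m, \Omega_\Lambda, z_n) \\ \Omega_{T0} - (\Omega_m + \Omega_\Lambda) \end{pmatrix} \quad \text{and} \quad U^{-1} = \begin{pmatrix} 1/\sigma_{m_1}^2 & 0 & \dots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & 1/\sigma_{m_n}^2 & 0 \\ 0 & \dots & 0 & 1/\sigma_{\Omega_T}^2 \end{pmatrix}$$

And the Jacobian:

$$\frac{\partial m(\theta_k, z_i)}{\partial\theta_k} \rightarrow J = \begin{pmatrix} \partial m(\theta_k, z_i)/\partial\theta_k \\ \partial(\Omega_m + \Omega_\Lambda)/\partial\theta_k \end{pmatrix}$$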
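In the linear (Fisher) approximation, adding the prior amounts to adding one extra row to the Jacobian, which is the same as adding $J_{prior}^T\, \sigma_{\Omega_T}^{-2}\, J_{prior}$ to the weight matrix of the parameters. The sketch below assumes a hypothetical supernova-only weight matrix (the numbers are invented); only the prior error 0.02 comes from the text.

```python
import math

# Hypothetical supernova-only weight matrix for (Omega_m, Omega_Lambda).
F_sn = [[500.0, -400.0], [-400.0, 500.0]]

# Prior on Omega_T = Omega_m + Omega_Lambda with error 0.02 (value from the text).
sigma_T = 0.02
J_prior = [1.0, 1.0]   # d(Omega_m + Omega_Lambda)/d(Omega_m, Omega_Lambda)

# The prior acts as one extra measurement added to the weight matrix.
F_total = [[F_sn[k][l] + J_prior[k] * J_prior[l] / sigma_T**2 for l in range(2)]
           for k in range(2)]

def invert_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

sigma_without = math.sqrt(invert_2x2(F_sn)[0][0])     # error on Omega_m, SN only
sigma_with = math.sqrt(invert_2x2(F_total)[0][0])     # error on Omega_m, SN + prior
print(sigma_without, sigma_with)
```

With these invented numbers the supernovae alone constrain the sum $\Omega_m + \Omega_\Lambda$ poorly, so the prior on the sum reduces the error on $\Omega_m$ by roughly a factor of three.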

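Minimisation of $\chi^2$ and bias estimate

$$\chi^2 = \sum_{i=1}^{n} \frac{\left(m_i - m(\theta_k, z_i)\right)^2}{\sigma_{m_i}^2}$$

$$\frac{\partial\chi^2}{\partial\theta_k} = -2\sum_{i=1}^{n} \frac{m_i - m(\theta_k, z_i)}{\sigma_{m_i}^2}\, \frac{\partial m(\theta_k, z_i)}{\partial\theta_k} = 0$$

We Taylor expand the $\chi^2$ around $\theta_{k0}$:

$$\chi^2(\theta_k) = \chi^2(\theta_{k0}) + \left.\frac{\partial\chi^2}{\partial\theta_k}\right|_{\theta_{k0}} (\theta_k - \theta_{k0}) + \frac{1}{2}\left.\frac{\partial^2\chi^2}{\partial\theta_k^2}\right|_{\theta_{k0}} (\theta_k - \theta_{k0})^2$$

We apply the minimum condition in $\theta_k$:

$$\frac{1}{2}\frac{\partial\chi^2}{\partial\theta_k} = 0 = \frac{1}{2}\left.\frac{\partial\chi^2}{\partial\theta_k}\right|_{\theta_{k0}} + \frac{1}{2}\left.\frac{\partial^2\chi^2}{\partial\theta_k^2}\right|_{\theta_{k0}} (\theta_k - \theta_{k0})$$

with, to first order and in matrix form (J is the Jacobian $\partial m(\theta_{k0}, z_i)/\partial\theta_k$, $U_{(m_i)}$ the error matrix of the n measurements and $V_{(\theta_k)}$ the error matrix of the p parameters):

$$\frac{1}{2}\left.\frac{\partial^2\chi^2}{\partial\theta_k^2}\right|_{\theta_{k0}} = V^{-1}_{(\theta_k)} = J^T\, U^{-1}_{(m_i)}\, J, \qquad \frac{1}{2}\left.\frac{\partial\chi^2}{\partial\theta_k}\right|_{\theta_{k0}} = -J^T\, U^{-1}_{(m_i)} \left(m_i - m(\theta_{k0}, z_i)\right)_{(n)}$$

We get the first-order iterative equation:

$$\theta_k = \theta_{k0} + \left(J^T U^{-1} J\right)^{-1}_{(p)} J^T\, U^{-1} \left(m_i - m(\theta_{k0}, z_i)\right)_{(n)}$$

If the theoretical model is linear, this equation is exact (no iteration).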
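The first-order iterative equation is a Gauss-Newton update. A minimal one-parameter sketch follows, assuming a hypothetical non-linear model $m = \mathrm{Ln}(1 + \theta z)$, invented redshifts and errors, and noise-free data generated from a known parameter value, so that the iteration has a known answer to converge to.

```python
import math

# Noise-free toy data generated from a hypothetical non-linear model.
theta_true = 0.5
z_values = [0.2, 0.5, 1.0, 1.5]
sigmas = [0.1, 0.1, 0.2, 0.2]

def model(theta, z):
    """Hypothetical stand-in for m(theta, z)."""
    return math.log(1.0 + theta * z)

def dmodel_dtheta(theta, z):
    """Analytic Jacobian d m / d theta of the toy model."""
    return z / (1.0 + theta * z)

measured = [model(theta_true, z) for z in z_values]

def gauss_newton_step(theta0):
    """theta = theta0 + (J^T U^-1 J)^-1 J^T U^-1 (m - m(theta0)), one parameter."""
    jt_w_j = sum(dmodel_dtheta(theta0, z) ** 2 / s**2
                 for z, s in zip(z_values, sigmas))
    jt_w_r = sum(dmodel_dtheta(theta0, z) * (m - model(theta0, z)) / s**2
                 for z, m, s in zip(z_values, measured, sigmas))
    return theta0 + jt_w_r / jt_w_j

theta = 1.0                 # starting guess, deliberately far from the truth
for _ in range(20):         # the model is non-linear, so we iterate
    theta = gauss_newton_step(theta)
print(theta)
```

For a model linear in $\theta$ the Jacobian would not depend on the starting point and a single step would land on the minimum.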

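Non-linearity

• If $m(\theta_k)$ is linear in $\theta_k$ then:

– If the errors on $m_i$ are Gaussian then the errors on $\theta_k$ will be Gaussian too

– $\chi^2(\theta_k)$ is exactly a quadratic form

– The covariance matrix is positive and symmetric

– Fisher analysis is rigorously exact:

$$\chi^2_1 = (m - m_0)^T\, V^{-1}\, (m - m_0)$$

$$\chi^2_2 = \chi^2_{min} + (\theta_k - \theta_{k0})^T\, U^{-1}\, (\theta_k - \theta_{k0}), \qquad U^{-1}_{kl} = \left(\frac{\partial m(\theta_k, z_i)}{\partial\theta_k}\right)^T V^{-1}\, \frac{\partial m(\theta_k, z_i)}{\partial\theta_l}$$

$$\chi^2_1 = \chi^2_2$$

• On the contrary, if $m(\theta_k)$ is not linear, only $\chi^2_1$ is rigorously exact: the Fisher matrix is a linear approximation.

The only valid properties are:

– The best fit is given by $\partial\chi^2_1/\partial\theta_k = 0$

– The « s » sigma error is given by solving $\chi^2_1 = \chi^2_{min} + s^2$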

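Non-linearity: example

Evolution of $\chi^2_1 - \chi^2_{min}$: SNAP simulation, flatness at 1 %.

[Figure: $\chi^2$ scan compared with the Fisher analysis; the level $\chi^2 = \chi^2_{min} + 1$ is marked, and the scan shows a secondary minimum and asymmetric errors.]

Values quoted on the figure: $w_0 = -1.00$ with asymmetric errors (0.38, 0.13), and $w_0 = -0.88$ with errors (0.26, 0.25).

Rem: This secondary minimum is largely due to non-linearity.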

Non-Gaussianity

When the errors on the observables are not Gaussian, only the iterative minimum equation can be used. So go back to the definition: “Probability is interpreted as the frequency of the outcome of a repeatable experiment” and do simulations: Gedanken experiments.

a) Determine the cosmological model $\{\theta_{k0}\}$ by looking for the best-fit parameters on data. This set of parameters is assumed to be the true cosmology.

b) Compute the expected observables and randomize them within the experimental errors, taking into account non-Gaussianity. Do the same thing with the prior.

c) For each “virtual” experiment, compute the new minimum to get a new set of cosmological parameters $\theta_{ki}$.

d) Simulate as many virtual experiments as you can.

e) The distributions of these “best fit” values $\{\theta_{ki}\}$ give the errors:

• The error matrix is given by the second-order moments: $U_{ij} = \langle\theta_i\theta_j\rangle - \langle\theta_i\rangle\langle\theta_j\rangle$, positive definite

• The error on the errors scales as $\sigma(\sigma) \sim \sigma/\sqrt{2N}$
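Steps a) to e) can be sketched with a toy Monte Carlo. Everything here is an assumption made for illustration: a one-parameter linear model $m = \theta z$ (so that the $\chi^2$ minimum has a closed form), invented redshifts and errors, and a uniform error distribution standing in for a non-Gaussian one.

```python
import math
import random

random.seed(1)

# Hypothetical survey and "true" cosmology, reduced to a one-parameter toy model.
z_values = [0.2, 0.5, 1.0, 1.5]
sigmas = [0.1, 0.1, 0.2, 0.2]
theta_true = 2.0   # step a): best fit on data, taken as the truth

def model(theta, z):
    return theta * z

def best_fit(measured):
    """Closed-form chi2 minimum for the linear toy model m = theta * z."""
    num = sum(m * z / s**2 for m, z, s in zip(measured, z_values, sigmas))
    den = sum(z * z / s**2 for z, s in zip(z_values, sigmas))
    return num / den

def virtual_experiment():
    """Step b): randomize with a non-Gaussian (uniform) error of r.m.s. sigma."""
    half_width = math.sqrt(3.0)   # uniform on [-sqrt(3) s, sqrt(3) s] has r.m.s. s
    return [model(theta_true, z) + random.uniform(-half_width * s, half_width * s)
            for z, s in zip(z_values, sigmas)]

n_exp = 5000
fits = [best_fit(virtual_experiment()) for _ in range(n_exp)]   # steps c) and d)

# Step e): the spread of the best-fit values gives the error.
mean = sum(fits) / n_exp
sigma_mc = math.sqrt(sum((t - mean) ** 2 for t in fits) / n_exp)
sigma_fisher = 1.0 / math.sqrt(sum(z * z / s**2 for z, s in zip(z_values, sigmas)))
error_on_error = sigma_mc / math.sqrt(2 * n_exp)
print(sigma_mc, sigma_fisher, error_on_error)
```

Because this toy model is linear, the simulated spread should agree with the Fisher value within the error on the error; for a non-linear model or a strongly non-Gaussian error the full distribution of the fitted values is what carries the information.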

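Bayesian statistic
or
The complexity of interpretation

Thomas Bayes, 1702-1761 (paper only published in 1764)

Bayes' theorem:

$$p(M_i \mid {>}E) = \frac{p(E \mid M_i) \, p(M_i \mid {<}E)}{\sum_j p(E \mid M_j) \, p(M_j \mid {<}E)}$$

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{prior}}{\text{marginal likelihood}}$$

Where > means after and < means before the measurement E:

• $p(M_i \mid {<}E)$: prior to measurement

• $p(E \mid M_i)$: measurement (likelihood)

• $p(M_i \mid {>}E)$: posterior to measurement

• Normalization factor = evidence. The normalization factor is the sum of likelihood × prior over all possible models, which ensures the unitarity of probability.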

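Example

• Question: Suppose you have tested positive for a disease; what is the probability that you actually have the disease?

With T a positive test, $\bar{T}$ a negative test, D diseased and $\bar{D}$ healthy:

– Efficiency of the test: $L(T \mid D) = L(\bar{T} \mid \bar{D}) = 95\,\%$ and $L(\bar{T} \mid D) = L(T \mid \bar{D}) = 5\,\%$

– Disease is rare: $p(D \mid {<}T) = 1\,\%$ and $p(\bar{D} \mid {<}T) = 99\,\%$

What is the Bayesian probability?

$$p(D \mid {>}T) = \frac{L(T \mid D)\, p(D \mid {<}T)}{L(T \mid D)\, p(D \mid {<}T) + L(T \mid \bar{D})\, p(\bar{D} \mid {<}T)} = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 16\,\%$$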
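The arithmetic of this example can be reproduced in a few lines. The function name and its arguments are illustrative choices, not part of the slides; the numbers are those of the example.

```python
def posterior(prior_d, p_pos_given_d, p_pos_given_healthy):
    """Bayes theorem for two hypotheses: diseased (D) or healthy."""
    evidence = p_pos_given_d * prior_d + p_pos_given_healthy * (1.0 - prior_d)
    return p_pos_given_d * prior_d / evidence

# Rare disease (prior 1 %), test efficiency 95 %, false positive rate 5 %.
p_rare = posterior(prior_d=0.01, p_pos_given_d=0.95, p_pos_given_healthy=0.05)

# Same test, but with a 50 % prior instead of 1 %.
p_even = posterior(prior_d=0.50, p_pos_given_d=0.95, p_pos_given_healthy=0.05)

print(round(100 * p_rare, 1), round(100 * p_even, 1))
```

With the 1 % prior the posterior is about 16 %; with a 50 % prior the same test gives 95 %, which shows how strongly the answer depends on the prior.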

Why such a small Bayesian probability (16 %) compared to the likelihood probability of 95 %? Which method is wrong?

Intuitive argument.

• Out of 100 people, the doctor expects 1 person to have the disease and 99 to be healthy. If the doctor tests everyone:

– 1 has the disease and will probably test positive

– 5 will test positive although they do not have the disease

That makes 6 positive tests for only 1 true case of disease. So the probability for a patient to have the disease when the test is positive is 1/6 ~ 16 %.

=> Likelihood is wrong?

• In the previous argument the doctor used the whole population to compute the probability:

– 1 % diseased and 99 % healthy before the test

– He assumed that the patient is a random person. The patient's state before the measurement is a superposition of 2 states:

|patient> = 0.01 |disease> + 0.99 |healthy>

• But what about yourself before the test?

|you> = |disease> or |healthy>, but not both states at the same time

|you> ≠ |patient>

=> Bayesian is wrong?

Which statistic is correct? Both!

• But they do not answer the same question:

– Frequentist: If “my” test is positive, what is the probability for me to have the disease? 95 %

– Bayesian: If “one of the” patients has a positive test, what is the probability for this patient to have the disease? 16 %

• Different questions give different answers!

Conclusion: In Bayesian statistics, the most important ingredient is the prior, because it can change the question!

This is why statisticians like Bayesian statistics: just playing with the prior can solve a lot of different problems.

On the other hand, in scientific work we should be careful about the prior and about the interpretation of the Bayesian probability.

Summary

• Both statistics are used in cosmology and give similar results if no priors or only weak priors are used.

• The frequentist statistic is very simple to use for Gaussian errors and a nearly linear model. Errors can easily be computed using Fisher analysis.

• The Bayesian statistic might be the only method able to solve very complex problems. But beware of the interpretation of the probability!

• For complex problems, only simulation can be used, and it takes a lot of computing time.

• When using priors, in both cases a careful analysis of the results should be done.


Kosmoshow: cosmology in one click

[Screenshot of the Kosmoshow interface, with the following labels:]

• Click the icon and then anywhere for help

• Manage files and load a predefined survey

• Define different dark energy parameterizations

• Choose different probes

• Your cosmology used for simulation

• Predefined surveys: data or simulation

• Main table: SN definition

• Fitting options

• Parameters to be fitted

• Prior definition

• Actions: Kosmosfit: fitting and error computing

Server: 166.111.26.237

• Connect to the server:

– User: student

– Pwd: thcaWorkshop

• Create a directory with your name:

– mkdir tilquin

• source /cosmosoft/environment/idl-env.sh

• Go to your work directory

• cp /home/tilquin/kosmoshow/*.* .

• idl

• kosmoshowsc

Or download from http://marwww.in2p3.fr/~tilquin/

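Non-linearity: computing errors

If $m(\theta_k)$ is not linear, the errors on $\theta_k$ are not Gaussian. Fisher analysis is no longer correct. If one uses it, the results should be verified a posteriori.

To estimate the errors we should come back to the first definition of the $\chi^2_1$ and solve the equation $\chi^2_1 = \chi^2_{min} + 1$:

$$\chi^2_1(\Omega_M, \Omega_\Lambda) = \left(m - m_0(\Omega_M, \Omega_\Lambda)\right)^T V^{-1} \left(m - m_0(\Omega_M, \Omega_\Lambda)\right)$$

If we want to estimate $\sigma(\Omega_\Lambda)$, what about $\Omega_M$? How to take care of the correlation? How to marginalize over $\Omega_M$?

• Average answer (simulation):

$$\chi^2_1(\Omega_\Lambda) = \int_0^1 \chi^2_1(\Omega_M, \Omega_\Lambda)\, p(\Omega_M)\, d\Omega_M$$

• Most probable answer (data):

$$\chi^2_1(\Omega_\Lambda) = \min_{\Omega_M} \chi^2_1(\Omega_M, \Omega_\Lambda)$$

It can be shown that both methods are equivalent for simulation if the simulated points are not randomized: $m_{mes} = m_{th}$.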

Bayesian evidence

• Bayes forecasts method:

a) define experiment configuration and models

b) simulate data D for all fiducial parameters

c) compute evidence (using the data from b)

d) plot evidence ratio B01 = E(M0)/E(M1)

• Limits: plot contours of iso-evidence ratio

– ln(B01) = 0 (equal probability)

– ln(B01) = -2.5 (1:12 ~ substantial)

– ln(B01) = -5 (1:150 ~ strong)

• Computationally intensive: need to calculate 100s of evidences
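The odds quoted next to the evidence ratios follow from exponentiating $\ln(B_{01})$. A short sketch (the function name is an illustrative choice):

```python
import math

def odds_against_m0(ln_b01):
    """Odds 1:x against model M0 for a given ln(B01) = ln(E(M0)/E(M1))."""
    return math.exp(-ln_b01)

for ln_b01 in (0.0, -2.5, -5.0):
    print(f"ln(B01) = {ln_b01:+.1f}  ->  1:{odds_against_m0(ln_b01):.0f}")
```

The values come out close to 1:1, 1:12 and 1:150, matching the thresholds in the list.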

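Graphical interpretation (contour)

• The equation $\chi^2 = \chi^2_{min} + s^2$ defines an iso-probability ellipse.

Uncorrelated case:

$$U^{-1} = \begin{pmatrix} 1/\sigma_{\Omega_m}^2 & 0 \\ 0 & 1/\sigma_{\Omega_\Lambda}^2 \end{pmatrix}, \qquad \chi^2 = \frac{\left(\Omega_m - \Omega_m^{(0)}\right)^2}{\sigma_{\Omega_m}^2} + \frac{\left(\Omega_\Lambda - \Omega_\Lambda^{(0)}\right)^2}{\sigma_{\Omega_\Lambda}^2}$$

Correlated case, with $V^{-1} = J^T U^{-1} J$:

$$\chi^2 = \begin{pmatrix} \Omega_m - \Omega_m^{(0)} & \Omega_\Lambda - \Omega_\Lambda^{(0)} \end{pmatrix} V^{-1} \begin{pmatrix} \Omega_m - \Omega_m^{(0)} \\ \Omega_\Lambda - \Omega_\Lambda^{(0)} \end{pmatrix}, \qquad \mathrm{tg}(2\alpha) = \frac{2\rho\,\sigma_X\sigma_Y}{\sigma_X^2 - \sigma_Y^2}$$

[Figure: ellipses $\chi^2 = \chi^2_{min} + 1$ in the $(\Omega_m, \Omega_\Lambda)$ plane, centred on $(\Omega_m^{(0)}, \Omega_\Lambda^{(0)})$. The ellipse contains 39 % of the probability; its projections on the axes have full widths $2\sigma_{\Omega_m}$ and $2\sigma_{\Omega_\Lambda}$, each containing 68 %. The correlated ellipse is tilted by $-\pi/4$.]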
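The 39 % and 68 % figures and the tilt of the ellipse can be checked with the standard Gaussian results, sketched below. The function names are illustrative, and the numerical inputs for the angle (equal errors of 0.5 and a correlation of -0.28) are assumed values, not taken from the figure.

```python
import math

def contour_probability_2d(s):
    """Probability inside chi2 = chi2_min + s^2 for two Gaussian parameters."""
    return 1.0 - math.exp(-0.5 * s * s)

def interval_probability_1d(s):
    """Probability within +/- s sigma for one Gaussian parameter."""
    return math.erf(s / math.sqrt(2.0))

def ellipse_angle(sigma_x, sigma_y, rho):
    """Tilt alpha from tan(2 alpha) = 2 rho sx sy / (sx^2 - sy^2)."""
    return 0.5 * math.atan2(2.0 * rho * sigma_x * sigma_y,
                            sigma_x**2 - sigma_y**2)

p2d = contour_probability_2d(1.0)          # about 0.39
p1d = interval_probability_1d(1.0)         # about 0.68
alpha = ellipse_angle(0.5, 0.5, -0.28)     # equal errors, negative correlation
print(p2d, p1d, alpha)
```

Using `atan2` instead of a plain arctangent avoids the division by zero when the two errors are equal, in which case the tilt is $\pm\pi/4$ depending on the sign of the correlation.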

Systematic errors

Definition: Systematic error is all that is not statistical.

Statistical: If we repeat « n » times the measurement of the quantity Q with a statistical error $\sigma_Q$, the average value <Q> tends to the true value $Q_0$ with an error $\sigma_Q/\sqrt{n}$.

Systematic: Whatever the number of experiments, <Q> will never tend to $Q_0$ better than the systematic error $\sigma_S$.

How to deal with it:

If the systematic effect is measurable, we correct for it by calculating $\langle Q - \delta Q \rangle$ with the error $\sigma_Q'^2 = \sigma_Q^2 + \sigma_{\delta Q}^2$.

If not, we add the error matrices: V' = Vstat + Vsyst and we use the general formalism.

Challenge: The systematic error should be less than the statistical error. If not, just stop the experiment, because they (the systematics) won!

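Error on the z parameter

SNAP measures $m_i$ and $z_i$ with errors $\sigma_m$ and $\sigma_z$. Redshift is used as a parameter of the theoretical model and its error does not appear in the $\chi^2$:

$$\chi^2 = \sum_{i=1}^{n} \frac{\left(m_i - m(\Omega_m, \Omega_\Lambda, z_i)\right)^2}{\sigma_{m_i}^2}$$

But the error on z leads to an error on $m(\Omega_m, \Omega_\Lambda, z_i)$:

$$\sigma_{m(z_i)} = \left|\frac{\partial m(\theta_k, z)}{\partial z}\right|_{z_i} \sigma_{z_i}$$

Thus, the error on the difference $m_i - m_{th}$ is:

$$\sigma_{T_i}^2 = \sigma_{m_i}^2 + \left(\frac{\partial m(\theta_k, z)}{\partial z}\right)^2_{z_i} \sigma_{z_i}^2$$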
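The propagation of the redshift error into the magnitude error can be sketched as follows. The model used here is a hypothetical smooth function of z chosen only so that the derivative is easy to check by hand; it is not a real cosmological model, and the input errors are invented.

```python
import math

def model(z):
    """Hypothetical smooth stand-in for m(theta_k, z); not a real cosmology."""
    return 5.0 * math.log10(z * (1.0 + z))

def total_error(z_i, sigma_m, sigma_z, h=1e-6):
    """sigma_T = sqrt(sigma_m^2 + (dm/dz)^2 sigma_z^2), dm/dz by finite difference."""
    dm_dz = (model(z_i + h) - model(z_i - h)) / (2.0 * h)
    return math.sqrt(sigma_m**2 + (dm_dz * sigma_z) ** 2)

sigma_T = total_error(z_i=0.5, sigma_m=0.15, sigma_z=0.01)
print(sigma_T)
```

For this toy model the slope at z = 0.5 is about 5.8 magnitudes per unit redshift, so a redshift error of 0.01 adds about 0.06 in quadrature to the magnitude error of 0.15.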
