Goodness of Fit using Bootstrap

G. Jogesh Babu
Center for Astrostatistics
http://astrostatistics.psu.edu


Page 2: Goodness of Fit using Bootstrap

Astrophysical Inference from astronomical data

Fitting astronomical data
• Non-linear regression
• Density (shape) estimation
• Parametric modeling
  – Parameter estimation of assumed model
  – Model selection to evaluate different models
    • Nested (in a quasar spectrum, should one add a broad absorption line (BAL) component to a power law continuum?)
    • Non-nested (is the quasar emission process a mixture of blackbodies or a power law?)
• Goodness of fit

Page 3: Goodness of Fit using Bootstrap

Chandra X-ray Observatory ACIS data
COUP source #410 in the Orion Nebula with 468 photons

Fitting to binned data using $\chi^2$ (XSPEC package)
Thermal model with absorption, $A_V \sim 1$ mag

Page 4: Goodness of Fit using Bootstrap

Fitting to unbinned EDF
Maximum likelihood (C-statistic)
Thermal model with absorption

Page 5: Goodness of Fit using Bootstrap

Incorrect model family
Power law model, absorption $A_V \sim 1$ mag

Question : Can a power law model be excluded with 99% confidence?

Page 6: Goodness of Fit using Bootstrap

Empirical Distribution Function

Page 7: Goodness of Fit using Bootstrap

K-S confidence bands
$F = F_n \pm D_n(\alpha)$
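A minimal sketch of such a band in Python, under stated assumptions: the exact K-S critical value $D_n(\alpha)$ is replaced by the conservative Dvoretzky-Kiefer-Wolfowitz half-width $\sqrt{\ln(2/\alpha)/(2n)}$, the data are simulated (not the COUP source), and the function name `edf_band` is hypothetical.

```python
import math
import random

def edf_band(sample, alpha=0.05):
    """Return the half-width and a list of (x, lower, upper) band points."""
    xs = sorted(sample)
    n = len(xs)
    # DKW half-width: a conservative stand-in for the exact D_n(alpha)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    band = []
    for i, x in enumerate(xs, start=1):
        fn = i / n  # EDF value F_n(x) at the i-th order statistic
        band.append((x, max(fn - eps, 0.0), min(fn + eps, 1.0)))
    return eps, band

random.seed(0)
# Simulated sample of the same size as the 468-photon example (illustration only)
data = [random.gauss(0.0, 1.0) for _ in range(468)]
eps, band = edf_band(data, alpha=0.05)
print(f"band half-width for n = {len(data)}: {eps:.4f}")
```

The band is valid only for a fully specified $F$; it does not account for parameters estimated from the same data.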

Page 8: Goodness of Fit using Bootstrap

Model fitting

Find most parsimonious 'best' fit to answer:

• Is the underlying nature of an X-ray stellar spectrum a non-thermal power law or a thermal gas with absorption?

• Are the fluctuations in the cosmic microwave background best fit by Big Bang models with dark energy or with quintessence?

• Are there interesting correlations among the properties of objects in any given class (e.g. the Fundamental Plane of elliptical galaxies), and what are the optimal analytical expressions of such correlations?

Page 9: Goodness of Fit using Bootstrap

Statistics Based on EDF

Kolmogorov-Smirnov: $\sup_x |F_n(x) - F(x)|$, $\sup_x (F_n(x) - F(x))^+$, $\sup_x (F_n(x) - F(x))^-$

Cramér-von Mises: $\int (F_n(x) - F(x))^2 \, dF(x)$

Anderson-Darling: $\int \dfrac{(F_n(x) - F(x))^2}{F(x)\,(1 - F(x))} \, dF(x)$

All of these statistics are distribution free

Nonparametric statistics.

But they are no longer distribution free if the parameters are estimated or the data are multivariate.
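The three statistics can be computed from the order statistics with the standard computing formulas. The sketch below assumes a fully specified continuous CDF; the function name `edf_statistics` is hypothetical. The Cramér-von Mises and Anderson-Darling values returned are the usual versions scaled by $n$, that is, $n$ times the integrals on the slide.

```python
import math

def edf_statistics(sample, cdf):
    """Return (KS, D+, D-, n*CvM, n*AD) for a fully specified CDF."""
    u = sorted(cdf(x) for x in sample)  # probability-integral transform
    n = len(u)
    d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
    d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
    ks = max(d_plus, d_minus)
    # n * integral of (F_n - F)^2 dF
    cvm = 1.0 / (12 * n) + sum(
        (ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)
    )
    # n * integral of (F_n - F)^2 / (F (1 - F)) dF
    ad = -n - sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    ) / n
    return ks, d_plus, d_minus, cvm, ad

def uniform_cdf(x):
    """CDF of Uniform(0, 1), used here only as a toy null model."""
    return min(max(x, 0.0), 1.0)

ks, d_plus, d_minus, cvm, ad = edf_statistics([0.7, 0.1, 0.4], uniform_cdf)
print(ks, cvm, ad)
```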

Page 10: Goodness of Fit using Bootstrap

KS Probabilities are invalid when the model parameters are estimated from the data. Some astronomers use them incorrectly.

(Lilliefors 1964)

Page 11: Goodness of Fit using Bootstrap

Multivariate Case

Warning: K-S does not work in multidimensions

Example – Paul B. Simpson (1951)

$F(x,y) = a x^2 y + (1 - a)\, y^2 x$, $0 < x, y < 1$

$(X_1, Y_1)$ data from $F$, $F_1$ the EDF of $(X_1, Y_1)$

$P(|F_1(x,y) - F(x,y)| < 0.72 \text{ for all } x, y)$ is greater than 0.065 if $a = 0$ ($F(x,y) = y^2 x$), and less than 0.058 if $a = 0.5$ ($F(x,y) = xy(x+y)/2$)

Numerical Recipes' treatment of a 2-dim KS test is mathematically invalid.

Page 12: Goodness of Fit using Bootstrap

Processes with estimated Parameters

$\{F(\cdot\,; \theta) : \theta \in \Theta\}$ - a family of distributions

$X_1, \ldots, X_n$ sample from $F$

Kolmogorov-Smirnov, Cramér-von Mises etc., when $\theta$ is estimated from the data, are continuous functionals of the empirical process

$Y_n(x; \hat\theta_n) = \sqrt{n}\,\bigl(F_n(x) - F(x; \hat\theta_n)\bigr)$

Page 13: Goodness of Fit using Bootstrap

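In the Gaussian case, $\theta = (\mu, \sigma^2)$ and $\hat\theta_n = (\bar X_n, s_n^2)$, where

$\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X_n)^2$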

Page 14: Goodness of Fit using Bootstrap

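Bootstrap

$G_n$ is an estimator of $F$, based on $X_1, \ldots, X_n$

$X_1^*, \ldots, X_n^*$ i.i.d. from $G_n$

$\hat\theta_n^* = \hat\theta_n(X_1^*, \ldots, X_n^*)$

If $F(\cdot\,; \theta)$ is Gaussian with $\theta = (\mu, \sigma^2)$ and $\hat\theta_n = (\bar X_n, s_n^2)$, then $\hat\theta_n^* = (\bar X_n^*, s_n^{*2})$

Parametric bootstrap if $G_n = F(\cdot\,; \hat\theta_n)$: $X_1^*, \ldots, X_n^*$ i.i.d. from $F(\cdot\,; \hat\theta_n)$

Nonparametric bootstrap if $G_n = F_n$ (EDF)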

Page 15: Goodness of Fit using Bootstrap

Parametric Bootstrap

$X_1^*, \ldots, X_n^*$ sample generated from $F(\cdot\,; \hat\theta_n)$.

In the Gaussian case $\hat\theta_n^* = (\bar X_n^*, s_n^{*2})$.

Both $\sqrt{n}\, \sup_x |F_n(x) - F(x; \hat\theta_n)|$ and $\sqrt{n}\, \sup_x |F_n^*(x) - F(x; \hat\theta_n^*)|$ have the same limiting distribution

(In the XSPEC package, the parametric bootstrap is the command FAKEIT, which makes Monte Carlo simulations of a specified spectral model)
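The parametric bootstrap for the Gaussian family can be sketched as follows. This is an illustration under assumptions, not the XSPEC procedure: the data are simulated, the helper names (`fit_gaussian`, `ks_stat`, `parametric_bootstrap_pvalue`) are hypothetical, and the $\sqrt{n}$ factor is omitted because it cancels when the observed and bootstrap statistics share the same $n$.

```python
import math
import random
from statistics import NormalDist

def fit_gaussian(sample):
    """Estimate (mean, s^2) with the 1/n variance used on the slides."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    return mean, var

def ks_stat(sample, mean, var):
    """sup_x |F_n(x) - F(x; theta)| for a Gaussian F."""
    dist = NormalDist(mean, math.sqrt(var))
    u = sorted(dist.cdf(x) for x in sample)
    n = len(u)
    return max(max(i / n - ui, ui - (i - 1) / n) for i, ui in enumerate(u, start=1))

def parametric_bootstrap_pvalue(sample, n_boot=500, seed=1):
    rng = random.Random(seed)
    n = len(sample)
    mean, var = fit_gaussian(sample)
    observed = ks_stat(sample, mean, var)
    exceed = 0
    for _ in range(n_boot):
        # Draw X* from the fitted model F(.; theta_n), then refit on X*
        star = [rng.gauss(mean, math.sqrt(var)) for _ in range(n)]
        mean_star, var_star = fit_gaussian(star)
        if ks_stat(star, mean_star, var_star) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)

rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(200)]  # not Gaussian
obs_exp, p_exp = parametric_bootstrap_pvalue(skewed)
print(f"KS = {obs_exp:.3f}, bootstrap p-value = {p_exp:.4f}")
```

Refitting the parameters on every bootstrap sample is the step that makes the resulting p-value valid where the tabulated K-S probabilities are not.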

Page 16: Goodness of Fit using Bootstrap

Nonparametric Bootstrap

$X_1^*, \ldots, X_n^*$ i.i.d. from $F_n$.

A bias correction $B_n(x) = F_n(x) - F(x; \hat\theta_n)$ is needed.

$\sqrt{n}\, \sup_x |F_n(x) - F(x; \hat\theta_n)|$ and $\sqrt{n}\, \sup_x |F_n^*(x) - F(x; \hat\theta_n^*) - B_n(x)|$ have the same limiting distribution

(XSPEC does not provide a nonparametric bootstrap capability)
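A sketch of the bias-corrected nonparametric bootstrap for the Gaussian family follows. Assumptions: the data are simulated, all function names are hypothetical, and the supremum is approximated by evaluating the left and right limits at the observed data points (the bias-corrected difference can in principle peak between data points). The $\sqrt{n}$ factor is again dropped because it cancels.

```python
import math
import random
from statistics import NormalDist

def fitted_normal(sample):
    """Gaussian fit with the 1/n variance estimate."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    return NormalDist(mean, math.sqrt(var))

def sup_difference(xs, counts, fit_cdf, star_cdf):
    """Approximate sup_x |F_n*(x) - F(x; theta*) - B_n(x)| over the sorted data xs.

    counts[i] is how often xs[i] appears in the resample; passing all ones and
    star_cdf = fit_cdf would give zero, since then F_n* = F_n and theta* = theta.
    """
    n = len(xs)
    cum_star = 0
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        smooth = fit_cdf(x) - star_cdf(x)  # F(x; theta_n) - F(x; theta_n*)
        left = cum_star / n - (i - 1) / n + smooth   # just below x
        cum_star += counts[i - 1]
        right = cum_star / n - i / n + smooth        # at x
        worst = max(worst, abs(left), abs(right))
    return worst

def nonparametric_bootstrap_pvalue(sample, n_boot=300, seed=2):
    rng = random.Random(seed)
    xs = sorted(sample)
    n = len(xs)
    fit = fitted_normal(xs)
    # Observed sup_x |F_n(x) - F(x; theta_n)|
    observed = max(
        max(i / n - fit.cdf(x), fit.cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )
    exceed = 0
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # resample from the EDF F_n
        counts = [0] * n
        for j in idx:
            counts[j] += 1
        star_fit = fitted_normal([xs[j] for j in idx])
        if sup_difference(xs, counts, fit.cdf, star_fit.cdf) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)

rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(400)]  # not Gaussian
obs_np, p_np = nonparametric_bootstrap_pvalue(skewed)
print(f"KS = {obs_np:.3f}, bootstrap p-value = {p_np:.4f}")
```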

Page 17: Goodness of Fit using Bootstrap

• Chi-Square type statistics – (Babu, 1984, Statistics with linear combinations of chi-squares as weak limit. Sankhya, Series A, 46, 85-93.)

• U-statistics – (Arcones and Giné, 1992, On the bootstrap of U and V statistics. Ann. of Statist., 20, 655–674.)

Page 18: Goodness of Fit using Bootstrap

Confidence limits under misspecification of model family

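$X_1, \ldots, X_n$ data from unknown $H$.

$H$ may or may not belong to the family $\{F(\cdot\,; \theta) : \theta \in \Theta\}$.

$H$ is closest to $F(\cdot\,; \theta_0)$, in Kullback-Leibler information

$\int h(x) \log\bigl(h(x)/f(x; \theta)\bigr) \, d\nu(x) \ge 0$

$\int h(x)\, |\log h(x)| \, d\nu(x) < \infty$

$\int h(x) \log f(x; \theta_0) \, d\nu(x) = \max_{\theta \in \Theta} \int h(x) \log f(x; \theta) \, d\nu(x)$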

Page 19: Goodness of Fit using Bootstrap

For any $0 < \alpha < 1$,

$P\bigl(\sqrt{n}\, \sup_x |F_n(x) - F(x; \hat\theta_n) - (H(x) - F(x; \theta_0))| < C_\alpha^*\bigr) \to \alpha$

$C_\alpha^*$ is the $\alpha$-th quantile of

$\sqrt{n}\, \sup_x |F_n^*(x) - F(x; \hat\theta_n^*) - (F_n(x) - F(x; \hat\theta_n))|$

This provides an estimate of the distance between the true distribution and the family of distributions under consideration.
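One way to turn this result into numbers is sketched below, assuming a Gaussian family fitted to simulated non-Gaussian data. The names `edf`, `fitted_cdf` and `distance_limits` are hypothetical, the supremum is approximated on the grid of observed data points, and the quantile is kept unscaled (it plays the role of $C_\alpha^*/\sqrt{n}$). The limits follow from the triangle inequality: if $\sup_x |A(x) - B(x)| \le c$ then $\sup_x |B(x)|$ lies within $c$ of $\sup_x |A(x)|$.

```python
import bisect
import math
import random
from statistics import NormalDist

def edf(sorted_pts, x):
    """Empirical distribution function of sorted_pts evaluated at x."""
    return bisect.bisect_right(sorted_pts, x) / len(sorted_pts)

def fitted_cdf(sample):
    """CDF of the Gaussian fitted with the 1/n variance estimate."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return NormalDist(mean, sd).cdf

def distance_limits(sample, alpha=0.9, n_boot=300, seed=3):
    """Approximate limits for sup_x |H(x) - F(x; theta_0)| with coverage near alpha."""
    rng = random.Random(seed)
    xs = sorted(sample)
    n = len(xs)
    f_hat = fitted_cdf(xs)
    b_n = [edf(xs, x) - f_hat(x) for x in xs]  # F_n - F(.; theta_n) on the grid
    d_n = max(abs(b) for b in b_n)
    stats = []
    for _ in range(n_boot):
        star = sorted(rng.choices(xs, k=n))    # nonparametric resample
        f_star = fitted_cdf(star)
        stats.append(max(abs(edf(star, x) - f_star(x) - b) for x, b in zip(xs, b_n)))
    stats.sort()
    # alpha-th empirical quantile of the bootstrap statistic (unscaled by sqrt(n))
    c_star = stats[min(n_boot - 1, math.ceil(alpha * n_boot) - 1)]
    return d_n, max(0.0, d_n - c_star), d_n + c_star

rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(400)]  # true H is exponential
d_n, lower, upper = distance_limits(skewed)
print(f"distance estimate {d_n:.3f}, limits [{lower:.3f}, {upper:.3f}]")
```

A lower limit above zero indicates that the true distribution is separated from the fitted family at the chosen level.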

Page 20: Goodness of Fit using Bootstrap

References

• G. J. Babu and C. R. Rao (1993). Handbook of Statistics, Vol 9, Chapter 19.

• G. J. Babu and C. R. Rao (2003). Confidence limits to the distance of the true distribution from a misspecified family by bootstrap. J. Statist. Plann. Inference, 115, 471-478.

• G. J. Babu and C. R. Rao (2004). Goodness-of-fit tests when parameters are estimated. Sankhya, Series A, 66, no. 1, 63-74.

Page 21: Goodness of Fit using Bootstrap

The End