maria grazia pia, infn genova statistical toolkit power of goodness-of-fit tests m.g. pia 1 b....

Maria Grazia Pia, INFN Genova

Statistical ToolkitPower of Goodness-of-Fit tests

B. Mascialino1, A. Pfeiffer2, M.G. PiaM.G. Pia11, A. Ribon2, P. Viarengo3

1INFN Genova, Italy 2CERN, Geneva, Switzerland

3IST – National Institute for Cancer Research, Genova, Italy

CHEP 2006Mumbai, 13-17 February 2006

Fluorescence spectrum of Icelandic Basalt8.3 keV beam

Counts

Energy (keV)


Some use casesRegression testing

– Throughout the software life-cycle

Online DAQ– Monitoring detector behaviour w.r.t. a reference

Simulation validation– Comparison with experimental data

Reconstruction– Comparison of reconstructed vs. expected distributions

Physics analysis– Comparisons of experimental distributions (ATLAS vs. CMS Higgs?)– Comparison with theoretical distributions (data vs. Standard Model)

The test statistics computation concerns the agreement

between the two samples’ empirical distribution functions

Historical background…Validation of Geant4 physics models through comparison of

simulation vs. experimental data or reference databases


G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo

“A Goodness-of-Fit Statistical Toolkit”IEEE- Transactions on Nuclear Science (2004), 51 (5): 2056-2063.

Releases are publicly downloadable from the web code, documentation etc.

Releases are also distributed with LCG Mathematical Libraries

Also ported to Java, distributed with JAS


Vision of the project

Rigorous software processsoftware process

Basic visionvision– General purpose tool

– Toolkit approach (choice open to users)

– Open source product

– Independent from specific analysis tools

– Easily usable in analysis and other tools

Build on a solid architecturearchitecture

Clearly define scopescope,

objectivesobjectives

Flexible, Flexible, extensible, extensible,

maintainablemaintainable system

Software quality quality


GoF algorithms (latest public release)

Algorithms for binned distributionsAlgorithms for binned distributions – Anderson-Darling test– Chi-squared test – Fisz-Cramer-von Mises test–

Algorithms for unbinned distributionsAlgorithms for unbinned distributions – Anderson-Darling test– Cramer-von Mises test– Goodman test (Kolmogorov-Smirnov test in chi-squared

approximation)

– Kolmogorov-Smirnov test– Kuiper test


Recent extensions: algorithms

It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools

– goal: provide all 2-sample GoF algorithms existing in statistics literature

Publication in preparation to describe the new algorithmsSoftware release: March 2006

Fisz-Cramer-von Mises test and Anderson-Darling test exact asymptotic distribution (earlier: critical values)

Tiku test Cramer-von Mises test in a chi-squared approximation

Improved tests

New tests Weighted Kolmogorov-Smirnov, weighted Cramer-von Misesvarious weighting functions available in literatureWatson test can be applied in case of cyclic observations, like the Kuiper test

In preparation Girone test


Simple user layerSimple user layer– Shields the user from the complexity of the underlying algorithms and design– Only deal with the user’s analysis objectsanalysis objects and choice of comparison algorithmcomparison algorithm

User Layer

First release: user layer for AIDA analysis objects– LCG Architecture Blueprint, Geant4 requirement

July 2005: added user layer for ROOT histograms– in response to user requirements

Other user layer implementations foreseen– easy to add– sound architecture decouples the mathematical component and the user’s representation of

analysis objects


Power of GoF testsDo weDo we really need really need such a wide such a wide collection of GoF tests? collection of GoF tests? Why?Why?

Which is the most appropriatemost appropriate test to compare two distributions?

How “goodgood” is a test at recognizing real equivalent distributions and rejecting fake ones?

Which test to

use?

No comprehensive study of the relative power of GoF tests exists in literature

– novel research in statisticsnovel research in statistics (not only in physics data analysis!)

SystematicSystematic study of allall existing GoF tests in progress

– made possible by the extensive collection of tests in the Statistical Toolkit


Method for the evaluation of power

N=1000Monte Carlo

replicas

Pseudoexperiment: a random drawing

of two samples from two parent distributions

For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the samples sizes

Confidence Level = 0.05

Parent distribution 1

Sample 1n

Sample 2m

GoF testGoF test

Parent distribution 2

PowerPower = = # pseudoexperiments with p-value < (1-CL)

# pseudoexperiments


Parent distributions

1)(1 xf

Uniform)

2(

2

2

2

1)(

x

exf

Gaussian

||3

2

1)( xexf

Double exponential

24

1

11)(

xxf

Cauchy

xexf )(5

Exponential

Contaminated Normal Distribution 2

)1,1(5.0)4,1(5.0)(7 xf

)9,0(1.0)1,0(9.0)(6 xf

Contaminated Normal Distribution 1Also Breit-Wigner, other distributions being considered


Characterization of distributions

025.05.0

5.0975.0

xx

xxS

125.0875.0

025.0975.0

xx

xxT

Parent distributionParent distribution SS TTf1(x) Uniform 1 1.267

f2(x) Gaussian 1 1.704

f3(x) Double exponential 1 2.161

f4(x) Cauchy 1 5.263

f5(x) Exponential 4.486 1.883

f6(x) Contamined normal 1 1 1.991

f7(x) Contamined normal 2 1.769 1.693

SkewnessSkewness TailweightTailweight


General alternativeCompare different distributions

Parent1 ≠ Parent2

Unbinned distributions


The power increases as a function of the sample size

FLAT vs

EXPONENTIAL

Sample size

Em

piric

al p

ower

(%

)

Symmetricvs

skewed

Short tailedvs

Medium tailed

Sample size

Em

piric

al p

ower

(%

)

Symmetricvs

SkewedMedium tailed

vsMedium tailed

ADAD KK WW

KSCvMADKW

DOUBLE EXPONENTIALvs

CN1

Sample size

Em

piric

al p

ower

(%

)Medium tailed

vsMedium tailed

Symmetricvs

Symmetric

DOUBLE EXPONENTIAL vs

EXPONENTIAL

Sample size

Em

piric

al p

ower

(%

)

Medium tailedvs

Medium tailed

Symmetricvs

Simmetric

No clear winnerNo clear winner

CN1 vs

CN2


GAUSSIANvs

DOUBLE EXPONENTIAL

Samples size

Em

piric

al p

ower

(%

)

Very similar

distributions

Simmetricvs

SimmetricMedium tailed

vsLong tailed

The power increases as a function of the sample size

Sample size

Em

piric

al p

ower

(%

)

CAUCHYvs

EXPONENTIAL

Long tailedvs

Medium tailed

Symmetricvs

Asymmetric

ADAD

GAUSSIANvs

DOUBLE EXPONENTIAL

Sample size

Em

piric

al p

ower

(%

)

Symmetricvs

SymmetricMedium tailed

vsLong tailed

CvMCvM

GAUSSIAN vs

CN2

Sample size

Em

piric

al p

ower

(%

)Symmetric

vsskewed

Medium tailedvs

Medium tailed

CvMCvMADAD

KSCvMADKW

GAUSSIANvs

CN1

Sample size

Em

piric

al p

ower

(%

)

Symmetricvs

Symmetric

Medium tailedvs

Medium tailed

CvMCvM KSKS


The power varies as a function of the parent distributions’ characteristics

Samples size = 5

EXPONENTIALvs

OTHER DISTRIBUTIONS

Tailweight 2ND distribution

Em

piric

al p

ower

(%

) Samples size = 15


Em

piric

al p

ower

(%

)

EXPONENTIALvs

OTHER DISTRIBUTIONS

Em

piric

al p

ower

(%

)


Em

piric

al p

ower

(%

)


Samples size = 5 Samples size = 15

FLATvs

OTHER DISTRIBUTIONS

FLATvs

OTHER DISTRIBUTIONS

KSCvMADKW

Sample size = 5

EXPONENTIALvs

OTHER DISTRIBUTIONS


Em

piric

al p

ower

(%

) Sample size = 15


Em

piric

al p

ower

(%

)

EXPONENTIALvs

OTHER DISTRIBUTIONS

Em

piric

al p

ower

(%

)


Em

piric

al p

ower

(%

)


Distribution1asymmetric

Distribution1symmetric

FLATvs

OTHER DISTRIBUTIONS

FLATvs

OTHER DISTRIBUTIONS


The power varies as a function of parent distributions’ characteristics

Sample size = 5

EXPONENTIALvs

OTHER DISTRIBUTIONS


Em

piric

al p

ower

(%

) Sample size = 15


Em

piric

al p

ower

(%

)

EXPONENTIALvs

OTHER DISTRIBUTIONS

Em

piric

al p

ower

(%

)


Em

piric

al p

ower

(%

)


Sample size = 5 Sample size = 15

FLATvs

OTHER DISTRIBUTIONS

FLATvs

OTHER DISTRIBUTIONS

KSCvMADKW

Distribution1asymmetric

Distribution1symmetric


Comparative evaluation of tests

ShortShort

(T(T<1.5)<1.5)

MediumMedium

(1.5 < T < 2)(1.5 < T < 2)

LongLong

(T>2)(T>2)

SS~~11 KS KS – CVM CVM - AD

SS>1.5>1.5 KS - AD CVM-AD CVM - ADSke

wne

ssS

kew

ness

TailweightTailweightPreliminary


Location-scale alternativeSame distribution, shifted or

scaled

Parent1(x) = Parent2 ((x-θ)/τ)


Power increases as a function of sample size

KSCvMADKW

Em

piric

al p

ower

(%

)

Sample size

EXPONENTIAL

θ =0.5, τ = 0.5

Em

piric

al p

ower

(%

)

Sample size

θ =0.5, τ = 1.5

EXPONENTIAL

KK WW

CvMCvM

KSKS

Em

piric

al p

ower

(%

)

Sample size

EXPONENTIAL

θ =1.0, τ = 0.5

No clear No clear winnerwinner

Sample size

Em

piric

al p

ower

(%

)

EXPONENTIAL

θ =1.0, τ = 1.5

CvMCvM

KSKS


Power increases as a function of sample size

Sample size

Em

piric

al p

ower

(%

)

Em

piric

al p

ower

(%

)

Em

piric

al p

ower

(%

)

Em

piric

al p

ower

(%

)

Sample size

Sample size Sample size

KSCvMADKW

θ =0.5, τ = 0.5

θ =1.0, τ = 0.5

θ =0.5, τ = 1.5

θ =1.0, τ = 1.5

CN2

CN2

CN2

CN2

No clear winnerNo clear winner

WW KK


Power decreases as a function of tailweight


KSCvMADKW

Tailweight

Em

piric

al p

ower

(%

)

θ =1.0, τ = 0.5

N=10

TailweightEm

piric

al p

ower

(%

)

θ =1.0, τ = 1.5

N=10

TailweightEm

piric

al p

ower

(%

) θ =0.5, τ = 0.5

N=10

Em

piric

al p

ower

(%

)

Tailweight

θ =0.5, τ = 1.5

N=10



Binned distributions

General alternative Compare different distributions

Parent1 ≠ Parent2


Chi-squared test: POWER %

Norm DoubleExp Cauchy CN1

DoubleExp 20

Cauchy 63 35

CN1 21 31 16

CN2 100 99 55 86

Samples size = 500Number of bins = 20


Preliminary results

No clear winner clear winner for all the considered distributions in general– the performance of a test depends on its intrinsic features as well as on

the features of the distributions to be compared

Practical recommendations

1) first classify the type of the distributions in terms of skewness and tailweight

2) choose the most appropriate test given the type of distributions

Systematic study of the power in progress– for both binned and unbinned distributions

Topic still subject to research activity in the domain of statistics

Publication in preparation


Surprise…

KSCvMADKW

KSCvMADKW

KSCvMADKW

FlatGaussian

Exponential

Inefficiency

Inefficiency

Inefficiency

General alternative, same distributions

CL = 95%, expect 5% inefficiency

Anderson-DarlingAnderson-Darling exhibits an unexpected inefficiencyinefficiency at low numerosity

Not documented anywhere in literature!

Limitation of applicability?


Outlook

1-sample GoF tests (comparison w.r.t. a function)

Comparison of two/multi-dimensional distributions

Systematic study of the power of GoF tests

Goal to provide an extensive set of algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability

Treatment of errors, filtering

New release coming soon

New papers in preparation

Other components beyond GoF? Suggestions are welcome…


ConclusionsA novel, complete software toolkit for statistical analysis is being developed

– rich set of algorithms– sound architectural design– rigorous software process

A systematic study of the power of GoF tests is in progress– unexplored area of research

Application in various domains– Geant4, HEP, space science, medicine…

Feedback and suggestions are very much appreciated

The project is open to developers interested in statistical methods


IEEE Transactions on Nuclear ScienceIEEE Transactions on Nuclear Sciencehttp://ieeexplore.ieee.org/xpl/RecentIssue.jsp?puNumber=23

Prime journal on technology in particle/nuclear physics

Review process reorganized about one year ago Associate Editor dedicated to computing papers

Various papers associated to CHEP 2004 published on IEEE TNS

Papers associated to CHEP 2006 are welcomePapers associated to CHEP 2006 are welcome

Manuscript submission: http://tns-ieee.manuscriptcentral.com/Papers submitted for publication will be subject to the regular review process

Publications on refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists

Our “hardware colleagues” have better established publication habits…

Further info: [email protected]

maria grazia pia, infn genova statistical toolkit power of goodness-of-fit tests m.g. pia 1 b....

Documents

appropriate test

infn genova slide

mises test algorithms

maria grazia pia

gof test parent distribution

test statistics computation

preparation girone test

infn genova power of