A Toolkit for Statistical Data Analysis

Maria Grazia Pia, INFN Genova

S. Donadio, F. Fabozzi, L. Lista, S. Guatelli, B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo

http://www.ge.infn.it/geant4/analysis/HEPstatistics

LCG Application Area Meeting, CERN, 5 May 2004


History and background

The motivation from Geant4

Validation of Geant4 physics models through comparison of simulation vs. experimental data or reference databases

[Figure: Fluorescence spectrum from Icelandic basalt (Mars-like rock): experimental data and simulation. ESA Bepi Colombo mission to Mercury, test beam at Bessy]

[Figure: Photon attenuation coefficient, Al: Geant4 Standard, Geant4 LowE, NIST. Electromagnetic models in Geant4 w.r.t. NIST reference]

Historical introduction to EDF tests

In 1933 Kolmogorov published a short but landmark paper in the Italian journal Giornale dell’Istituto Italiano degli Attuari. He formally defined the empirical distribution function (EDF) and then enquired how close this would be to the true distribution F(x), when this is continuous.

It should be noted that Kolmogorov himself regarded his paper as the solution of an interesting probability problem, following the general interest of the time, rather than as a paper on statistical methodology.

After Kolmogorov's article, over a period of about 10 years, a number of distinguished mathematicians (Smirnov, Cramer, von Mises, Anderson, Darling, …) laid the foundations of methods for testing fit to a distribution based on the EDF.

The ideas in this paper have formed a platform for a vast literature, both on interesting and important probability problems and on methods of using the Kolmogorov statistics for testing fit to a distribution. This literature continues to grow with great strength today, showing no sign of decreasing.

Typical use cases in HEP

Regression testing
– Throughout the software life-cycle

Online DAQ
– Monitoring detector behaviour w.r.t. a reference

Simulation validation
– Comparison with experimental data

Reconstruction
– Comparison of reconstructed vs. expected distributions

Physics analysis
– Comparisons of experimental distributions (ATLAS vs. CMS Higgs?)
– Comparison with theoretical distributions (data vs. Standard Model)

Software tools

Commercial products used by “professional” statisticians
– SPSS, NCSS...

In HEP: a lot of activity
– workshops/conferences (CERN, Durham, SLAC etc.)
– books (F. James et al., L. Lyons, R. Barlow etc.)
– sophisticated statistical algorithms applied in various data analyses

...but, in spite of the relevant role played by statistics in HEP, very limited availability of software tools for statistics in our field
– and in open-source software in general

Let’s do it ourselves...

A project to develop an open-source software system for statistical analysis

Provide tools for the statistical comparison of distributions

Create a hub to aggregate expertise and collaborative contributions from scientists interested in statistical methods

see presentation at LCG-AA meeting, 27 November 2002

Vision: the basics

Rigorous software process

Have a vision for the project
– General purpose tool for statistical analysis
– Toolkit approach (choice open to users)
– Open source product

Build on a solid architecture

Clearly define scope, objectives

Flexible, extensible, maintainable system

Software quality

Architectural guidelines

The project adopts a solid architectural approach
– to offer the functionality and the quality needed by the users
– to be maintainable over a large time scale
– to be extensible, to accommodate future evolutions of the requirements

Component-based architecture
– to facilitate re-use and integration in diverse frameworks

Dependencies
– adopt a standard (AIDA) for the user layer
– no dependence on any specific analysis tool

Python
– the “glue” for interactivity

The approach adopted is compatible with the recommendations of the LCG Architecture Blueprint Report

Software process

Unified Software Development Process, specifically tailored to the project
– practical guidance and tools from the RUP
– both rigorous and lightweight
– mapping onto ISO 15504
– significant experience gained in the group from other projects

Incremental and iterative life-cycle model

The Goodness-of-Fit component

User Requirements

User requirements elicited, analysed and formally specified
– Functional (capability) and non-functional (constraint) requirements
– User Requirements Document available from the web site

Requirement traceability
• Requirements
• Design
• Implementation
• Test & test results
• Documentation

Simple user layer

Shields the user from the complexity of the underlying algorithms and design

The user only deals with AIDA objects and the choice of comparison algorithm

GoF algorithms

Algorithms for binned distributions
– Anderson-Darling test
– Chi-squared test
– Fisz-Cramer-von Mises test
– Tiku test (Cramer-von Mises test in chi-squared approximation)

Algorithms for unbinned distributions
– Anderson-Darling test
– Fisz-Cramer-von Mises test
– Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
– Kolmogorov-Smirnov test
– Kuiper test
– Tiku test (Cramer-von Mises test in chi-squared approximation)

Chi-squared test

Applies to binned distributions

It can also be useful for unbinned distributions, but the data must first be grouped into classes

Cannot be applied if the theoretical (expected) frequency in any class is < 5
– When a class falls below this threshold, one can merge contiguous classes until the minimum theoretical frequency is reached
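The class-merging step can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: the function names, the left-to-right merging strategy and the example counts are all assumptions made for this sketch.

```python
def merge_sparse_bins(observed, expected, min_expected=5.0):
    """Merge contiguous bins, left to right, until each merged bin
    has an expected frequency of at least min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    # A leftover tail that is still too sparse is folded into the last bin.
    if acc_exp > 0.0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: the first and last bins have expected frequency < 5.
observed = [2, 4, 10, 18, 9, 3, 1]
expected = [1.5, 4.5, 11.0, 16.0, 9.0, 3.5, 1.5]
obs_m, exp_m = merge_sparse_bins(observed, expected)
statistic = chi_squared(obs_m, exp_m)
```

Merging reduces the number of classes, so the number of degrees of freedom used for the p-value has to be reduced accordingly.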

More sophisticated algorithms

Unbinned distributions: SUPREMUM STATISTICS

[Figure: panels "ORIGINAL DISTRIBUTIONS" and "EMPIRICAL DISTRIBUTION FUNCTION" (vertical axis from 0 to 1), with the distance D_mn labelled]

• Kolmogorov-Smirnov test

$D_{mn} = \sup_x \left| F_m(x) - G_n(x) \right|$

• Goodman approximation of KS test

$\chi^2 = 4 D_{mn}^2 \, \frac{mn}{m+n}$

• Kuiper test

$D^* = \max_x \left[ F_0(x) - F_T(x) \right] + \max_x \left[ F_T(x) - F_0(x) \right]$
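The three supremum statistics listed on this slide (Kolmogorov-Smirnov, Goodman, Kuiper) can be sketched for two unbinned samples as below. The function names are illustrative assumptions, not the toolkit's API, and the Kuiper statistic is written here in its two-sample form (largest positive plus largest negative EDF difference).

```python
from bisect import bisect_right


def edf(sorted_sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def supremum_statistics(sample_f, sample_g):
    """Return (Kolmogorov-Smirnov D, Goodman chi-squared, Kuiper D*)."""
    f, g = sorted(sample_f), sorted(sample_g)
    m, n = len(f), len(g)
    # Both EDFs are step functions, so the extreme differences
    # are attained at points of the pooled sample.
    diffs = [edf(f, x) - edf(g, x) for x in f + g]
    d_plus = max(0.0, max(diffs))    # largest excess of F over G
    d_minus = max(0.0, -min(diffs))  # largest excess of G over F
    ks = max(d_plus, d_minus)
    goodman = 4.0 * ks ** 2 * m * n / (m + n)
    kuiper = d_plus + d_minus
    return ks, goodman, kuiper


ks, goodman, kuiper = supremum_statistics([1.0, 2.0, 3.0, 4.0],
                                          [2.5, 3.5, 4.5, 5.5])
```

In this small example the first sample always lies to the left of the second, so the Kuiper and Kolmogorov-Smirnov distances coincide; they differ when the EDFs cross.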

More powerful algorithms

TESTS CONTAINING A WEIGHTING FUNCTION

Unbinned distributions, binned distributions

• Cramer-von Mises test

$\omega^2 = \int \left[ F_0(x) - F_T(x) \right]^2 \, dF_T(x)$

• Anderson-Darling test

$A^2 = \int \frac{\left[ F_0(x) - F_T(x) \right]^2}{F_T(x) \left[ 1 - F_T(x) \right]} \, dF_T(x)$

• Fisz-Cramer-von Mises test

$t = \frac{n_1 n_2}{(n_1 + n_2)^2} \sum_i \left[ F_1(x_i) - F_2(x_i) \right]^2$

• k-sample Anderson-Darling test

$A_k^2 = \frac{n - 1}{n^2 (k - 1)} \sum_i \frac{1}{n_i} \sum_j h_j \, \frac{(n F_{ij} - n_i H_j)^2}{H_j (n - H_j) - n h_j / 4}$
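As an illustration of a statistic that accumulates differences over the whole range rather than taking a single supremum, the two-sample Fisz-Cramer-von Mises statistic can be sketched as below. The sketch assumes the usual form t = n1 n2 / (n1 + n2)^2 times the sum, over all points of the pooled sample, of the squared difference between the two EDFs; the function name is a hypothetical choice.

```python
from bisect import bisect_right


def fisz_cramer_von_mises(sample_1, sample_2):
    """Two-sample Cramer-von Mises statistic for unbinned data."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    n1, n2 = len(s1), len(s2)
    total = 0.0
    for x in s1 + s2:                   # every point of the pooled sample
        f1 = bisect_right(s1, x) / n1   # EDF of sample 1 at x
        f2 = bisect_right(s2, x) / n2   # EDF of sample 2 at x
        total += (f1 - f2) ** 2
    return n1 * n2 / (n1 + n2) ** 2 * total


t = fisz_cramer_von_mises([1.0, 2.0, 3.0, 4.0], [2.5, 3.5, 4.5, 5.5])
```

Every pooled observation contributes to the sum, which is why such tests use more of the information in the data than a supremum statistic does.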

Comparative documentation of tests

Test                     Power    Characteristics
Anderson-Darling         High     Sensitive to tails
χ²                       Low      General
Fisz-Cramer-von Mises    High     Symmetric, right-skewed distributions
Goodman                  Medium   Approximation of K-S to χ² test statistics
Kolmogorov-Smirnov       Medium   Derives from Kolmogorov statistics
Kuiper                   Medium   Sensitive to tails and median
Tiku                     High     Converts CvM statistics to a χ²

More about a comparative evaluation of tests in the User Documentation on our web site

Topic still subject to research activity in the domain of statistics

Power of tests

The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when it is false

In terms of power:

χ² < Supremum statistics tests < Tests containing a weight function

χ² loses information in a test for unbinned distributions by grouping the data into cells

Kac, Kiefer and Wolfowitz (1955) showed that the Kolmogorov-Smirnov test requires n^(4/5) observations, compared to n observations for χ², to attain the same power

Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov’s, since they compare the two distributions all along the range of x, rather than looking for a marked difference at one point
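The definition of power can be made concrete with a small Monte Carlo sketch: generate many pairs of samples, apply a test, and count how often the null hypothesis is rejected. Everything here is an assumption chosen for illustration (Gaussian samples, a two-sample Kolmogorov-Smirnov test, the asymptotic 5% critical value 1.358 · sqrt((m + n) / (m n)), and the function names); it is not how the toolkit evaluates power.

```python
import math
import random
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between unbinned samples."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


def rejection_rate(shift, n=50, trials=200, seed=0):
    """Fraction of pseudo-experiments in which the KS test rejects H0
    (same parent distribution) at a nominal 5% significance level."""
    rng = random.Random(seed)
    critical = 1.358 * math.sqrt(2.0 / n)  # asymptotic value for m = n
    rejected = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]
        if ks_distance(a, b) > critical:
            rejected += 1
    return rejected / trials


false_alarm_rate = rejection_rate(shift=0.0)  # H0 true: should be near 5%
power = rejection_rate(shift=1.0)             # H0 false: estimated power
```

Replacing `ks_distance` with another statistic and its critical value would allow the same loop to compare the power of different tests on the same alternative.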

Unit test: χ² (1)

EXAMPLE FROM PICCOLO BOOK (STATISTICS - page 711)

The study concerns monthly birth and death distributions (binned data)

[Figure: histogram of the birth distribution and the death distribution; x axis: Months (1 to 12), y axis: Frequency (0 to 45)]

χ² test-statistics = 15.8
Expected χ² = 15.8

Exact p-value = 0.200758
Expected p-value = 0.200757

Unit test: χ² (2)

EXAMPLE FROM CRAMER BOOK (MATHEMATICAL METHODS OF STATISTICS - page 447)

The study concerns the sex distribution of children born in Sweden in 1935

[Figure: histogram of Boys and Girls; x axis: Classes (1 to 12), y axis: Frequency (0 to 4500)]

χ² test-statistics = 123.203
Expected χ² = 123.203

Exact p-value = 0
Expected p-value = 0

Unit test: K-S Goodman (1)

EXAMPLE FROM PICCOLO BOOK (STATISTICS - page 711)

The study concerns monthly birth and death distributions (unbinned data)

[Figure: cumulative functions of the two distributions; x axis: Months, y axis: Cumulative Function (0 to 1.2)]

χ² test-statistics = 3.9
Expected χ² = 3.9

Exact p-value = 0.140974
Expected p-value = 0.140991

Unit test: K-S Goodman (2)

EXAMPLE FROM LANDENNA BOOK (NONPARAMETRIC TESTS BASED ON FREQUENCIES - page 287)

We consider body lengths of two independent groups of anopheles

[Figure: cumulative functions of Distribution 1 and Distribution 2; x axis: Body lengths (73 to 98), y axis from 0 to 1]

χ² test-statistics = 1.5
Expected χ² = 1.5

Exact p-value = 0.472367
Expected p-value = 0.472367
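As a hand cross-check of this unit test, and assuming (as in Goodman's approximation) that the statistic is referred to a χ² distribution with 2 degrees of freedom, the upper-tail probability has a closed form and the quoted p-value follows directly from the quoted statistic:

```latex
p = P\left(\chi^2_{2} > \chi^2_{\mathrm{obs}}\right)
  = e^{-\chi^2_{\mathrm{obs}}/2}
  = e^{-1.5/2}
  = e^{-0.75}
  \approx 0.472367
```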

Unit test: Kolmogorov-Smirnov (1)

EXAMPLE FROM http://www.physics.csbsju.edu/stats/KS-test.html

The study concerns how long a bee stays near a particular tree (Redwell/Whitney)

[Figure: cumulative distributions for Redwell and Whitney; x axis: Time (s) (0 to 50), y axis: Cumulative (0 to 1.2)]

D test-statistics = 0.2204
Expected D = 0.2204

Exact p-value = 0.0354675
Expected p-value = 0.035

Unit test: Kolmogorov-Smirnov (2)

EXAMPLE FROM LANDENNA BOOK (NONPARAMETRIC STATISTICAL METHODS - page 318-325)

We consider one clinical parameter of two independent groups of patients

[Figure: cumulative distributions of Distribution 1 and Distribution 2; y axis: Cumulative]

D test-statistics = 0.65
Expected D = 0.65

Exact p-value = 2 × 10^-19
Expected p-value = 8 × 10^-19

Example of application results

[Figure: Fluorescence spectrum from Icelandic basalt (Mars-like rock): experimental data and simulation. ESA Bepi Colombo mission to Mercury, test beam at Bessy]

Anderson-Darling: A_c (95%) = 0.752

[Figure: Photon attenuation coefficient, Al: Geant4 Standard, Geant4 LowE, NIST. Electromagnetic models in Geant4 w.r.t. NIST reference]

χ²(N-L) = 13.1, ν = 20, p = 0.87
χ²(N-S) = 23.2, ν = 15, p = 0.08
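The two χ² results quoted on this slide can be cross-checked with a short, self-contained sketch. It assumes that the second number in each result (20 and 15) is the number of degrees of freedom; the function name is hypothetical, and the upper-tail probability is computed from the standard recurrence for the regularized incomplete gamma function rather than from any toolkit routine.

```python
import math


def chi2_p_value(chi2, dof):
    """Upper-tail probability P(X > chi2) for integer degrees of freedom."""
    y = chi2 / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


p_nl = chi2_p_value(13.1, 20)  # slide quotes p = 0.87
p_ns = chi2_p_value(23.2, 15)  # slide quotes p = 0.08
```

Both values agree with the slide to the two decimal places quoted there.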

Latest release: 30 March 2004

GPL License

User Documentation

Download

Installation

User Guide

Statistics Reference Guide

A toolkit for modeling multi-parametric fit problems

F. Fabozzi, L. Lista
INFN Napoli

Initially developed while rewriting a Fortran fitter for a BaBar analysis
– Simultaneous estimate of:
  B(B → J/ψ π) / B(B → J/ψ K)
  direct CP asymmetry
– More control over the code was needed to explain a bias that appeared in the original fitter

Requirements

Provide tools for modeling parametric fit problems

Unbinned Maximum Likelihood (UML[*]) fit of:
– PDF parameters
– Yields of different sub-samples
– Both, mixed

χ² fits

Toy Monte Carlo to study the fit properties
– Fitted parameter distributions
– Pulls, bias, confidence level of fit results

[*] not Unified Modeling Language…

New components included in the Statistical Toolkit

Architecture open to extension and evolution
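A toy Monte Carlo pull study of the kind listed in these requirements can be sketched as follows. This is a deliberately simple stand-in, not the fit toolkit's interface: it assumes an exponential PDF whose unbinned maximum-likelihood estimate is the sample mean, with error τ̂/√n, and all names and default values are hypothetical.

```python
import math
import random
import statistics


def toy_pull_study(true_tau=2.0, n_events=200, n_toys=500, seed=1):
    """Generate toy samples, fit each one, and return the mean and
    width of the pull distribution of the fitted parameter."""
    rng = random.Random(seed)
    pulls = []
    for _ in range(n_toys):
        sample = [rng.expovariate(1.0 / true_tau) for _ in range(n_events)]
        # Unbinned ML estimate for the pdf (1/tau) * exp(-t/tau)
        tau_hat = sum(sample) / n_events
        # Error from the curvature of the log-likelihood at its maximum
        sigma_tau = tau_hat / math.sqrt(n_events)
        pulls.append((tau_hat - true_tau) / sigma_tau)
    return statistics.mean(pulls), statistics.stdev(pulls)


pull_mean, pull_width = toy_pull_study()
```

A pull mean far from 0 would point to a biased estimator, and a pull width far from 1 to a misestimated error, which is the kind of diagnostic a fitter needs in order to investigate a bias.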

For LCG users

The Statistical Toolkit is distributed with PI as an external product
– Currently the previous release, not yet the latest one, is distributed
– Update foreseen

Integration in the Savannah system for problem reporting is foreseen

Open to collaboration to facilitate usage in the LCG community
– feedback, user requirements, suggestions are welcome, of course!

Please contact [email protected] for further information about the Statistical Toolkit in the PI distribution

References

Conference Proceedings:
– PhyStat Conference, SLAC, 2003
– IEEE Nuclear Science Symposium, Portland, 2003

Papers:
– S. Donadio et al., A toolkit for statistical data comparison, to be published in IEEE Trans. Nucl. Sci. (August 2004)

More papers in preparation

References kept up-to-date on the web site

http://www.ge.infn.it/geant4/analysis/HEPstatistics/

Will be moved to a new area out of Geant4-INFN web (automatic re-direction)

Acknowledgments

Work supported and partially funded by the European Space Agency (ESA) under Contract No. 16339/02/NL/FM

Geant4 beta testing
– P. Cirrone (INFN-LNS), S. Guatelli (INFN Genova), S. Parlati (INFN-LNGS)

Fred James (CERN) and Louis Lyons (Oxford)
– many useful suggestions, discussions, encouragement...

Conclusions

A project to develop an open source, general purpose software toolkit for statistical data analysis is in progress
– to provide a product of common interest to user communities

Rigorous software process
– to contribute to the quality of the product

Component-based architecture, OO methods + generic programming
– to ensure openness to evolution, maintainability, ease of use

GoF component

Component for modeling multi-parametric fit problems

Software released and results available
– toolkit in use for Geant4 physics validation
– incremental and iterative life-cycle