Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005
B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon, P. Viarengo
Provide tools for the statistical comparison of distributions:
– equivalent reference distributions
– experimental measurements
– data from reference sources
– functions deriving from theoretical calculations or fits
Applications: detector monitoring, simulation validation, reconstruction vs. expectation, regression testing, physics analysis, data analysis.
GoF statistical toolkit
A project to develop a statistical comparison system
• Comparison of distributions
• Goodness-of-fit testing
• Qualitative and quantitative evaluation
Software process guidelines
• Unified Software Development Process, specifically tailored to the project
– practical guidance and tools from the RUP
– both rigorous and lightweight
– mapping onto ISO 15504
• Guidance from ISO 15504
• Incremental and iterative life cycle model (spiral approach)
Architectural guidelines
• The project adopts a solid architectural approach
– to offer the functionality and the quality needed by the users
– to be maintainable over a long time scale
– to be extensible, to accommodate future evolutions of the requirements
• Component-based approach
– to facilitate re-use and integration in different frameworks
• AIDA
– adopt a (HEP) standard
– no dependence on any specific analysis tool
The tests are specialised according to the kind of distribution (binned/unbinned).
G. A. P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M. G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science 51 (5) (2004) 2056-2063.
Release StatisticsTesting-V1-01-00, downloadable from the web: http://www.ge.infn.it/geant4/analysis/HEPstatistics/
Chi-squared test
• Applies to binned distributions
• It can also be used for unbinned distributions, but the data must first be grouped into classes
• Should not be applied if the expected (theoretical) frequency in any class is < 5
– In that case, one can try to merge adjacent classes until the minimum expected frequency is reached
– Otherwise one can use Yates’ correction
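The two-sample χ² comparison and the class-merging remedy can be sketched as follows. This is a minimal Python illustration, not the toolkit's implementation: the function names are hypothetical, the statistic is the standard form for two histograms with totals N and M (equivalent to a 2 × k contingency table), and the greedy left-to-right merging is only one possible strategy. A p-value would then come from the χ² distribution with ndf degrees of freedom (for example via `scipy.stats.chi2.sf`).

```python
def merge_sparse_classes(n, m, min_expected=5.0):
    """Hypothetical helper: merge adjacent classes of two histograms until every
    merged class has an expected frequency >= min_expected in both samples."""
    big_n, big_m = sum(n), sum(m)
    total = big_n + big_m
    merged_n, merged_m = [], []
    acc_n = acc_m = 0
    for a, b in zip(n, m):
        acc_n += a
        acc_m += b
        # Expected count of the smaller sample if both share one parent distribution.
        expected = min(big_n, big_m) * (acc_n + acc_m) / total
        if expected >= min_expected:
            merged_n.append(acc_n)
            merged_m.append(acc_m)
            acc_n = acc_m = 0
    if acc_n or acc_m:
        # A sparse tail is folded into the last complete class.
        if merged_n:
            merged_n[-1] += acc_n
            merged_m[-1] += acc_m
        else:
            merged_n.append(acc_n)
            merged_m.append(acc_m)
    return merged_n, merged_m


def chi2_two_sample(n, m):
    """Two-sample chi-squared for binned data with totals N and M:
    chi2 = sum_i (M*n_i - N*m_i)^2 / (N*M*(n_i + m_i)), ndf = non-empty classes - 1."""
    big_n, big_m = sum(n), sum(m)
    stat = 0.0
    used = 0
    for a, b in zip(n, m):
        if a + b > 0:
            stat += (big_m * a - big_n * b) ** 2 / (a + b)
            used += 1
    return stat / (big_n * big_m), used - 1


print(merge_sparse_classes([1, 2, 10, 10, 1], [1, 2, 10, 10, 1]))  # ([13, 11], [13, 11])
print(chi2_two_sample([10, 0], [0, 10]))  # (20.0, 1)
```

Merging before computing the statistic keeps every class above the threshold at the cost of resolution, which is exactly the information loss the slide warns about for unbinned data.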
[Figure: original distributions and their empirical distribution functions]
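Tests based on maximum distance (supremum statistics): unbinned distributions
F_m and G_n denote the empirical distribution functions of the two samples, of sizes m and n.
• Kolmogorov-Smirnov test
$$D_{mn} = \sup_x \left| F_m(x) - G_n(x) \right|$$
• Goodman approximation of KS test (approximately χ²-distributed with 2 degrees of freedom)
$$\chi^2 = 4 D_{mn}^2 \frac{mn}{m+n}$$
• Kuiper test
$$D^* = \max_x \left[ F_m(x) - G_n(x) \right] + \max_x \left[ G_n(x) - F_m(x) \right]$$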
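The three supremum statistics can all be computed from one merge-style pass over the two sorted samples. A minimal Python sketch follows, assuming two unbinned samples given as lists of numbers; the function names are hypothetical and this is not the toolkit's C++ code. The Kuiper statistic is taken in its two-sample form D+ + D−, and the Goodman p-value uses the fact that a χ² variable with 2 degrees of freedom has survival function exp(−x/2).

```python
import math


def ecdf_extremes(xs, ys):
    """Return (D_plus, D_minus): the largest positive and negative gaps
    F_m(x) - G_n(x) between the empirical CDFs of xs (size m) and ys (size n)."""
    a, b = sorted(xs), sorted(ys)
    m, n = len(a), len(b)
    i = j = 0
    d_plus = d_minus = 0.0
    while i < m and j < n:
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < m and a[i] == x:
            i += 1
        while j < n and b[j] == x:
            j += 1
        diff = i / m - j / n
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    # Once one sample is exhausted the gap only shrinks towards 0, so we can stop.
    return d_plus, d_minus


def kolmogorov_smirnov(xs, ys):
    d_plus, d_minus = ecdf_extremes(xs, ys)
    return max(d_plus, d_minus)


def goodman_chi2(xs, ys):
    """Goodman approximation: chi2 = 4 D^2 m n / (m + n), ~ chi2 with 2 dof.
    Returns (chi2, p-value) using the chi2(2) survival function exp(-chi2 / 2)."""
    m, n = len(xs), len(ys)
    d = kolmogorov_smirnov(xs, ys)
    chi2 = 4.0 * d * d * m * n / (m + n)
    return chi2, math.exp(-chi2 / 2.0)


def kuiper(xs, ys):
    d_plus, d_minus = ecdf_extremes(xs, ys)
    return d_plus + d_minus


print(kolmogorov_smirnov([1, 4], [2, 3]), kuiper([1, 4], [2, 3]))  # 0.5 1.0
```

The example shows why Kuiper differs from KS: the two gaps of opposite sign are each 0.5, so KS reports 0.5 while Kuiper adds them.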
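Tests containing a weighting function (quadratic statistics): binned/unbinned distributions
• Fisz-Cramer-von Mises test
$$t = \frac{n_1 n_2}{(n_1 + n_2)^2} \sum_i \left[ F_1(x_i) - F_2(x_i) \right]^2$$
where F_1 and F_2 are the empirical distribution functions of the two samples (sizes n_1 and n_2) and the sum runs over all observations x_i of the pooled sample.
• Approx Anderson-Darling test
$$A^2 = \frac{n-1}{n^2} \sum_i \frac{1}{n_i} \sum_k \frac{h_k \left( n F_{ik} - n_i H_k \right)^2}{H_k (n - H_k) - n h_k / 4}$$
where n is the pooled sample size, h_k is the number of pooled observations equal to the k-th distinct value z_k, H_k is the number of pooled observations below z_k plus h_k/2, and F_ik is the number of observations of sample i below z_k plus half of those equal to z_k.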
[Figure: original distributions and their empirical distribution functions]
Quadratic statistics + weighting function: sum/integral of all the distances
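Both quadratic statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the toolkit's code: the function names are hypothetical, the Cramer-von Mises statistic uses the two-sample form t = n1 n2/(n1+n2)² Σ[F1(xi) − F2(xi)]² summed over the pooled sample, and for Anderson-Darling we assume the k-sample midrank form of Scholz and Stephens (1987), which handles ties and therefore grouped (binned) data as well. Turning either statistic into a p-value requires its null distribution, which is not shown here.

```python
from bisect import bisect_left, bisect_right


def ecdf(sorted_sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def fisz_cramer_von_mises(xs, ys):
    """t = n1 n2 / (n1 + n2)^2 * sum over pooled x_i of [F1(x_i) - F2(x_i)]^2."""
    a, b = sorted(xs), sorted(ys)
    n1, n2 = len(a), len(b)
    total = sum((ecdf(a, z) - ecdf(b, z)) ** 2 for z in a + b)
    return n1 * n2 / (n1 + n2) ** 2 * total


def anderson_darling_midrank(*samples):
    """k-sample Anderson-Darling statistic with midrank treatment of ties
    (assumed here to follow the Scholz-Stephens A2_akN form)."""
    sorted_samples = [sorted(s) for s in samples]
    pooled = sorted(x for s in samples for x in s)
    big_n = len(pooled)
    result = 0.0
    for s in sorted_samples:
        n_i = len(s)
        inner = 0.0
        for z in sorted(set(pooled)):
            below = bisect_left(pooled, z)
            h = bisect_right(pooled, z) - below      # pooled ties at z
            big_h = below + h / 2.0                  # pooled count below z, plus h/2
            f_below = bisect_left(s, z)
            f_ik = f_below + (bisect_right(s, z) - f_below) / 2.0
            denom = big_h * (big_n - big_h) - big_n * h / 4.0
            # denom is zero only when every pooled value is identical.
            if denom > 0:
                inner += h * (big_n * f_ik - n_i * big_h) ** 2 / denom
        result += inner / n_i
    return (big_n - 1) / big_n ** 2 * result


print(fisz_cramer_von_mises([1, 2], [3, 4]))     # 0.375
print(anderson_darling_midrank([1, 2], [3, 4]))  # about 1.727 (= 19/11)
```

Unlike the supremum statistics, both sums collect a contribution from every point of the pooled sample, which is the "sum/integral of all the distances" idea.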
User’s point of view
• Simple user layer
• The user only deals with AIDA objects and the choice of comparison algorithm
• The user is completely shielded from both statistical and computing complexity
User → extracts the algorithm by writing one line of code → Toolkit → statistical result
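The one-line user interaction can be sketched as a small facade over interchangeable algorithms (a strategy registry). Everything here is hypothetical and written in Python for brevity: `compare`, `ComparisonResult` and the algorithm names are illustrative, not the toolkit's C++ API, and plain lists stand in for AIDA objects.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class ComparisonResult:
    algorithm: str
    statistic: float


def _ecdf(sorted_sample: Sequence[float], x: float) -> float:
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def _kolmogorov_smirnov(xs, ys) -> float:
    a, b = sorted(xs), sorted(ys)
    # The sup of |F - G| is attained at one of the pooled observations.
    return max(abs(_ecdf(a, z) - _ecdf(b, z)) for z in a + b)


def _cramer_von_mises(xs, ys) -> float:
    a, b = sorted(xs), sorted(ys)
    n1, n2 = len(a), len(b)
    total = sum((_ecdf(a, z) - _ecdf(b, z)) ** 2 for z in a + b)
    return n1 * n2 / (n1 + n2) ** 2 * total


# Hypothetical registry: each algorithm is a strategy behind one common signature.
ALGORITHMS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "KolmogorovSmirnov": _kolmogorov_smirnov,
    "CramerVonMises": _cramer_von_mises,
}


def compare(algorithm: str, data, reference) -> ComparisonResult:
    """The single call the user writes; the statistical details stay hidden."""
    try:
        test = ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm}") from None
    return ComparisonResult(algorithm, test(data, reference))


result = compare("KolmogorovSmirnov", [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)  # ComparisonResult(algorithm='KolmogorovSmirnov', statistic=1.0)
```

With this design, adding a test only means registering one more function; the user-facing call does not change, in line with the extensibility goal of the architectural guidelines.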
Software testing
Rigorous software process adopted
Test process: unit tests, integration tests, system tests
Testing focuses primarily on the evaluation or assessment of the quality of the software product, guaranteeing its correctness and robustness:
• finding and documenting defects in software quality
• validating that the software product functions as designed
• validating that the requirements have been implemented appropriately
Test result summaries are included as part of the documentation of the Toolkit release and are available on the web.
Work in progress: new tests
• Supremum statistics, unbinned distributions:
– weighted KS tests
• Quadratic statistics, binned/unbinned distributions:
– weighted CVM tests
– CVM approximation to a χ² (Tiku test)
– exact Anderson-Darling test
• Quadratic statistics, unbinned distributions:
– Watson test
– Watson approximation to a χ² (Tiku test)
With these tests the GoF Statistical Toolkit will be the most complete toolkit for the two-sample problem in physics as well as in the statistics domain.
Is χ² the most powerful algorithm?
The power of a test is the probability of correctly rejecting the null hypothesis when it is false.
• χ² loses information in a test for unbinned distributions by grouping the data into cells
• Kac, Kiefer and Wolfowitz (1955) showed that the Kolmogorov-Smirnov test requires $n^{4/5}$ observations, compared to n observations for χ², to attain the same power
• Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov's, since they compare the two distributions along the whole range of x, rather than looking for a marked difference at one point
In terms of power: χ² < supremum statistics tests < tests containing a weight function
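The definition of power suggests a direct Monte Carlo estimate: simulate many pairs of samples for which H0 is false and count how often the test rejects. The sketch below does this for the two-sample KS test only, assuming Gaussian samples, a location shift as the alternative and the asymptotic 5% critical value c ≈ 1.358; the function names and parameters are hypothetical, and the sketch illustrates the definition rather than reproducing the power comparison above.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(xs, ys):
    a, b = sorted(xs), sorted(ys)
    m, n = len(a), len(b)
    return max(abs(bisect_right(a, z) / m - bisect_right(b, z) / n) for z in a + b)


def ks_power(shift, m=50, n=50, trials=200, c_alpha=1.358, seed=1):
    """Fraction of simulated experiments in which H0 is rejected at about 5%.

    Sample 1 ~ N(0, 1), sample 2 ~ N(shift, 1); H0 is rejected when
    D_mn > c_alpha * sqrt((m + n) / (m * n)) (asymptotic KS critical value)."""
    rng = random.Random(seed)
    critical = c_alpha * math.sqrt((m + n) / (m * n))
    rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        ys = [rng.gauss(shift, 1.0) for _ in range(n)]
        if ks_statistic(xs, ys) > critical:
            rejections += 1
    return rejections / trials


print("shift 0.0:", ks_power(0.0))  # estimates the type I error (about 0.05)
print("shift 1.0:", ks_power(1.0))  # estimates the power against a 1-sigma shift
```

Running the same loop with another statistic and its own critical value, on identical simulated samples, would give a like-for-like power comparison between tests.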
Examples of practical applications
Microscopic validation of physics
Physics models under test:
• Geant4 Standard
• Geant4 Low Energy – Livermore
• Geant4 Low Energy – Penelope
Reference data:
• NIST ESTAR - ICRU 37
[Figure: p-value stability study, p-value as a function of Z with the H0 rejection area marked; curves: Geant4 LowE Penelope, Geant4 Standard, Geant4 LowE EEDL, NIST - XCOM]
The three Geant4 models are equivalent.
K. Amako, S. Guatelli, V. Ivanchenko, M. Maire, B. Mascialino, K. Murakami, P. Nieminen, L. Pandola, S. Parlati, A. Pfeiffer, M. G. Pia, M. Piergentili, T. Sasaki, L. Urban, Precision validation of Geant4 electromagnetic physics
Radioprotection applications in manned space missions
Comparison of inflatable and conventional rigid habitat concepts:
• effect of different shielding materials
• effect of shielding thickness
• e.m. and hadronic interactions
[Figure: set-up with GCR, vacuum, air, phantom, and Al structure or inflatable structure plus shielding; energy deposit in the phantom by GCR protons for 2.15 cm Al, 4 cm Al, inflatable habitat + 5 cm water and inflatable habitat + 10 cm water, compared with the KS test]
S. Guatelli, B. Mascialino, P. Nieminen, M. G. Pia, Radioprotection for interplanetary manned missions
thanks to Susanna Guatelli
X-ray fluorescence: test beam at Bessy, Bepi-Colombo mission
[Figure: X-ray fluorescence spectrum in ice and basalt (E_IN = 6.5 keV), counts vs. energy (keV)]
• Very complex distributions
• χ² not appropriate: < 5 entries in some bins, and physical information would be lost if rebinned
• Anderson-Darling test: critical value A_c (95%) = 0.752
A. Mantero, B. Mascialino, P. Nieminen, M. G. Pia, A. Owens, M. Bavdaz, A. Peacock, A library for simulated X-ray emission from planetary surfaces
thanks to Alfonso Mantero
Medical applications: IMRT
[Figure: two profiles, % dose vs. distance (mm), compared range by range with the Kolmogorov-Smirnov test]
Kolmogorov-Smirnov test, first profile:
Range (mm)     D       p-value
-84 to -60     0.385   0.23
-59 to -48     0.27    0.90
-47 to 47      0.43    0.19
48 to 59       0.30    0.82
60 to 84       0.40    0.10

Kolmogorov-Smirnov test, second profile:
Range (mm)     D       p-value
-56 to -35     0.26    0.89
-34 to -22     0.43    0.42
-21 to 21      0.38    0.08
22 to 32       0.26    0.98
33 to 36       0.57    0.13

F. Foppiano, B. Mascialino, M. G. Pia, M. Piergentili, Geant4 simulation of an accelerator head for intensity modulated radiotherapy
thanks to Michela Piergentili
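Each p-value in the KS results is the probability, under the hypothesis that the two compared profiles share the same distribution, of observing a distance at least as large as D. A common way to obtain it is the asymptotic Kolmogorov distribution; the Python sketch below assumes that approach, with hypothetical function names. The sample sizes behind the tables are not given, so the sketch does not try to reproduce the quoted p-values, and small samples may call for exact or corrected formulas instead.

```python
import math


def kolmogorov_q(lam, terms=100):
    """Q_KS(lambda) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lambda^2),
    the asymptotic probability that sqrt(n_e) * D exceeds lambda under H0."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and Q equals 1 to ~12 digits
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def ks_pvalue(d, m, n):
    """Asymptotic two-sample KS p-value, with effective size n_e = m n / (m + n)."""
    n_e = m * n / (m + n)
    return kolmogorov_q(math.sqrt(n_e) * d)


print(round(kolmogorov_q(1.358), 3))  # 0.05, the usual 5% critical point
```

The same D can give very different p-values depending on the number of points in each range, which is why D and p-value are reported side by side.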
Conclusions
• The GoF Statistical Toolkit is a new, up-to-date, easy-to-handle and powerful tool for statistical comparison in particle physics.
• It is the first tool supplying such a variety of sophisticated and powerful statistical tests in HEP.
• Released and downloadable from the web.
• AIDA interfaces allow its integration in any other data analysis tool.
• Applications in HEP, astrophysics, medical physics, …