Detect Unknown Systematic Effect: Diagnose bad fit to
multiple data sets
Advanced Statistical Techniques in Particle Physics
Grey College, Durham
18 - 22 March 2002
M. J. Wang
Institute of Physics
Academia Sinica
Preface
• Motivation and gratitude
– Learned quite a lot at the workshop on confidence limits at Fermilab in 2000
– Thanks for hosting this conference
• Main title: Detect Unknown Systematic Effect
– More suited to the aim of this conference
– Important for experimentalists
– Might be detectable in a global fit
• Sub-title: Diagnose bad fit to multiple data sets
– The global fit is not internally consistent
– It is not known which part is wrong
– Need to diagnose the data sample
Outline
• Introduction
• Global fit and its goodness of fit
• Parameter fitting criterion
• Diagnose bad fit to multiple data sets
• Conclusion
Introduction
• Knowledge of the parton distribution functions is essential for hadron collider research
• A global fit is used to obtain the parton distribution functions
• Uncertainties of the parton distribution function parameters
– Precision hadron collider results require estimates of the uncertainties of the parton distribution function parameters
– Important for Fermilab Run II and LHC physics analyses
Introduction
• Knowledge of the parton distribution functions is essential for hadron collider research
– Interpretation of data within the Standard Model (SM)
– Precision measurement of SM parameters
– Search for signals beyond the SM
• A global fit is used to obtain the parton distribution functions
– The non-perturbative parton distribution functions cannot be determined by perturbative QCD (PQCD)
– Therefore, they are determined by a global fit
Global fit and goodness of fit
• Reliable parton distribution function parameter and uncertainty estimates require passing goodness of fit criterion
– Total chi-square is used for goodness of fit
– A deviation of up to +/- sqrt(2N) from the expected chi-square, N, is used as the accepted range
• Is the total chi-square good enough as a goodness-of-fit measure?
– The total chi-square is insensitive to a small subset of data with a bad fit
• Is there a way to obtain a more stringent criterion?
– Need new idea
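The insensitivity of the total chi-square can be checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers (1000 points, a subset of 10 points each offset by 2 sigma) are assumptions chosen for the example, and the reduction in degrees of freedom from fitted parameters is ignored.

```python
import math


def expected_chi2(n_points, n_shifted, shift_in_sigma):
    """Expected chi-square when n_shifted of n_points are offset by shift_in_sigma."""
    # Each unit-variance Gaussian pull contributes 1 on average;
    # a shifted point contributes an extra shift_in_sigma**2.
    return n_points + n_shifted * shift_in_sigma ** 2


def within_accepted_range(chi2, n_points):
    """Check |chi2 - N| <= sqrt(2N), the accepted range quoted on the slide."""
    return abs(chi2 - n_points) <= math.sqrt(2.0 * n_points)


# Illustrative numbers only (assumptions, not taken from the talk).
N_TOTAL, N_SUBSET, SHIFT = 1000, 10, 2.0

total = expected_chi2(N_TOTAL, N_SUBSET, SHIFT)    # all data, including the bad subset
subset = expected_chi2(N_SUBSET, N_SUBSET, SHIFT)  # the bad subset on its own

total_ok = within_accepted_range(total, N_TOTAL)
subset_ok = within_accepted_range(subset, N_SUBSET)

print(f"total chi2  = {total:.0f}, within range: {total_ok}")
print(f"subset chi2 = {subset:.0f}, within range: {subset_ok}")
```

With these numbers the total chi-square stays inside its accepted range while the subset chi-square is far outside its own, which is the motivation for looking at subsets separately.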
Parameter fitting criterion
• Idea motivated by Louis Lyons's goodness-of-fit paradox at ACAT 2000
• J.C. Collins and J. Pumplin applied this idea to the goodness of fit for global fit
– Hypothesis-testing vs parameter-fitting criteria
– Subset chi-square against total chi-square
– Found inconsistent data sets in CTEQ5 data sets
• Still do not know which part is correct and which is wrong
Parameter fitting criterion
– Hypothesis-testing vs parameter-fitting criteria (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 3)
– Subset chi-square against total chi-square (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 10)
– Found inconsistent data sets in CTEQ5 data (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 13)
Diagnose bad fit to multiple data sets
• Importance of studying a bad fit
– Is the inconsistent data set free of unknown systematic effects?
– Is the theoretical prediction adequate?
– Is there any hint of new physics?
• Is there any statistic for diagnostic purposes?
– The pull can be used to identify an inconsistent experiment or data point (thanks to F. James's "Statistical Methods in Experimental Physics")
– But for real data, there is no measured pull distribution for each data point
– What should we do with the pull?
Diagnose bad fit to multiple data sets
• Pull definition for each data point
Mi = Ti + ( random error )
Ri = Ti - Mi = -( random error )
Pi = Ri / sigma( Ri )
• Pull properties
– Gaussian shape
– Center at zero
– With unit variance
– Independence among pulls of different data points
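The pull definition and its first properties can be checked with a small Monte Carlo. This is a minimal sketch under stated assumptions: the "true curve" `true_value`, the 5% relative error, the seed and the number of entries are all hypothetical choices, and sigma(Ri) is taken to be the known random error (no correction for fitted parameters).

```python
import math
import random
import statistics


def true_value(x):
    """Hypothetical smooth 'true curve' T(x); any function would do here."""
    return 100.0 * math.exp(-x)


def simulate_pulls(x, n_entries, rng, rel_sigma=0.05):
    """Generate pulls for one data point at position x."""
    t = true_value(x)
    sigma = rel_sigma * t  # assumed size of the random error
    pulls = []
    for _ in range(n_entries):
        m = t + rng.gauss(0.0, sigma)  # Mi = Ti + (random error)
        r = t - m                      # Ri = Ti - Mi
        pulls.append(r / sigma)        # Pi = Ri / sigma(Ri)
    return pulls


rng = random.Random(2002)  # fixed seed so the sketch is reproducible
pulls = simulate_pulls(x=1.0, n_entries=10_000, rng=rng)

pull_mean = statistics.fmean(pulls)
pull_var = statistics.pvariance(pulls)
print(f"mean = {pull_mean:+.3f}, variance = {pull_var:.3f}")
```

For a data point without systematic effects the mean should come out close to zero and the variance close to one, within the statistical precision of the sample.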
Diagnose bad fit to multiple data sets
• Systematic effects introduce correlation among pulls
– Constant shift on all data points
– Correlated shift on all data points
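A constant shift on all data points moves every pull in the same direction, so the pulls of one experiment are no longer scattered around zero. The sketch below illustrates this for a single pseudo-experiment; the shift of one sigma, the 100 data points and the seed are assumptions made for the example.

```python
import random
import statistics


def experiment_pulls(n_points, shift, sigma, rng):
    """Pulls for one pseudo-experiment whose points all share a constant shift S."""
    pulls = []
    for _ in range(n_points):
        t = 0.0                                # true value Ti (its size drops out of the pull)
        m = t + rng.gauss(0.0, sigma) + shift  # Mi = Ti + (random error) + S
        pulls.append((t - m) / sigma)          # Pi = (Ti - Mi) / sigma
    return pulls


rng = random.Random(18)  # fixed seed for reproducibility
SIGMA, SHIFT, N_POINTS = 1.0, 1.0, 100  # illustrative values, not from the talk

clean = experiment_pulls(N_POINTS, 0.0, SIGMA, rng)
shifted = experiment_pulls(N_POINTS, SHIFT, SIGMA, rng)

mean_clean = statistics.fmean(clean)
mean_shifted = statistics.fmean(shifted)
print(f"mean pull without shift: {mean_clean:+.2f}")
print(f"mean pull with shift:    {mean_shifted:+.2f}")
```

The mean pull of the shifted experiment sits near -S/sigma, whereas a single pull taken on its own would still look compatible with a unit Gaussian.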
Diagnose bad fit to multiple data sets
• Correlation among pulls is the key for detecting unknown systematic effects
• Pull correlation study
– Pull distribution built from all data points in one experiment (experiment pull distribution)
– Pull as a function of measurement variable X
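A pull profile as a function of X can be sketched as the mean pull in each X channel. The example below assumes that a horizontal shift means the measurement is effectively taken at X + dx while the theory is evaluated at X; the falling curve `true_curve`, the constant error, the channel grid and the seed are hypothetical choices for illustration.

```python
import math
import random
import statistics


def true_curve(x):
    """Hypothetical falling 'true curve' T(X); chosen only for illustration."""
    return 100.0 * math.exp(-x)


def pull_profile(x_values, dx, n_entries, sigma, rng):
    """Mean pull per X channel when measurements are taken at X + dx."""
    profile = []
    for x in x_values:
        t = true_curve(x)  # theory evaluated at the nominal X
        pulls = []
        for _ in range(n_entries):
            # horizontally shifted measurement with a Gaussian random error
            m = true_curve(x + dx) + rng.gauss(0.0, sigma)
            pulls.append((t - m) / sigma)
        profile.append(statistics.fmean(pulls))
    return profile


rng = random.Random(22)  # fixed seed for reproducibility
x_values = [0.3 * i for i in range(10)]  # 10 hypothetical channels in X
profile = pull_profile(x_values, dx=0.05, n_entries=1000, sigma=1.0, rng=rng)

for x, p in zip(x_values, profile):
    print(f"X = {x:.1f}  mean pull = {p:+.2f}")
```

Under these assumptions the mean pull is largest where the curve is steepest and falls towards zero at large X, so the profile carries information about the kind of shift that a one-dimensional pull distribution does not.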
Diagnose bad fit to multiple data sets
• Naive case without known systematic uncertainties
Mi = Ti + ( random error ) + Si ( or S )
Ri = Ti - Mi = -( random error ) - Si ( or S )
Pi = Ri / sigma( Ri )
Diagnose bad fit to multiple data sets
• Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift
   – MC data vs true curve
   – Residual distributions of the first 6 channels with 10,000 entries
   – 10% uncertainty on the error estimate of the first 6 channels with 10,000 entries
   – Pull distributions of the first 6 channels with 10,000 entries
   – Effect of error estimate uncertainties of 0%, 10%, 20% on the pull distribution with 10,000 entries
   – Experiment residual and pull distributions with 100,000 entries
   – Experiment residual and pull profiles as a function of X with 100,000 entries
   – Experiment residual and pull distributions with 100 entries
   – Experiment residual and pull profiles as a function of X with 100 entries
2. Constant vertical shift
   – MC data vs true curve
   – Residual distributions of the first 6 channels with 10,000 entries
   – Pull distributions of the first 6 channels with 10,000 entries
   – Experiment residual and pull distributions with 100,000 entries
   – Experiment residual and pull profiles as a function of X with 100,000 entries
   – Experiment residual and pull distributions as a function of X with 100 entries
   – Experiment residual and pull profiles as a function of X with 100 entries
3. Combined horizontal and vertical shift
   – MC data vs true curve
   – Residual distributions of the first 6 channels with 10,000 entries
   – Pull distributions of the first 6 channels with 10,000 entries
   – Experiment residual and pull distributions with 100,000 entries
   – Experiment residual and pull profiles as a function of X with 100,000 entries
   – Experiment residual and pull distributions as a function of X with 100 entries
   – Experiment residual and pull profiles as a function of X with 100 entries
Diagnose bad fit to multiple data sets
• Real case with known systematic uncertainties
Mi = Ti + ( random error ) + ( systematic error ) + Si ( or S )
Ri = Ti - Mi = -( random error ) - ( systematic error ) - Si ( or S )
Pi = Ri / sigma( Ri )
Diagnose bad fit to multiple data sets
• Real case with known systematic uncertainties
– Need to take out known systematic uncertainty term in order to restore the independence property
– Need to fit the residual systematic effect with the aid of global fit
– Regain the naive case results
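One possible way to "take out" a known systematic term is sketched below. It is an assumption, not the procedure of the talk: a single known correlated systematic with per-point size beta_i and unit-Gaussian amplitude r is fitted by minimising sum(((Mi - Ti) - r*beta_i)^2 / sigma_i^2) + r^2, and the fitted term is subtracted before forming the pulls. The function name `fit_known_systematic` and all numerical values are hypothetical.

```python
import random
import statistics


def fit_known_systematic(m_minus_t, betas, sigmas):
    """Best-fit amplitude r of one known correlated systematic.

    Analytic minimum of sum(((Mi - Ti) - r*beta_i)**2 / sigma_i**2) + r**2.
    """
    num = sum(b * d / s ** 2 for d, b, s in zip(m_minus_t, betas, sigmas))
    den = 1.0 + sum((b / s) ** 2 for b, s in zip(betas, sigmas))
    return num / den


rng = random.Random(2002)  # fixed seed for reproducibility
N, SIGMA, BETA, R_TRUE = 100, 1.0, 2.0, 1.5  # illustrative values only
sigmas = [SIGMA] * N
betas = [BETA] * N

# Mi - Ti = (random error) + (systematic error), with no unknown shift S here
m_minus_t = [rng.gauss(0.0, SIGMA) + R_TRUE * BETA for _ in range(N)]

# Pulls with the known systematic still included: Pi = (Ti - Mi) / sigma
pulls_before = [-d / s for d, s in zip(m_minus_t, sigmas)]

# Pulls after subtracting the fitted systematic term
r_hat = fit_known_systematic(m_minus_t, betas, sigmas)
pulls_after = [-(d - r_hat * b) / s for d, b, s in zip(m_minus_t, betas, sigmas)]

mean_before = statistics.fmean(pulls_before)
mean_after = statistics.fmean(pulls_after)
print(f"fitted r = {r_hat:.2f} (true {R_TRUE})")
print(f"mean pull before = {mean_before:+.2f}, after = {mean_after:+.2f}")
```

In this toy setup the pulls are coherently displaced before the subtraction and centred near zero afterwards. A fitted term of this shape would also absorb an unknown shift with the same pattern in X, so the choice of what is fitted matters when the aim is to detect unknown effects.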
Conclusion
• The global fit is important in determining the parton distribution function parameters and their uncertainties
• Inconsistent data samples are found by the parameter-fitting criterion
• Correlations among pulls could be used to detect unknown systematic effects
• We will implement this technique and apply it to the global fit