
Detect Unknown Systematic Effect: Diagnose bad fit to multiple data sets

Advanced Statistical Techniques in Particle Physics

Grey College, Durham

18 - 22 March 2002

M. J. Wang

Institute of Physics

Academia Sinica

Preface

• Motivation and gratitude

– Learned quite a lot at the workshop on confidence limits at Fermilab in 2000

– Thanks for hosting this conference

• Main title: Detect Unknown Systematic Effect

– More suitable to the aim of this conference

– Important for experimentalists

– Might be detectable in a global fit

• Sub-title: Diagnose bad fit to multiple data sets

– Global fit is not internally consistent

– Don't know which part is wrong

– Need to diagnose the data sample

Outline

• Introduction

• Global fit and its goodness of fit

• Parameter fitting criterion

• Diagnose bad fit to multiple data sets

• Conclusion

Introduction

• Knowledge of parton distribution functions is essential for hadron collider research

• A global fit is used to obtain the parton distribution functions

• Uncertainties of parton distribution function parameters

– Precision hadron collider results require estimates of the uncertainties of the parton distribution function parameters

– Important for Fermilab Run II and LHC physics analyses

Introduction

• Knowledge of parton distribution functions is essential for hadron collider research

– Interpretation of data with the SM

– Precision measurement of SM parameters

– Search for signals beyond the SM

• A global fit is used to obtain the parton distribution functions

– Non-perturbative parton distribution functions cannot be determined by perturbative QCD (PQCD)

– Therefore, they are determined by a global fit

Global fit and goodness of fit

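• Reliable estimates of parton distribution function parameters and their uncertainties require passing a goodness-of-fit criterion

– Total chi-square is used for goodness of fit

– A window of ± sqrt(2N) around the expected value, one standard deviation of a chi-square distribution with N degrees of freedom, is used as the accepted range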

• Is total chi-square good enough for goodness of fit?

– Total chi-square is insensitive to a small subset of data with a bad fit

• Is there a more stringent criterion?

– Need a new idea
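The insensitivity of the total chi-square can be illustrated with a toy Monte Carlo. This is only a sketch under stated assumptions: the sizes `n_total` and `n_bad`, the size of the `shift`, and the helper `accepted` are hypothetical choices for illustration, not values from the global fit.

```python
import numpy as np

rng = np.random.default_rng(1)

n_total, n_bad = 2000, 10   # hypothetical sizes: all points, badly fitted subset
shift = 2.0                 # hypothetical shift of the subset, in units of sigma
n_exp = 1000                # number of pseudo-experiments

# Pulls of a perfect fit are unit Gaussians; shift the first n_bad points.
pulls = rng.standard_normal((n_exp, n_total))
pulls[:, :n_bad] += shift

def accepted(chi2, n):
    """True where chi2 lies within sqrt(2n) of its expected value n."""
    return np.abs(chi2 - n) <= np.sqrt(2.0 * n)

chi2_total = np.sum(pulls**2, axis=1)
chi2_subset = np.sum(pulls[:, :n_bad]**2, axis=1)

frac_total_ok = accepted(chi2_total, n_total).mean()
frac_subset_ok = accepted(chi2_subset, n_bad).mean()

print(f"total chi-square accepted in  {frac_total_ok:.0%} of pseudo-experiments")
print(f"subset chi-square accepted in {frac_subset_ok:.0%} of pseudo-experiments")
```

In this toy the shifted subset adds on average n_bad * shift^2 = 40 to the total chi-square, which is smaller than sqrt(2 * 2000) ≈ 63, so the total is often accepted while the subset chi-square alone almost never is.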

Parameter fitting criterion

• Idea motivated by Louis Lyons's goodness of fit paradox at ACAT 2000

• J.C. Collins and J. Pumplin applied this idea to the goodness of fit for global fit

– Hypothesis-testing vs parameter-fitting criteria

– Subset chi-square against total chi-square

– Found inconsistent data sets in CTEQ5 data sets

• Still don't know which part is correct and which is wrong

– Hypothesis-testing vs parameter-fitting criteria (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 3)

– Subset chi-square against total chi-square (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 10)

– Found inconsistent data sets in CTEQ5 data (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 13)
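The difference between the two criteria can be sketched with a toy of two data sets that measure one constant parameter. This is not the procedure of the cited paper. It assumes that the parameter-fitting tolerance for a single parameter is of order Δχ² ≈ 1, and the sizes `n_a`, `n_b`, the `offset`, and the helper `chi2` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n_a = n_b = 5000          # hypothetical sizes of the two data sets
offset = 0.15             # hypothetical inconsistency between them, in units of sigma

# Both sets measure one constant parameter with unit errors.
data_a = rng.standard_normal(n_a)
data_b = rng.standard_normal(n_b) + offset

def chi2(data, a):
    """Chi-square of a data set for the constant model y = a (unit errors)."""
    return float(np.sum((data - a) ** 2))

# For a constant model the chi-square minimum is the sample mean.
a_global = float(np.mean(np.concatenate([data_a, data_b])))
a_best_a = float(np.mean(data_a))
a_best_b = float(np.mean(data_b))

# Hypothesis-testing view: compare the total chi-square with N +/- sqrt(2N).
n_tot = n_a + n_b
chi2_tot = chi2(data_a, a_global) + chi2(data_b, a_global)
print(f"total chi2 = {chi2_tot:.1f}, N = {n_tot}, sqrt(2N) = {np.sqrt(2 * n_tot):.1f}")

# Parameter-fitting view: rise of each subset chi-square above its own
# minimum when the parameter is moved to the global best fit.
delta_a = chi2(data_a, a_global) - chi2(data_a, a_best_a)
delta_b = chi2(data_b, a_global) - chi2(data_b, a_best_b)
print(f"delta chi2: set A = {delta_a:.1f}, set B = {delta_b:.1f}")
```

With these toy settings the offset is expected to add about 56 to the total chi-square, well below sqrt(2N) ≈ 141, while each subset chi-square rises by far more than 1 above its own minimum. Whether the total lands inside the window still varies from one pseudo-data set to the next.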

Diagnose bad fit to multiple data sets

• Importance of studying bad fit

– Is the inconsistent data set free of unknown systematic effects?

– Is the theoretical prediction adequate?

– Is there any hint of new physics?

• Is there a statistic suited to diagnosis?

– Pull can be used to identify an inconsistent experiment or data point (thanks to F. James's "Statistical methods in experimental physics")

– But for real data, there is no measured pull distribution for each data point

– What should we do with pull?

• Pull definition for each data point

Mi = Ti + ( random error )

Ri = Ti - Mi = -( random error )

Pi = Ri / sigma( Ri )

• Pull properties

– Gaussian shape

– Center at zero

– With unit variance

– Independence among pulls of different data points
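The pull properties listed above can be checked numerically. The sketch below assumes a hypothetical true curve `truth` and error `sigma`, and repeats the measurement of 6 data points over 10,000 pseudo-experiments.

```python
import numpy as np

rng = np.random.default_rng(3)

n_points, n_exp = 6, 10_000            # hypothetical: 6 data points, 10,000 pseudo-experiments
x = np.linspace(0.0, 1.0, n_points)
truth = 10.0 * np.exp(-2.0 * x)        # hypothetical true curve T_i
sigma = 0.1 * truth                    # hypothetical random error of each point

# M_i = T_i + (random error), repeated for every pseudo-experiment
measured = truth + sigma * rng.standard_normal((n_exp, n_points))

residual = truth - measured            # R_i = T_i - M_i
pull = residual / sigma                # P_i = R_i / sigma(R_i)

mean = pull.mean(axis=0)               # expected near 0 for every point
var = pull.var(axis=0, ddof=1)         # expected near 1 for every point
corr = np.corrcoef(pull, rowvar=False) # correlation between pulls of different points
off_diag = corr[~np.eye(n_points, dtype=bool)]

print("mean pull per point:    ", np.round(mean, 3))
print("pull variance per point:", np.round(var, 3))
print("largest |correlation| between different points:", round(float(np.abs(off_diag).max()), 3))
```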

• Systematic effects introduce correlation among pulls

– Constant shift on all data points

– Correlated shift on all data points

• Correlation among pulls is the key for detecting unknown systematic effects

• Pull correlation study

– Pull distribution built from all data points in one experiment (experiment pull distribution)

– Pull as a function of measurement variable X

• Naive case without known systematic uncertainties

Mi = Ti + ( random error ) + Si ( or S )

Ri = Ti - Mi = -( random error ) - Si ( or S )
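For the naive case, a constant shift S moves every pull of one experiment in the same direction, so the mean of the experiment pull distribution is displaced from zero. A minimal sketch, assuming a hypothetical theory curve, a common error `sigma`, and a shift `S` of 0.8 sigma:

```python
import numpy as np

rng = np.random.default_rng(4)

n_points = 100                          # data points in one experiment
x = np.linspace(0.0, 1.0, n_points)
truth = 10.0 * np.exp(-2.0 * x)         # hypothetical theory curve T_i
sigma = 0.5                             # hypothetical random error, same for all points
S = 0.4                                 # hypothetical unknown constant shift

# M_i = T_i + (random error) + S
measured = truth + sigma * rng.standard_normal(n_points) + S
pull = (truth - measured) / sigma       # P_i = R_i / sigma(R_i)

# Without a shift the mean pull is 0 with standard error 1/sqrt(n).
mean_pull = pull.mean()
significance = abs(mean_pull) * np.sqrt(n_points)

print(f"mean pull = {mean_pull:+.2f}, expected shift = {-S / sigma:+.2f}")
print(f"deviation of the mean from zero: {significance:.1f} standard errors")
```

The standard error of the mean pull falls as 1/sqrt(n), so more data points in one experiment make a smaller shift visible.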

Diagnose bad fit to multiple data sets

– Representative systematic shifts

1. Constant horizontal shift. Figures: MC data vs true curve; residual distributions of the first 6 channels (10,000 entries); 10% uncertainty on the error estimate of the first 6 channels (10,000 entries); pull distributions of the first 6 channels (10,000 entries); effect of error-estimate uncertainties of 0%, 10%, 20% on the pull distribution (10,000 entries); experiment residual and pull distributions (100,000 entries); experiment residual and pull profiles as a function of X (100,000 entries); experiment residual and pull distributions (100 entries); experiment residual and pull profiles as a function of X (100 entries)

2. Constant vertical shift. Figures: MC data vs true curve; residual distributions of the first 6 channels (10,000 entries); pull distributions of the first 6 channels (10,000 entries); experiment residual and pull distributions (100,000 entries); experiment residual and pull profiles as a function of X (100,000 entries); experiment residual and pull distributions as a function of X (100 entries); experiment residual and pull profiles as a function of X (100 entries)

3. Combined horizontal and vertical shift. Figures: MC data vs true curve; residual distributions of the first 6 channels (10,000 entries); pull distributions of the first 6 channels (10,000 entries); experiment residual and pull distributions (100,000 entries); experiment residual and pull profiles as a function of X (100,000 entries); experiment residual and pull distributions as a function of X (100 entries); experiment residual and pull profiles as a function of X (100 entries)
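A pull profile as a function of X for a horizontal and a vertical shift can be sketched in a toy Monte Carlo. The curve `truth`, the error `sigma`, the shift sizes `dx` and `dy`, and the helper `pull_profile` are hypothetical and are not the settings behind the figures listed above.

```python
import numpy as np

rng = np.random.default_rng(5)

def truth(x):
    """Hypothetical true curve."""
    return 10.0 * np.exp(-2.0 * x)

n_points, n_exp = 20, 1000
x = np.linspace(0.0, 1.0, n_points)
sigma = 0.5                      # hypothetical random error, same for all points

def pull_profile(dx=0.0, dy=0.0):
    """Mean pull at each x for a horizontal shift dx and a vertical shift dy."""
    noise = sigma * rng.standard_normal((n_exp, n_points))
    measured = truth(x + dx) + dy + noise
    pull = (truth(x) - measured) / sigma
    return pull.mean(axis=0)

horizontal = pull_profile(dx=0.02)
vertical = pull_profile(dy=0.3)

for xi, h, v in zip(x[::5], horizontal[::5], vertical[::5]):
    print(f"x = {xi:.2f}  horizontal: {h:+.2f}  vertical: {v:+.2f}")
```

In this toy the vertical shift gives a flat profile at -dy / sigma, while the horizontal shift gives a profile that follows the slope of the curve, largest where the curve is steepest.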

• Real case with known systematic uncertainties

Mi = Ti + ( random error ) + ( systematic error ) + Si ( or S )

Ri = Ti - Mi = -( random error ) - ( systematic error ) - Si ( or S )

• Real case with known systematic uncertainties

– Need to take out known systematic uncertainty term in order to restore the independence property

– Need to fit the residual systematic effect with the aid of global fit

– Regain the naive case results
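Taking out a known systematic term can be sketched as follows. This is one common choice, assumed here and not stated in the slides: a single fully correlated systematic of amplitude `beta` is scaled by a nuisance parameter lambda with a unit Gaussian penalty, and lambda is fitted for each pseudo-experiment. The unknown shift S is left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(6)

n_points, n_exp = 20, 5000
sigma = np.full(n_points, 1.0)   # hypothetical random errors
beta = np.full(n_points, 1.0)    # hypothetical known systematic: shift of point i per unit lambda

# (measured - theory) = (random error) + lambda * beta, with lambda ~ N(0, 1)
lam = rng.standard_normal((n_exp, 1))
diff = sigma * rng.standard_normal((n_exp, n_points)) + lam * beta

# Minimise sum_i ((diff_i - lambda * beta_i) / sigma_i)^2 + lambda^2 over lambda.
lam_hat = (diff * beta / sigma**2).sum(axis=1, keepdims=True) / (
    1.0 + (beta**2 / sigma**2).sum()
)

pull_raw = -diff / sigma                            # known systematic left in
pull_corrected = -(diff - lam_hat * beta) / sigma   # known systematic taken out

def mean_off_diagonal_corr(pull):
    """Average correlation between pulls of different data points."""
    corr = np.corrcoef(pull, rowvar=False)
    return corr[~np.eye(corr.shape[0], dtype=bool)].mean()

corr_raw = mean_off_diagonal_corr(pull_raw)
corr_corrected = mean_off_diagonal_corr(pull_corrected)
print(f"mean correlation, systematic left in:   {corr_raw:+.2f}")
print(f"mean correlation, systematic taken out: {corr_corrected:+.2f}")
```

The fit of lambda leaves a small negative correlation of order 1/(n_points + 1), so independence is restored only approximately.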

Conclusion

• Global fit is important in determining parton distribution function parameters and their uncertainties

• Inconsistent data samples are found by the parameter fitting criterion

• Correlations among pulls could serve as a technique for detecting unknown systematic effects

• This technique will be implemented and applied to the global fit
