RooUnfold
unfolding framework and algorithms

Tim Adye
Rutherford Appleton Laboratory

BaBar Statistics Working Group
BaBar Collaboration Meeting

13th December 2005

Outline

• What is Unfolding?
  • and why might you want to do it?
• Overview of a few techniques
  • Regularised unfolding
  • Iterative method
• RooUnfold package
  • Currently implements three methods with a common interface
• Status and Plans
• References


Unfolding

• In other fields known as “deconvolution”, “unsmearing”
• Given a “true” PDF in μ that is corrupted by detector effects, described by a response function, R, we measure a distribution in ν. In terms of histograms:

  $$\nu_i = \sum_{j=1}^{M} R_{ij}\,\mu_j, \qquad i = 1..N$$

• This may involve
  1. inefficiencies: lost events
  2. bias and smearing: events moving between bins (off-diagonal $R_{ij}$)
• With infinite statistics, it would be possible to recover the original PDF by inverting the response matrix:

  $$\boldsymbol{\mu} = R^{-1}\,\boldsymbol{\nu}$$
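As a minimal sketch of the histogram relation on this slide (not RooUnfold code), the following Python folds a true histogram through a small response matrix. The 3-bin matrix, the bin contents, and the helper name `fold` are invented for illustration.

```python
def fold(response, truth):
    """Return nu_i = sum_j R_ij * mu_j for a response matrix given as a list of rows."""
    return [sum(r_ij * mu_j for r_ij, mu_j in zip(row, truth)) for row in response]

# Hypothetical response: R[i][j] = P(observed in bin i | true value in bin j).
# A column that sums to less than 1 means events are lost (inefficiency);
# off-diagonal entries move events between bins (smearing).
R = [
    [0.7, 0.1, 0.0],
    [0.2, 0.7, 0.2],
    [0.0, 0.1, 0.6],
]
mu = [100.0, 200.0, 100.0]   # assumed true bin contents
nu = fold(R, mu)             # expected measured bin contents

# Efficiency of each true bin is the sum of its column.
efficiency = [sum(R[i][j] for i in range(len(R))) for j in range(len(mu))]
```

With these made-up numbers, 400 true events fold to fewer measured events, because every column of `R` sums to less than one.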


Not so simple…

• Unfortunately, if there are statistical fluctuations between bins this information is destroyed
  • Since R washes out statistical fluctuations, $R^{-1}$ cannot distinguish between wildly fluctuating and smooth PDFs
  • Obtain large negative correlations between adjacent bins
  • Large fluctuations in reconstructed bin contents
• Need some procedure to remove wildly fluctuating solutions
  1. Give added weight to “smoother” solutions
  2. Solve for µ iteratively, starting with a reasonable guess and truncate iteration before it gets out of hand
  3. Ignore bin-to-bin fluctuations altogether
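To see why plain inversion misbehaves, here is a small sketch with a made-up two-bin response matrix. The matrix, the bin contents, and the helper names `invert_2x2` and `mat_vec` are assumptions chosen so the arithmetic is easy to follow.

```python
def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]

R = [[0.6, 0.4], [0.4, 0.6]]         # hypothetical: strong smearing between two bins
R_inv = invert_2x2(R)                # roughly [[3, -2], [-2, 3]]

mu_true = [100.0, 100.0]
nu_exact = mat_vec(R, mu_true)       # a flat truth stays flat after smearing
nu_fluct = [110.0, 90.0]             # same total, with a 10% fluctuation per bin

mu_exact = mat_vec(R_inv, nu_exact)  # recovers the flat truth
mu_fluct = mat_vec(R_inv, nu_fluct)  # the 10% fluctuation becomes about 50%
```

The negative off-diagonal elements of `R_inv` are the source of the anticorrelation between adjacent bins: an upward fluctuation in one measured bin pushes its neighbour's estimate down.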


What happens if you don’t smooth


True Gaussian, with Gaussian smearing, systematic translation, and variable inefficiency – trained using a different Gaussian


Double Breit-Wigner, with Gaussian smearing, systematic translation, and variable inefficiency – trained using a single Gaussian


So why don’t we always do this?

• If the true PDF and resolution function can be parameterised, then a Maximum Likelihood fit is usually more convenient
  • Directly returns parameters of interest
  • Does not require binning
• If the response function doesn’t include smearing (i.e. it’s diagonal), then apply bin-by-bin efficiency correction directly
• If result is just needed for comparison (e.g. with MC), could apply response function to MC
  • simpler than un-applying response to data
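For the diagonal case, the correction is one division per bin. A minimal sketch, with invented efficiencies and bin contents and a hypothetical helper name `bin_by_bin_correct`:

```python
def bin_by_bin_correct(measured, efficiency):
    """Divide each measured bin by its efficiency; bins with zero efficiency are set to 0."""
    return [n / eff if eff > 0 else 0.0 for n, eff in zip(measured, efficiency)]

efficiency = [0.9, 0.8, 0.5]     # assumed diagonal elements R_ii of the response
measured = [90.0, 160.0, 50.0]   # assumed measured bin contents
corrected = bin_by_bin_correct(measured, efficiency)   # about [100, 200, 100]
```

This is only valid when there is no migration between bins; with off-diagonal elements the same division would ignore events that moved in from neighbouring bins.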


When to use unfolding

• Use unfolding to recover theoretical distribution where
  • there is no a-priori parameterisation
  • this is needed for the result and not just comparison with MC
  • there is significant bin-to-bin migration of events


Where could we use unfolding?

• Traditionally used to extract structure functions
• Widely used outside PP for image reconstruction
• Dalitz plots
  • Cross-feed between bins due to misreconstruction
• “True” decay momentum distributions
  • Theory at parton level, we measure hadrons
  • Correct for hadronisation as well as detector effects


1. Regularised Unfolding

• Use Maximum Likelihood to fit smeared bin contents to measured data, but include regularisation function

  $$\ln L'(\mu) = \ln L(\mu) + \alpha\, S(\mu)$$

  where the regularisation parameter, α, controls the degree of smoothness (select α to, e.g., minimise mean squared error)
• Various choices of regularisation function, S, are used
  • Tikhonov regularisation: minimise curvature
    • for some definition of curvature, e.g.

      $$S(\mu) = -\sum_{i=2}^{M-1} \left[(\mu_{i+1} - \mu_i) - (\mu_i - \mu_{i-1})\right]^2$$

    • RooUnfHistoSvd by Kerstin Tackmann and Heiko Lacker
      • based on GURU by Andreas Höcker and Vakhtang Kartvelishvili
      • uses Singular Value Decomposition
    • RUN by Volker Blobel
  • Maximum entropy:

      $$S(\mu) = -\sum_{i=1}^{M} \frac{\mu_i}{\mu_{\mathrm{tot}}} \ln\frac{\mu_i}{\mu_{\mathrm{tot}}}$$
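The two regularisation functions can be sketched in a few lines of Python. This assumes the sign convention in which a larger S means a smoother solution, so that ln L(μ) + α S(μ) is maximised; the function names and test histograms are invented for illustration and this is not the RooUnfHistoSvd, GURU, or RUN implementation.

```python
import math

def tikhonov_curvature(mu):
    """S(mu) = -(sum over interior bins of the squared second difference)."""
    return -sum(
        ((mu[i + 1] - mu[i]) - (mu[i] - mu[i - 1])) ** 2
        for i in range(1, len(mu) - 1)
    )

def entropy(mu):
    """S(mu) = -sum_i (mu_i/mu_tot) ln(mu_i/mu_tot); empty bins contribute 0."""
    total = sum(mu)
    return -sum((m / total) * math.log(m / total) for m in mu if m > 0)

def regularised_objective(log_likelihood, s_value, alpha):
    """ln L'(mu) = ln L(mu) + alpha * S(mu), to be maximised over mu."""
    return log_likelihood + alpha * s_value

smooth = [10.0, 20.0, 30.0, 40.0]   # straight line: zero curvature
jagged = [10.0, 40.0, 10.0, 40.0]   # alternating bins: large curvature

s_smooth = tikhonov_curvature(smooth)
s_jagged = tikhonov_curvature(jagged)
```

For the same likelihood value, the jagged histogram is penalised more heavily as α grows, which is how the regularisation parameter trades fidelity to the data against smoothness.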


2. Iterative method

• Uses Bayes’ theorem to invert

  $$R_{ij} = P(\text{observed in bin } i \mid \text{true value in bin } j)$$

  and using an initial set of probabilities, $p_i$ (e.g. flat) obtain an improved estimate

  $$\hat{\mu}_i = \frac{1}{\epsilon_i} \sum_{j=1}^{N} \frac{R_{ji}\, p_i}{\sum_k R_{jk}\, p_k}\, n_j$$

  where $n_j$ is the observed content of bin j and $\epsilon_i = \sum_j R_{ji}$ is the efficiency for true bin i
• Repeating with new $p_i$ from these new bin contents converges quite rapidly
  • Truncating the iteration prevents us seeing the bad effects of statistical fluctuations
• Fergus Wilson and I have implemented this method in ROOT/C++
  • Supports 1D, 2D, and 3D cases
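The iteration can be sketched in plain Python as follows. This is an illustration of the Bayes update with an efficiency correction and a flat starting prior, not the RooUnfold ROOT/C++ code; the function names, the 3-bin response matrix, and the default of four iterations are assumptions.

```python
def bayes_iteration(response, measured, prior):
    """One Bayes update; response[j][i] = P(observed in bin j | true value in bin i)."""
    n_true = len(prior)
    n_meas = len(measured)
    # Efficiency of each true bin: probability to be observed in any bin.
    eff = [sum(response[j][i] for j in range(n_meas)) for i in range(n_true)]
    estimate = []
    for i in range(n_true):
        total = 0.0
        for j in range(n_meas):
            norm = sum(response[j][k] * prior[k] for k in range(n_true))
            if norm > 0:
                # Bayes' theorem gives P(true i | observed j), weighted by n_j.
                total += response[j][i] * prior[i] / norm * measured[j]
        estimate.append(total / eff[i] if eff[i] > 0 else 0.0)
    return estimate

def unfold_iterative(response, measured, iterations=4):
    """Start from a flat prior and truncate after a fixed number of iterations."""
    n_true = len(response[0])
    prior = [1.0 / n_true] * n_true
    estimate = prior
    for _ in range(iterations):
        estimate = bayes_iteration(response, measured, prior)
        total = sum(estimate)
        prior = [e / total for e in estimate]   # new p_i from the new bin contents
    return estimate

R = [[0.7, 0.1, 0.0],
     [0.2, 0.7, 0.2],
     [0.0, 0.1, 0.6]]
measured = [90.0, 180.0, 80.0]   # what a truth of [100, 200, 100] folds to
unfolded = unfold_iterative(R, measured, iterations=4)
```

The number of iterations plays the role of the regularisation parameter: few iterations keep the result close to the prior, many iterations approach the unregularised solution and its fluctuations.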


2D Unfolding Example

2D Smearing, bias, variable efficiency, and variable rotation


RooUnfold Package

• Make these different methods available as ROOT/C++ classes with a common interface to specify
  • unfolding method and parameters
  • response matrix
    • pass directly or fill from MC sample
  • measured histogram
• return reconstructed truth histogram and errors
  • full covariance matrix
• Easy to do with multiple dimensions (when supported)
• This should make it easy to try and compare different methods in your analysis
  • Could also be useful outside BaBar!


RooUnfold Classes

• RooUnfoldResponse
  • response matrix with various filling and access methods
  • create from MC, use on data (can be stored in a file)
• RooUnfold – unfolding algorithm base class
  • RooUnfoldBayes – Iterative method
  • RooUnfoldSvd – Interface to RooUnfHistoSvd package
  • RooUnfoldBinByBin – Simple bin-by-bin method
    • Trivial implementation, but useful to compare with full unfolding
• RooUnfoldExample – Simple 1D example
• RooUnfoldTest and RooUnfoldTest2D
  • Test with different training and unfolding distributions
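To illustrate what "create from MC" means for a response matrix, here is a toy Python sketch that accumulates (true, measured) pairs and normalises each true bin. The class `ResponseSketch` and its methods are made up for this example and are not the RooUnfold interface.

```python
class ResponseSketch:
    """Toy 1D response matrix filled from (true, measured) MC pairs."""

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high
        # counts[i][j]: MC events with true value in bin j observed in bin i
        self.counts = [[0.0] * nbins for _ in range(nbins)]
        self.truth = [0.0] * nbins   # all generated events per true bin

    def _bin(self, x):
        """Bin index for x, or None if x is outside [low, high)."""
        if not (self.low <= x < self.high):
            return None
        index = int((x - self.low) / (self.high - self.low) * self.nbins)
        return min(index, self.nbins - 1)

    def fill(self, x_true, x_measured):
        """Record an MC event that was reconstructed."""
        j, i = self._bin(x_true), self._bin(x_measured)
        if j is None:
            return
        self.truth[j] += 1.0
        if i is not None:
            self.counts[i][j] += 1.0

    def miss(self, x_true):
        """Record an MC event that was generated but not reconstructed."""
        j = self._bin(x_true)
        if j is not None:
            self.truth[j] += 1.0

    def matrix(self):
        """R[i][j] = P(observed in bin i | true value in bin j)."""
        return [[c / t if t > 0 else 0.0 for c, t in zip(row, self.truth)]
                for row in self.counts]

response = ResponseSketch(nbins=2, low=0.0, high=2.0)
response.fill(0.5, 0.4)   # stays in bin 0
response.fill(0.5, 1.2)   # migrates from bin 0 to bin 1
response.fill(1.5, 1.6)   # stays in bin 1
response.miss(1.5)        # lost to inefficiency
R = response.matrix()
```

Counting missed events in the denominator is what puts the inefficiency into the matrix: the second column here sums to 0.5 rather than 1.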


RooUnfold Status

• Available in CVS
  • Announced in Statistics HN
  • See README file for details of building and running
• Interface can still be adjusted based on comments
  • I already have an idea for simplifying use in multi-dimensional case


Plans and possible improvements

• So far this is mostly a programming exercise
  • Would be interesting to compare the different methods for some real analysis distributions
  • But YMMV
• Add common tools, useful for all algorithms
  • Inputs and results in different formats
    • already supports histograms and ROOT vectors/matrices
  • Automatic calculation of figures of merit (e.g. χ²)
    • can also use standard ROOT functions on histograms
  • Simplify selection of regularisation parameter
• More algorithms?
  • Maximum entropy regularisation
  • Simple matrix inversion without regularisation
    • perhaps useful with large statistics
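One possible figure of merit is a χ² between the unfolded histogram and a known truth, as in a test with simulated data. The sketch below uses only the per-bin errors and therefore ignores the bin-to-bin correlations that a full covariance matrix would carry; the function name and the numbers are invented.

```python
def chi_squared(unfolded, errors, truth):
    """Sum of squared pulls over bins with a non-zero error (correlations ignored)."""
    return sum(
        ((u - t) / e) ** 2
        for u, e, t in zip(unfolded, errors, truth)
        if e > 0
    )

unfolded = [102.0, 196.0, 100.0]   # assumed unfolded bin contents
errors = [2.0, 4.0, 5.0]           # assumed per-bin errors
truth = [100.0, 200.0, 100.0]      # assumed true bin contents
chi2 = chi_squared(unfolded, errors, truth)   # pulls of +1, -1 and 0
```

Because unfolding typically correlates neighbouring bins, a diagonal χ² like this one is only a rough guide when comparing methods.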


References - Overview

• G. Cowan, A Survey of Unfolding Methods for Particle Physics, Proc. Advanced Statistical Techniques in Particle Physics, Durham (2002)
  http://www.ippp.dur.ac.uk/Workshops/02/statistics/
• G. Cowan, Statistical Data Analysis, Oxford University Press (1998), Chapter 11: Unfolding
• R. Barlow, SLUO Lectures on Numerical Methods in HEP (2000), Lecture 9: Unfolding
  www-group.slac.stanford.edu/sluo/Lectures/Stat_Lectures.html


References - Techniques

• V. Blobel, Unfolding Methods in High Energy Physics, DESY 84-118 (1984); also CERN 85-02
• A. Höcker and V. Kartvelishvili, SVD Approach to Data Unfolding, NIM A 372 (1996) 469
  www.lancs.ac.uk/depts/physics/staff/kartvelishvili.html
• K. Tackmann, H. Lacker, Unfolding the Hadronic Mass Spectrum in B->Xu lν Decays, BAD 894.
• G. D’Agostini, A multidimensional unfolding method based on Bayes’ theorem, NIM A 362 (1995) 487
