
Advanced event analysis methods at the energy frontier

Reinhard Schwienhorst

CPPM Seminar, July 5, 2007


Outline

• Introduction
  – Physics at the energy frontier
• Analysis procedure
• Event analysis methods
  – Decision trees
    • Boosting
    • Random forest
  – Bayesian neural networks
  – Classifier comparisons
• Conclusions

Disclaimer: highlight general principles and guiding ideas
• Not necessarily mathematically rigorous
• Some of the same topics were addressed at PhyStat conferences


Typical event analysis procedures

1) Cut-based event counting
2) Peak in a characteristic distribution

Event counting
• Apply cuts to variables describing the event
  – Object identification
  – Kinematic cuts on objects
  – Event kinematics
• Goal: cut until the signal is visible
  – No background left
  – Or large S/√B
• Sensitive to any signal with this final state
• Requires understanding of background

Example: Z discovery at UA1
  – 2 EM clusters, ET > 25 GeV
  – 1 EM cluster track-matched
  – both EM clusters track-matched
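To make cut-based counting concrete, here is a minimal sketch in Python/numpy; the field names ('lepton_pt', 'met'), the toy distributions, and the thresholds are hypothetical stand-ins, not the UA1 selection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "events": structured arrays with hypothetical kinematic fields.
dtype = [('lepton_pt', float), ('met', float)]
sig = np.zeros(10_000, dtype=dtype)
bkg = np.zeros(100_000, dtype=dtype)
sig['lepton_pt'], sig['met'] = rng.exponential(40.0, 10_000), rng.exponential(35.0, 10_000)
bkg['lepton_pt'], bkg['met'] = rng.exponential(20.0, 100_000), rng.exponential(15.0, 100_000)

def significance(cuts):
    """Apply every (field, threshold) cut to both samples and return S/sqrt(B)."""
    s_mask = np.ones(len(sig), dtype=bool)
    b_mask = np.ones(len(bkg), dtype=bool)
    for field, threshold in cuts:
        s_mask &= sig[field] > threshold
        b_mask &= bkg[field] > threshold
    S, B = s_mask.sum(), b_mask.sum()
    return S / np.sqrt(B) if B > 0 else float('inf')

print(significance([('lepton_pt', 25.0), ('met', 20.0)]))
```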


Peak in a characteristic distribution ("bump hunting")
• Find a variable that has a smooth distribution for background
  – Typically invariant mass
• Measure this distribution over a large range of possible values
• Look for possible resonance peaks
• Sensitive to any resonance with this final state
• Background estimate from sidebands

Example: b-quark discovery at Fermilab
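The sideband estimate can be sketched in a few lines, assuming a locally flat background; the toy spectrum and window edges below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy invariant-mass spectrum: smooth falling background plus a resonance.
masses = np.concatenate([rng.exponential(30.0, 50_000) + 20.0,   # background
                         rng.normal(60.0, 2.0, 500)])            # signal peak

def sideband_background(m, lo, hi, sb_width):
    """Estimate the background count inside [lo, hi] from sidebands of
    width sb_width on each side, assuming a locally flat background."""
    left = np.sum((m > lo - sb_width) & (m < lo))
    right = np.sum((m > hi) & (m < hi + sb_width))
    density = (left + right) / (2.0 * sb_width)     # events per GeV
    return density * (hi - lo)

in_window = np.sum((masses > 55.0) & (masses < 65.0))
excess = in_window - sideband_background(masses, 55.0, 65.0, 10.0)
print(f"excess over sideband estimate: {excess:.0f} events")
```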


The energy frontier
• Colliding particles at the highest available energies
• Probe structure of matter at the most fundamental level
  – Observe interactions at the smallest possible distances
  – Produce never-before-seen particles

Present: Tevatron. Future: LHC.


Searches at the energy frontier
• Tevatron:
  – Single top quark production

[Plot: decision tree output for the single top search, with low-DT and high-DT regions marked]

Searches at the energy frontier
• Searches for new particles, phenomena, couplings
  – Tevatron:
    • Single top quark production
    • Higgs boson search
    • SUSY
    • Extra dim
    • ...
    • Statistics-limited

[Plot: Tevatron Higgs sensitivity]
Multivariate techniques make this level of sensitivity possible

Searches at the energy frontier
• Searches for new particles, phenomena, couplings
  – Tevatron:
    • Single top quark production
    • Higgs boson search
    • SUSY
    • Extra dim
    • ...
  – LHC:
    • Higgs searches

[Plot: LHC Higgs sensitivity, signal over SM background]
Multivariate techniques will be required to reach this level of sensitivity

Searches at the energy frontier
• Searches for new particles, phenomena, couplings
  – Tevatron:
    • Single top quark production
    • Higgs boson search
    • SUSY
    • Extra dim
    • ...
  – LHC:
    • Higgs searches
    • SUSY
    • Extra dim
    • ...

LHC SUSY signature: Meff = ETmiss + ∑ pT(jets)
[Plot: Meff distribution for SUSY at 1 TeV over the SM background]

Measurements at the energy frontier
• First measurements of properties, couplings
  – With samples of limited size
  – Example: top quark mass
    • ~3 GeV with 1 fb⁻¹
    • Was the goal for 2 fb⁻¹!

[Plot: lepton+jets top mass, matrix element method]

Measurements at the energy frontier
• First measurements of properties, couplings
  – With samples of limited size
  – Example: LHC SUSY particle masses

[Plot: LHC sbottom mass from the M(llbb + ETmiss) distribution, in GeV; 100 signal events in 30 fb⁻¹]

Physics at the energy frontier
• Searches for new particles, phenomena, couplings
• First measurements of properties, couplings
• Multivariate techniques ≈ adding more data
  – Making the most out of small samples of events


How to improve upon event counting and bump hunting


Bayesian limit
• For each analysis, there exists a fully optimized signal-background separation
  – Target function, also called Bayes discriminant or Bayesian limit:

    B(x) = L(S|x) / L(B|x)

• For a single discriminating variable, this ratio of signal and background likelihoods is easy to calculate
  – Monte Carlo procedure:
    • Generate signal and background MC events
    • Fill histograms for signal and background
    • Divide the two histograms
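A one-variable toy version of this Monte Carlo recipe, with Gaussian pseudo-MC standing in for real simulated events:

```python
import numpy as np

rng = np.random.default_rng(0)
sig_mc = rng.normal(1.0, 1.0, 100_000)    # toy signal MC events
bkg_mc = rng.normal(-1.0, 1.0, 100_000)   # toy background MC events

# Fill signal and background histograms with identical binning.
bins = np.linspace(-5.0, 5.0, 51)
h_sig, _ = np.histogram(sig_mc, bins=bins, density=True)
h_bkg, _ = np.histogram(bkg_mc, bins=bins, density=True)

# B(x) = L(S|x) / L(B|x), bin by bin; empty background bins map to infinity.
with np.errstate(divide='ignore', invalid='ignore'):
    discriminant = np.where(h_bkg > 0, h_sig / h_bkg, np.inf)
```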


Bayesian limit
• For each analysis, there exists a fully optimized signal-background separation
  – Target function, also called Bayes discriminant or Bayesian limit:

    B(x) = L(S|x) / L(B|x)

• For a single discriminating variable, this ratio of signal and background likelihoods is easy to calculate
• With more than one variable, this is no longer possible
  – Not enough MC statistics to compute a many-dimensional likelihood
  – The curse of dimensionality


Optimized event analysis

Optimized =
  Optimize signal-background separation
  Exploit full event information
    Event kinematics, angular correlations, ...
  Take all correlations into account

• Requires detailed understanding of signal and background
  – Only applicable to searches for a specific signal or measurements of a specific process

Goal: reach the Bayesian limit


Optimized event analysis

Optimized =
  Optimize signal-background separation
  Exploit full event information
    Event kinematics, angular correlations, ...
  Take all correlations into account

• Requires detailed understanding of signal and background
  – Only applicable to searches for a specific signal or measurements of a specific process
• Limited by background and signal modeling
  – MC statistics, MC model, background composition, shape, ...
  – If the signal model is wrong: the search is not sensitive
  – If the background model is wrong: find something that isn't there

Goal: reach the Bayesian limit


Event analysis techniques

[Diagram: cut-based, likelihood, decision trees, neural networks; matrix elements; boosted decision trees and random forests; Bayesian neural networks]

Many others: kernel methods, support vector machines, ...




Cut-based analysis

Cut flow (each event must pass every cut):
  Lepton pT > 41 GeV → Event energy < 65 GeV → M(top) < 352 GeV → Final event set

• Estimate background yield
• Compare to data:

    Nobs = Ndata − NB

• Calculate signal acceptance in the final event set:

    σ = Nobs / (A · L)
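Numerically the last two steps are a one-liner; every number below is made up for illustration:

```python
# All numbers are illustrative, not from a real analysis.
n_data = 240          # events in data passing all cuts
n_bkg = 180.0         # estimated background yield NB
acceptance = 0.012    # signal acceptance A (from MC)
lumi = 1000.0         # integrated luminosity L, in pb^-1

n_obs = n_data - n_bkg                  # Nobs = Ndata - NB
sigma = n_obs / (acceptance * lumi)     # sigma = Nobs / (A * L), in pb
print(f"sigma = {sigma:.1f} pb")
```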


Including events that fail a cut

• Create a tree of cuts
• Divide the sample into "pass" and "fail" sets
• Each node corresponds to a cut (branch)

[Diagram: cut tree with nodes pT(l) > 41, pT(l) > 65, M < 352, each splitting into pass (P) and fail (F) branches]


Trees and leaves

• Create a tree of cuts
• Divide the sample into "pass" and "fail" sets
• Each node corresponds to a cut (branch)
• A leaf corresponds to an end-point
• For each leaf, calculate the purity (from MC):

    purity = NS / (NS + NB)

[Diagram: same cut tree (pT(l) > 41, pT(l) > 65, M < 352), with the leaves marked]


Decision tree

• Create a tree of cuts
• Divide the sample into "pass" and "fail" sets
• Each node corresponds to a cut (branch)
• A leaf corresponds to an end-point
• For each leaf, calculate the purity (from MC):

    purity = NS / (NS + NB)

• Train the tree by optimizing the Gini improvement:

    Gini = 2 NS NB / (NS + NB)

• Each leaf will be either background- or signal-enhanced
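A sketch of the training criterion: scan candidate cut values on one variable and keep the cut with the largest Gini improvement (toy arrays, not the DØ trees):

```python
import numpy as np

def gini(n_s, n_b):
    """Gini figure of merit from the slide: 2*NS*NB/(NS+NB)."""
    n = n_s + n_b
    return 2.0 * n_s * n_b / n if n > 0 else 0.0

def gini_improvement(sig_vals, bkg_vals, cut):
    """Parent Gini minus the summed Gini of the pass and fail samples."""
    ns_p = int(np.sum(sig_vals > cut)); ns_f = len(sig_vals) - ns_p
    nb_p = int(np.sum(bkg_vals > cut)); nb_f = len(bkg_vals) - nb_p
    return (gini(len(sig_vals), len(bkg_vals))
            - gini(ns_p, nb_p) - gini(ns_f, nb_f))

rng = np.random.default_rng(2)
sig_vals = rng.normal(50.0, 15.0, 5000)   # toy lepton pT, signal MC
bkg_vals = rng.normal(30.0, 15.0, 5000)   # toy lepton pT, background MC
cuts = np.linspace(10.0, 70.0, 61)
best = max(cuts, key=lambda c: gini_improvement(sig_vals, bkg_vals, c))
print(f"best cut: pT > {best:.0f}")
```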


Decision tree output
• Train on signal and background models (MC)
  – Stop and create a leaf when NMC < 100
• Compute the purity value for each leaf
• Send data events through the tree
  – Assign the purity value of the corresponding leaf to the event
• The result approximates a probability density distribution
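A minimal stand-in using scikit-learn (not the tool used in these analyses): min_samples_leaf=100 approximates the "stop when NMC < 100" rule, and for a single tree predict_proba returns the class fraction in the leaf, i.e. the purity assigned to each event:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X_sig = rng.normal(1.0, 1.0, size=(5000, 3))    # toy signal MC, 3 variables
X_bkg = rng.normal(0.0, 1.0, size=(5000, 3))    # toy background MC
X_data = rng.normal(0.5, 1.0, size=(1000, 3))   # stand-in for data events

X_mc = np.vstack([X_sig, X_bkg])
y_mc = np.r_[np.ones(len(X_sig)), np.zeros(len(X_bkg))]

# min_samples_leaf=100 plays the role of the "stop when NMC < 100" rule.
tree = DecisionTreeClassifier(min_samples_leaf=100).fit(X_mc, y_mc)

# For a single tree, predict_proba is the leaf's class fraction,
# i.e. the purity NS/(NS+NB), assigned to each data event.
purity = tree.predict_proba(X_data)[:, 1]
```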


Boosting


Boosting
• A general method to improve the performance of any weak classifier
  – Decision trees, neural networks, ...
• Linear combination of many filter functions:

    F(x) = ∑k ak fk(x)

  – ak: coefficient, typically the result of minimizing an error function


Boosting procedure
1. Start from the initial training sample Tk=1
2. Train filter function fk
3. Find coefficient ak by minimizing the error function
4. Modify the training sample Tk and repeat

Final classifier: F = ∑k ak fk


Adaptive boosting
• In each iteration, update the coefficient ak
  – From minimizing the error function
  – Coefficients decrease at each iteration
• Update the weight of each event in training sample Tk
  – Figure out which events have been misclassified
    • Signal events should have purity ≥ 0.5
    • Background events should have purity < 0.5
  – Increase the event weight for those events that have been misclassified
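A compact AdaBoost-style loop showing both the coefficient ak and the weight update; this is a generic sketch with toy data, not the DØ implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(4000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # toy labels

w = np.full(len(y), 1.0 / len(y))                # per-event training weights
trees, alphas = [], []
for k in range(20):
    f_k = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=w)
    miss = f_k.predict(X) != y
    err = np.clip(w[miss].sum() / w.sum(), 1e-6, 1 - 1e-6)
    a_k = 0.5 * np.log((1.0 - err) / err)        # coefficient; shrinks as err -> 0.5
    w[miss] *= np.exp(2.0 * a_k)                 # boost the misclassified events
    w /= w.sum()
    trees.append(f_k); alphas.append(a_k)

# F(x) = sum_k a_k f_k(x), with each tree's output mapped to +-1
F = sum(a * (2 * f.predict(X) - 1) for f, a in zip(trees, alphas))
```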


Boosting performance

[Plot: boosting performance in the DØ single top search with decision trees, for 3 different sets of discriminating variables]


Random forest
• Average over many decision trees
  – Typically O(100)
• Each tree is grown using m variables
  – For N total variables, m << N
• Very fast algorithm
  – Even with a large number of variables
• Very few parameters to adjust
  – Typically only m
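A random-forest sketch with scikit-learn on hypothetical toy data; note that scikit-learn draws the m variables per split (max_features) rather than once per tree:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(10_000, 25))               # N = 25 toy variables
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)  # toy labels

# O(100) trees; each split considers only m = 5 of the N = 25 variables.
forest = RandomForestClassifier(n_estimators=100, max_features=5).fit(X, y)
scores = forest.predict_proba(X)[:, 1]          # average over the 100 trees
```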




Neural networks

• Input nodes: one for each variable xi
• Hidden nodes: each is a sigmoid dependent on the input variables:

    nk(x, wk) = 1 / (1 + e^(−∑i wik xi))

• Output node: linear combination of the hidden nodes:

    f(x) = ∑k w'k nk(x, wk)

• Output runs from 0 (background-like) to 1 (signal-like)
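The two formulas transcribe directly into numpy; the weights below are random stand-ins, where a real network would obtain them from training:

```python
import numpy as np

rng = np.random.default_rng(6)
n_inputs, n_hidden = 4, 8
w = rng.normal(size=(n_hidden, n_inputs))   # hidden-node weights w_ik
w_out = rng.normal(size=n_hidden)           # output weights w'_k

def network(x):
    hidden = 1.0 / (1.0 + np.exp(-w @ x))   # n_k(x, w_k): sigmoid hidden nodes
    return w_out @ hidden                   # f(x) = sum_k w'_k n_k(x, w_k)

print(network(rng.normal(size=n_inputs)))
```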


Neural Network Training
• Find the optimum NN parameters on training signal/background events
• Apply the NN to an independent set of signal and background events
  – The testing sample
• Stop training when the error on the testing sample starts increasing
  – Overfitting sets in beyond that point

[Plot: training and testing error curves, DØ single top search]
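A runnable stand-in using scikit-learn's MLP (again, not the tool from the talk): its early_stopping option holds out a fraction of the training events as a testing ("validation") sample and stops once that score stops improving, which is essentially the procedure above:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)     # toy signal/background labels

# validation_fraction plays the role of the testing sample; training stops
# when its score has not improved for n_iter_no_change epochs.
nn = MLPClassifier(hidden_layer_sizes=(8,), early_stopping=True,
                   validation_fraction=0.2, n_iter_no_change=10,
                   max_iter=500).fit(X, y)
```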


Bayesian neural networks
• Bayesian idea: rather than finding one value for each weight, determine the posterior probability for each weight
• Form many networks by sampling from the posterior
  – Typical case: ~100 individual neural networks
  – Each network gets a weight based on training performance
• Avoids overfitting
• But: very slow, due to the integration required to determine the posterior
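A loud caveat: true Bayesian neural networks sample weights from the posterior (e.g. by Markov chain Monte Carlo, as in Neal's implementation used by MiniBooNE). The sketch below is only a crude ensemble stand-in that illustrates the final averaging step, weighting ~10 bootstrap-trained networks by their performance:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + X[:, 2] > 0).astype(int)      # toy labels

nets, weights = [], []
for k in range(10):                          # ~10 networks instead of ~100
    idx = rng.integers(0, len(y), len(y))    # bootstrap resample
    net = MLPClassifier(hidden_layer_sizes=(6,), max_iter=300,
                        random_state=k).fit(X[idx], y[idx])
    nets.append(net)
    weights.append(net.score(X, y))          # performance-based network weight

weights = np.array(weights) / np.sum(weights)
output = sum(wt * n.predict_proba(X)[:, 1] for wt, n in zip(weights, nets))
```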


Comparing multivariate methods

How optimal can an optimal event analysis be?


Classifier comparison, DØ single top

[Plot: background efficiency (%, 0 to 5.5) vs signal efficiency (%, 0 to 40) for the DØ single top Wbb sample: R LDA, SPR Boost w/bag, D0 DT, SPR LDA, SPR Boost Dec Splits, D0 NN (p17), SPR RF. Annotations mark the neural network, random forest, and likelihood analysis curves. 25 variables, 10000 events. Classifiers are evaluated at fixed points; lines connect the points for better visibility.]


BaBar muon ID

[Plot: BaBar muon ID classifier comparison; 25 variables, 10000 events; the interesting region of the efficiency plane is marked]


MiniBooNE

[Plot: MiniBooNE comparison: background efficiency (0 to 0.04) vs signal efficiency for Random Forest, Roe (boosted DT), and Neal (Bayesian NN). Here 54 variables, 130000 events; training was also done with 300 variables. The BDT uses 1000 trees.]


GLAST

[Plot: GLAST background rejection: background efficiency (log scale, 10⁻⁴ to 1) vs signal efficiency for R Random Forest, R LDA, R LR, SPR BDT, SPR LDA, and SPR RF; 35 variables. Annotations mark the boosted decision tree, the random forests, and the likelihood methods.]


Summary

[Schematic: background efficiency vs signal efficiency, both from 0 to 1. The diagonal is a random guess; cut-based or likelihood analyses improve on it; neural networks, simple decision trees, etc. do better still; boosted decision trees, Bayesian neural networks, and random forests come closest to optimal.]

Conclusions
• Multivariate event analysis techniques are now a common tool in HEP
  – In the past mostly neural networks, now also decision-tree-related methods
    • GLAST, MiniBooNE, ATLAS, DØ
• Modern classification tools make life easy
  – Very few parameters to adjust
  – Can use many variables
    • A ranking of variables is automatically provided
  – Implemented in several software packages


Advanced event analysis enables discoveries

Resources
• PhyStat code repository: https://plone4.fnal.gov:4430/P0/phystat/
• PhyStat 2007 conference: http://phystat-lhc.web.cern.ch/phystat-lhc/
• Jim Linnemann's collection of statistics links: http://www.pa.msu.edu/people/linnemann/stat_resources.html
• Statistical analysis tool R: http://www.r-project.org/
• TMVA (multivariate analysis tools in ROOT): http://tmva.sourceforge.net/
• Neural networks in hardware: http://neuralnets.web.cern.ch/NeuralNets/nnwInHep.html
• Boosted decision trees in MiniBooNE: http://arxiv.org/abs/physics/0508045
• Decision tree introduction: http://www.statsoft.com/textbook/stcart.html
• GLAST decision trees: http://scipp.ucsc.edu/~atwood/Talks%20Given/CPAforGLAST.ppt
