
Advanced event analysis methods at the energy frontier

Reinhard Schwienhorst

CPPM Seminar, July 5, 2007


Outline

• Introduction
  – Physics at the energy frontier
• Analysis procedure
• Event analysis methods
  – Decision trees
    • Boosting
    • Random forest
  – Bayesian neural networks
  – Classifier comparisons
• Conclusions

Disclaimer: highlight general principles and guiding ideas
• Not necessarily mathematically rigorous
• Some of the same topics were addressed at PhyStat conferences


Typical event analysis procedures

1) Cut-based event counting
2) Peak in a characteristic distribution

Event counting
• Apply cuts to variables describing the event
  – Object identification
  – Kinematic cuts on objects
  – Event kinematics
• Goal: cut until the signal is visible
  – No background left
  – Or large S/√B
• Sensitive to any signal with this final state
• Requires understanding of background

Example: Z discovery at UA1
  – 2 EM clusters, ET > 25 GeV
  – 1 EM cluster track-matched
  – both EM clusters track-matched
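To make cut-based counting concrete, here is a minimal sketch in Python/numpy; the field names ('lepton_pt', 'met'), the toy distributions, and the thresholds are hypothetical stand-ins, not the UA1 selection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "events": structured arrays with hypothetical kinematic fields.
dtype = [('lepton_pt', float), ('met', float)]
sig = np.zeros(10_000, dtype=dtype)
bkg = np.zeros(100_000, dtype=dtype)
sig['lepton_pt'], sig['met'] = rng.exponential(40.0, 10_000), rng.exponential(35.0, 10_000)
bkg['lepton_pt'], bkg['met'] = rng.exponential(20.0, 100_000), rng.exponential(15.0, 100_000)

def significance(cuts):
    """Apply every (field, threshold) cut to both samples and return S/sqrt(B)."""
    s_mask = np.ones(len(sig), dtype=bool)
    b_mask = np.ones(len(bkg), dtype=bool)
    for field, threshold in cuts:
        s_mask &= sig[field] > threshold
        b_mask &= bkg[field] > threshold
    S, B = s_mask.sum(), b_mask.sum()
    return S / np.sqrt(B) if B > 0 else float('inf')

print(significance([('lepton_pt', 25.0), ('met', 20.0)]))
```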


Peak in a characteristic distribution ("bump hunting")
• Find a variable that has a smooth distribution for background
  – Typically invariant mass
• Measure this distribution over a large range of possible values
• Look for possible resonance peaks
• Sensitive to any resonance with this final state
• Background estimate from sidebands

Example: b-quark discovery at Fermilab
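The sideband estimate can be sketched in a few lines, assuming a locally flat background; the toy spectrum and window edges below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy invariant-mass spectrum: smooth falling background plus a resonance.
masses = np.concatenate([rng.exponential(30.0, 50_000) + 20.0,   # background
                         rng.normal(60.0, 2.0, 500)])            # signal peak

def sideband_background(m, lo, hi, sb_width):
    """Estimate the background count inside [lo, hi] from sidebands of
    width sb_width on each side, assuming a locally flat background."""
    left = np.sum((m > lo - sb_width) & (m < lo))
    right = np.sum((m > hi) & (m < hi + sb_width))
    density = (left + right) / (2.0 * sb_width)     # events per GeV
    return density * (hi - lo)

in_window = np.sum((masses > 55.0) & (masses < 65.0))
excess = in_window - sideband_background(masses, 55.0, 65.0, 10.0)
print(f"excess over sideband estimate: {excess:.0f} events")
```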


The energy frontier
• Colliding particles at the highest available energies
• Probe structure of matter at the most fundamental level
  – Observe interactions at the smallest possible distances
  – Produce never-before-seen particles

Present: Tevatron. Future: LHC.


Searches at the energy frontier
• Tevatron:
  – Single top quark production

[Plot: decision tree output for the single top search, with low-DT and high-DT regions marked]

Searches at the energy frontier
• Searches for new particles, phenomena, couplings
  – Tevatron:
    • Single top quark production
    • Higgs boson search
    • SUSY
    • Extra dim
    • ...
    • Statistics-limited

[Plot: Tevatron Higgs sensitivity]
Multivariate techniques make this level of sensitivity possible

Searches at the energy frontier
• Searches for new particles, phenomena, couplings
  – Tevatron:
    • Single top quark production
    • Higgs boson search
    • SUSY
    • Extra dim
    • ...
  – LHC:
    • Higgs searches

[Plot: LHC Higgs sensitivity, signal over SM background]
Multivariate techniques will be required to reach this level of sensitivity

Searches at the energy frontier
• Searches for new particles, phenomena, couplings
  – Tevatron:
    • Single top quark production
    • Higgs boson search
    • SUSY
    • Extra dim
    • ...
  – LHC:
    • Higgs searches
    • SUSY
    • Extra dim
    • ...

LHC SUSY signature: Meff = ETmiss + ∑ pT(jets)
[Plot: Meff distribution for SUSY at 1 TeV over the SM background]

Measurements at the energy frontier
• First measurements of properties, couplings
  – With samples of limited size
  – Example: top quark mass
    • ~3 GeV with 1 fb⁻¹
    • Was the goal for 2 fb⁻¹!

[Plot: lepton+jets top mass, matrix element method]

Measurements at the energy frontier
• First measurements of properties, couplings
  – With samples of limited size
  – Example: LHC SUSY particle masses

[Plot: LHC sbottom mass from the M(llbb + ETmiss) distribution, in GeV; 100 signal events in 30 fb⁻¹]

Physics at the energy frontier
• Searches for new particles, phenomena, couplings
• First measurements of properties, couplings
• Multivariate techniques ≈ adding more data
  – Making the most out of small samples of events


How to improve upon event counting and bump hunting


Bayesian limit
• For each analysis, there exists a fully optimized signal-background separation
  – Target function, also called Bayes discriminant or Bayesian limit:

    B(x) = L(S|x) / L(B|x)

• For a single discriminating variable, this ratio of signal and background likelihoods is easy to calculate
  – Monte Carlo procedure:
    • Generate signal and background MC events
    • Fill histograms for signal and background
    • Divide the two histograms
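A one-variable toy version of this Monte Carlo recipe, with Gaussian pseudo-MC standing in for real simulated events:

```python
import numpy as np

rng = np.random.default_rng(0)
sig_mc = rng.normal(1.0, 1.0, 100_000)    # toy signal MC events
bkg_mc = rng.normal(-1.0, 1.0, 100_000)   # toy background MC events

# Fill signal and background histograms with identical binning.
bins = np.linspace(-5.0, 5.0, 51)
h_sig, _ = np.histogram(sig_mc, bins=bins, density=True)
h_bkg, _ = np.histogram(bkg_mc, bins=bins, density=True)

# B(x) = L(S|x) / L(B|x), bin by bin; empty background bins map to infinity.
with np.errstate(divide='ignore', invalid='ignore'):
    discriminant = np.where(h_bkg > 0, h_sig / h_bkg, np.inf)
```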


Bayesian limit
• For each analysis, there exists a fully optimized signal-background separation
  – Target function, also called Bayes discriminant or Bayesian limit:

    B(x) = L(S|x) / L(B|x)

• For a single discriminating variable, this ratio of signal and background likelihoods is easy to calculate
• With more than one variable, this is no longer possible
  – Not enough MC statistics to compute a many-dimensional likelihood
  – The curse of dimensionality


Optimized event analysis

Optimized =
  Optimize signal-background separation
  Exploit full event information
    Event kinematics, angular correlations, ...
  Take all correlations into account

• Requires detailed understanding of signal and background
  – Only applicable to searches for a specific signal or measurements of a specific process

Goal: reach the Bayesian limit


Optimized event analysis

Optimized =
  Optimize signal-background separation
  Exploit full event information
    Event kinematics, angular correlations, ...
  Take all correlations into account

• Requires detailed understanding of signal and background
  – Only applicable to searches for a specific signal or measurements of a specific process
• Limited by background and signal modeling
  – MC statistics, MC model, background composition, shape, ...
  – If the signal model is wrong: the search is not sensitive
  – If the background model is wrong: find something that isn't there

Goal: reach the Bayesian limit


Event analysis techniques

[Diagram: cut-based, likelihood, decision trees, neural networks; matrix elements; boosted decision trees and random forests; Bayesian neural networks]

Many others: kernel methods, support vector machines, ...




Cut-based analysis

Cut flow (each event must pass every cut):
  Lepton pT > 41 GeV → Event energy < 65 GeV → M(top) < 352 GeV → Final event set

• Estimate background yield
• Compare to data:

    Nobs = Ndata − NB

• Calculate signal acceptance in the final event set:

    σ = Nobs / (A · L)
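Numerically the last two steps are a one-liner; every number below is made up for illustration:

```python
# All numbers are illustrative, not from a real analysis.
n_data = 240          # events in data passing all cuts
n_bkg = 180.0         # estimated background yield NB
acceptance = 0.012    # signal acceptance A (from MC)
lumi = 1000.0         # integrated luminosity L, in pb^-1

n_obs = n_data - n_bkg                  # Nobs = Ndata - NB
sigma = n_obs / (acceptance * lumi)     # sigma = Nobs / (A * L), in pb
print(f"sigma = {sigma:.1f} pb")
```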


Including events that fail a cut

• Create a tree of cuts
• Divide the sample into "pass" and "fail" sets
• Each node corresponds to a cut (branch)

[Diagram: cut tree with nodes pT(l) > 41, pT(l) > 65, M < 352, each splitting into pass (P) and fail (F) branches]


Trees and leaves

• Create a tree of cuts
• Divide the sample into "pass" and "fail" sets
• Each node corresponds to a cut (branch)
• A leaf corresponds to an end-point
• For each leaf, calculate the purity (from MC):

    purity = NS / (NS + NB)

[Diagram: same cut tree (pT(l) > 41, pT(l) > 65, M < 352), with the leaves marked]


Decision tree

• Create a tree of cuts
• Divide the sample into "pass" and "fail" sets
• Each node corresponds to a cut (branch)
• A leaf corresponds to an end-point
• For each leaf, calculate the purity (from MC):

    purity = NS / (NS + NB)

• Train the tree by optimizing the Gini improvement:

    Gini = 2 NS NB / (NS + NB)

• Each leaf will be either background- or signal-enhanced
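A sketch of the training criterion: scan candidate cut values on one variable and keep the cut with the largest Gini improvement (toy arrays, not the DØ trees):

```python
import numpy as np

def gini(n_s, n_b):
    """Gini figure of merit from the slide: 2*NS*NB/(NS+NB)."""
    n = n_s + n_b
    return 2.0 * n_s * n_b / n if n > 0 else 0.0

def gini_improvement(sig_vals, bkg_vals, cut):
    """Parent Gini minus the summed Gini of the pass and fail samples."""
    ns_p = int(np.sum(sig_vals > cut)); ns_f = len(sig_vals) - ns_p
    nb_p = int(np.sum(bkg_vals > cut)); nb_f = len(bkg_vals) - nb_p
    return (gini(len(sig_vals), len(bkg_vals))
            - gini(ns_p, nb_p) - gini(ns_f, nb_f))

rng = np.random.default_rng(2)
sig_vals = rng.normal(50.0, 15.0, 5000)   # toy lepton pT, signal MC
bkg_vals = rng.normal(30.0, 15.0, 5000)   # toy lepton pT, background MC
cuts = np.linspace(10.0, 70.0, 61)
best = max(cuts, key=lambda c: gini_improvement(sig_vals, bkg_vals, c))
print(f"best cut: pT > {best:.0f}")
```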


Decision tree output
• Train on signal and background models (MC)
  – Stop and create a leaf when NMC < 100
• Compute the purity value for each leaf
• Send data events through the tree
  – Assign the purity value of the corresponding leaf to the event
• The result approximates a probability density distribution
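A minimal stand-in using scikit-learn (not the tool used in these analyses): min_samples_leaf=100 approximates the "stop when NMC < 100" rule, and for a single tree predict_proba returns the class fraction in the leaf, i.e. the purity assigned to each event:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X_sig = rng.normal(1.0, 1.0, size=(5000, 3))    # toy signal MC, 3 variables
X_bkg = rng.normal(0.0, 1.0, size=(5000, 3))    # toy background MC
X_data = rng.normal(0.5, 1.0, size=(1000, 3))   # stand-in for data events

X_mc = np.vstack([X_sig, X_bkg])
y_mc = np.r_[np.ones(len(X_sig)), np.zeros(len(X_bkg))]

# min_samples_leaf=100 plays the role of the "stop when NMC < 100" rule.
tree = DecisionTreeClassifier(min_samples_leaf=100).fit(X_mc, y_mc)

# For a single tree, predict_proba is the leaf's class fraction,
# i.e. the purity NS/(NS+NB), assigned to each data event.
purity = tree.predict_proba(X_data)[:, 1]
```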


Boosting


Boosting
• A general method to improve the performance of any weak classifier
  – Decision trees, neural networks, ...
• Linear combination of many filter functions:

    F(x) = ∑k ak fk(x)

  – ak: coefficient, typically the result of minimizing an error function


Boosting procedure
1. Start from the initial training sample Tk=1
2. Train filter function fk
3. Find coefficient ak by minimizing the error function
4. Modify the training sample Tk and repeat

Final classifier: F = ∑k ak fk


Adaptive boosting
• In each iteration, update the coefficient ak
  – From minimizing the error function
  – Coefficients decrease at each iteration
• Update the weight of each event in training sample Tk
  – Figure out which events have been misclassified
    • Signal events should have purity ≥ 0.5
    • Background events should have purity < 0.5
  – Increase the event weight for those events that have been misclassified
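A compact AdaBoost-style loop showing both the coefficient ak and the weight update; this is a generic sketch with toy data, not the DØ implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(4000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # toy labels

w = np.full(len(y), 1.0 / len(y))                # per-event training weights
trees, alphas = [], []
for k in range(20):
    f_k = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=w)
    miss = f_k.predict(X) != y
    err = np.clip(w[miss].sum() / w.sum(), 1e-6, 1 - 1e-6)
    a_k = 0.5 * np.log((1.0 - err) / err)        # coefficient; shrinks as err -> 0.5
    w[miss] *= np.exp(2.0 * a_k)                 # boost the misclassified events
    w /= w.sum()
    trees.append(f_k); alphas.append(a_k)

# F(x) = sum_k a_k f_k(x), with each tree's output mapped to +-1
F = sum(a * (2 * f.predict(X) - 1) for f, a in zip(trees, alphas))
```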


Boosting performance

[Plot: boosting performance in the DØ single top search with decision trees, for 3 different sets of discriminating variables]


Random forest
• Average over many decision trees
  – Typically O(100)
• Each tree is grown using m variables
  – For N total variables, m << N
• Very fast algorithm
  – Even with a large number of variables
• Very few parameters to adjust
  – Typically only m
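A random-forest sketch with scikit-learn on hypothetical toy data; note that scikit-learn draws the m variables per split (max_features) rather than once per tree:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(10_000, 25))               # N = 25 toy variables
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)  # toy labels

# O(100) trees; each split considers only m = 5 of the N = 25 variables.
forest = RandomForestClassifier(n_estimators=100, max_features=5).fit(X, y)
scores = forest.predict_proba(X)[:, 1]          # average over the 100 trees
```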




Neural networks

• Input nodes: one for each variable xi
• Hidden nodes: each is a sigmoid dependent on the input variables:

    nk(x, wk) = 1 / (1 + e^(−∑i wik xi))

• Output node: linear combination of the hidden nodes:

    f(x) = ∑k w'k nk(x, wk)

• Output runs from 0 (background-like) to 1 (signal-like)
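The two formulas transcribe directly into numpy; the weights below are random stand-ins, where a real network would obtain them from training:

```python
import numpy as np

rng = np.random.default_rng(6)
n_inputs, n_hidden = 4, 8
w = rng.normal(size=(n_hidden, n_inputs))   # hidden-node weights w_ik
w_out = rng.normal(size=n_hidden)           # output weights w'_k

def network(x):
    hidden = 1.0 / (1.0 + np.exp(-w @ x))   # n_k(x, w_k): sigmoid hidden nodes
    return w_out @ hidden                   # f(x) = sum_k w'_k n_k(x, w_k)

print(network(rng.normal(size=n_inputs)))
```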


Neural Network Training
• Find the optimum NN parameters on training signal/background events
• Apply the NN to an independent set of signal and background events
  – The testing sample
• Stop training when the error on the testing sample starts increasing
  – Overfitting sets in beyond that point

[Plot: training and testing error curves, DØ single top search]
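A runnable stand-in using scikit-learn's MLP (again, not the tool from the talk): its early_stopping option holds out a fraction of the training events as a testing ("validation") sample and stops once that score stops improving, which is essentially the procedure above:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)     # toy signal/background labels

# validation_fraction plays the role of the testing sample; training stops
# when its score has not improved for n_iter_no_change epochs.
nn = MLPClassifier(hidden_layer_sizes=(8,), early_stopping=True,
                   validation_fraction=0.2, n_iter_no_change=10,
                   max_iter=500).fit(X, y)
```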


Bayesian neural networks
• Bayesian idea: rather than finding one value for each weight, determine the posterior probability for each weight
• Form many networks by sampling from the posterior
  – Typical case: ~100 individual neural networks
  – Each network gets a weight based on training performance
• Avoids overfitting
• But: very slow, due to the integration required to determine the posterior
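A loud caveat: true Bayesian neural networks sample weights from the posterior (e.g. by Markov chain Monte Carlo, as in Neal's implementation used by MiniBooNE). The sketch below is only a crude ensemble stand-in that illustrates the final averaging step, weighting ~10 bootstrap-trained networks by their performance:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + X[:, 2] > 0).astype(int)      # toy labels

nets, weights = [], []
for k in range(10):                          # ~10 networks instead of ~100
    idx = rng.integers(0, len(y), len(y))    # bootstrap resample
    net = MLPClassifier(hidden_layer_sizes=(6,), max_iter=300,
                        random_state=k).fit(X[idx], y[idx])
    nets.append(net)
    weights.append(net.score(X, y))          # performance-based network weight

weights = np.array(weights) / np.sum(weights)
output = sum(wt * n.predict_proba(X)[:, 1] for wt, n in zip(weights, nets))
```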


Comparing multivariate methods

How optimal can an optimal event analysis be?


Classifier comparison, DØ single top

[Plot: background efficiency (%, 0 to 5.5) vs signal efficiency (%, 0 to 40) for the DØ single top Wbb sample: R LDA, SPR Boost w/bag, D0 DT, SPR LDA, SPR Boost Dec Splits, D0 NN (p17), SPR RF. Annotations mark the neural network, random forest, and likelihood analysis curves. 25 variables, 10000 events. Classifiers are evaluated at fixed points; lines connect the points for better visibility.]


BaBar muon ID

[Plot: BaBar muon ID classifier comparison; 25 variables, 10000 events; the interesting region of the efficiency plane is marked]


MiniBooNE

[Plot: MiniBooNE comparison: background efficiency (0 to 0.04) vs signal efficiency for Random Forest, Roe (boosted DT), and Neal (Bayesian NN). Here 54 variables, 130000 events; training was also done with 300 variables. The BDT uses 1000 trees.]


GLAST

[Plot: GLAST background rejection: background efficiency (log scale, 10⁻⁴ to 1) vs signal efficiency for R Random Forest, R LDA, R LR, SPR BDT, SPR LDA, and SPR RF; 35 variables. Annotations mark the boosted decision tree, the random forests, and the likelihood methods.]


Summary

[Schematic: background efficiency vs signal efficiency, both from 0 to 1. The diagonal is a random guess; cut-based or likelihood analyses improve on it; neural networks, simple decision trees, etc. do better still; boosted decision trees, Bayesian neural networks, and random forests come closest to optimal.]

Conclusions
• Multivariate event analysis techniques are now a common tool in HEP
  – In the past mostly neural networks, now also decision-tree-related methods
    • GLAST, MiniBooNE, ATLAS, DØ
• Modern classification tools make life easy
  – Very few parameters to adjust
  – Can use many variables
    • A ranking of variables is automatically provided
  – Implemented in several software packages


Advanced event analysis enables discoveries

Resources
• PhyStat code repository: https://plone4.fnal.gov:4430/P0/phystat/
• PhyStat 2007 conference: http://phystat-lhc.web.cern.ch/phystat-lhc/
• Jim Linnemann's collection of statistics links: http://www.pa.msu.edu/people/linnemann/stat_resources.html
• Statistical analysis tool R: http://www.r-project.org/
• TMVA (multivariate analysis tools in ROOT): http://tmva.sourceforge.net/
• Neural networks in hardware: http://neuralnets.web.cern.ch/NeuralNets/nnwInHep.html
• Boosted decision trees in MiniBooNE: http://arxiv.org/abs/physics/0508045
• Decision tree introduction: http://www.statsoft.com/textbook/stcart.html
• GLAST decision trees: http://scipp.ucsc.edu/~atwood/Talks%20Given/CPAforGLAST.ppt
