Modern Machine Learning and the Large Hadron Collider
Matthew Schwartz, Harvard University
KIAS QQUC School on "AI in High Energy Physics", July 24, 2020


Page 1

Modern Machine Learning and the Large Hadron Collider

Matthew Schwartz, Harvard University

KIAS QQUC School on "AI in High Energy Physics" July 24, 2020

Page 2

Outline
• Introduction to jet physics
• Traditional approach vs ML approach
• ML applications in collider physics
  – Quarks vs. gluons
  – Top tagging
  – Pileup removal
  – Unsupervised learning
  – Unfolding
  – Weak supervision
  – Decorrelation

Page 3

INTRODUCTION TO JETS

Page 4

Higgs discovered 2012

Photons: not as clear a bump; small signal/background; 0.2% of Higgs decays

4 leptons = golden channel: clear bump; large signal/background; 0.01% of Higgs decays

WW: no bump; need backgrounds; 0.8% of Higgs decays

$h \to e^-e^+\mu^-\mu^+$, $h \to e^-\mu^+\nu\nu$, $h \to \gamma\gamma$: 0.01%, 0.1%, 1%

Page 7

Higgs decays to (Higgs boson decay modes):
quarks 60%, W bosons 21%, gluons 9%, tauons 6%, Z bosons 3%, photons (0.2%)

Page 8

Higgs decays to (Higgs boson decay modes):
jets (from quarks) 60%, W bosons 21%, gluons 9%, tauons 6%, Z bosons 3%, photons (0.2%)

Page 9

Higgs decays to (Higgs boson decay modes):
jets 60% + jets 9% (gluons also give jets), W bosons 21%, tauons 6%, Z bosons 3%, photons (0.2%)

Page 10

Higgs decays to (Higgs boson decay modes):
jets 69%, W bosons 21% (→ e,μ 0.8% + jets 20.2%), tauons 6%, Z bosons 3%, photons (0.2%)

Page 11

Higgs decays to (Higgs boson decay modes):
jets 89.2%, Z bosons 3% (→ e,μ 0.01% + jets 2.99%), tauons 6%, photons (0.2%)

Page 12

Higgs decays to (Higgs boson decay modes):
jets 92.2%, tauons 6% (→ e,μ 1% + jets 5%), photons (0.2%)

Page 13

Higgs decays to (Higgs boson decay modes):
jets 98%, e,μ (1.8%), photons (0.2%)

• The Higgs discovery involved just these special decays

Page 14

Jets come from chromo-electromagnetic radiation

Page 15

Not all jets are created equal. The 98% of Higgs decays with jets break down as:
b-jets 60%, light quark jets 23%, gluon jets 9%, tau jets 6%
(plus e,μ 1.8% and photons 0.2%)

How can we tell all these different jet types apart?

Background is:
• 80 billion gluon jets
• 10 billion light quark jets

Page 16

MODERN MACHINE LEARNING

Page 17

Modern Machine Learning for Particle Physics

Traditional approach: physical insight → observables → combine best observables (boosted decision trees, neural networks) → apply to data → validate physics

Modern machine learning: simulations → innovative algorithm → validate algorithm → apply to data

Page 18

The game is about hammers and nails:

Hammers (ML methods):
• Convolutional Neural Networks
• Recurrent Neural Networks
• Variational Auto-encoders
• Latent Dirichlet Allocation
• Reinforcement learning
• Point cloud networks
• Cluster networks

Nails (collider physics tasks):
• Top tagging
• W tagging
• Quark/gluon discrimination
• Pileup removal
• b/c/s-tagging
• Jet-energy scale calibration
• Missing energy measurement
• Jets in heavy ion collisions

Page 19

Physics domain is distributions. Is this event:
• two quark jets?
• two gluon jets?
• two Higgs bosons?

• Any individual event has no "truth" identity
• All that exists are the probability distributions for different truths
• Data is a combination

~1000-dimensional phase space (from $10^8$-dimensional measurements!)

$dP_q(x) = \frac{d^n\sigma_q}{dp_1 \cdots dp_n}\,, \qquad dP_g(x) = \frac{d^n\sigma_g}{dp_1 \cdots dp_n}$

$dP_{\rm data}(x) = \alpha_q\, dP_q + \alpha_g\, dP_g + \cdots$

The goal is to test/measure these distributions and fractions.
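As a toy illustration of this mixture structure, the fraction $\alpha_q$ can be extracted by fitting the data distribution to a combination of quark and gluon templates for a single observable. Everything below (the Gaussian templates, the binning, the 30/70 mixture) is invented for illustration, not taken from the talk:

```python
import numpy as np

# Toy templates for a 1D observable (e.g. charged-particle count),
# standing in for simulated dP_q and dP_g distributions.
bins = np.arange(0, 30)
quark_template = np.exp(-0.5 * ((bins - 8) / 3.0) ** 2)
gluon_template = np.exp(-0.5 * ((bins - 14) / 4.0) ** 2)
quark_template /= quark_template.sum()
gluon_template /= gluon_template.sum()

# "Data" built as a 30% quark / 70% gluon mixture.
alpha_true = 0.3
data = alpha_true * quark_template + (1 - alpha_true) * gluon_template

# Fit the quark fraction by a chi^2 scan, with alpha_q + alpha_g = 1
# folded in so there is a single free parameter.
def mixture(alpha):
    return alpha * quark_template + (1 - alpha) * gluon_template

alphas = np.linspace(0, 1, 1001)
chi2 = [np.sum((mixture(a) - data) ** 2) for a in alphas]
alpha_fit = alphas[int(np.argmin(chi2))]
```

With noiseless toy data the scan recovers the input fraction exactly; with real data the templates would come from simulation and the fit would use Poisson uncertainties.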

Page 20

QUARKS VS GLUONS

Page 21

Quark jets vs gluon jets: theory

Probability of quark radiating: $dP_q \sim 3 \times \frac{\alpha_s}{4\pi}(\cdots)$ (3 colors of quark; color charge = 3)
Probability of gluon radiating: $dP_g \sim 8 \times \frac{\alpha_s}{4\pi}(\cdots)$ (8 colors of gluon; color charge = 8)

• Gluons are around twice as likely to radiate as quarks
• Gluon jets are fatter, on average
• Gluon jets are more massive, on average
• Gluon jets have more particles, on average
• …

[Figure: charged-particle-count distributions for quark (Q) and gluon (G) jets at 200 GeV; the gluon distribution sits at higher multiplicity]

Page 22

Traditional approach: consider lots of "motivated" variables:
• Jet width
• # of particles
• # of subjets
• Jet "shape"
• Jet mass
• …

[Figure 23: Significance improvement curves, $\epsilon_S/\sqrt{\epsilon_B}$ as a function of $\epsilon_S$ (quark jet acceptance), for $p_T = 100$ GeV jets for selected variables: charged multiplicity, girth, subjet multiplicity, mass/$p_T$, |pull|, planar flow, jet shape $\psi(0.1)$, subjet $k_T$ variables, and combinations; labeled curves include particle count, jet width, and the best group of 5 combined with a BDT]

Gallicchio, MDS (arXiv:1211.7038)
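The significance improvement $\epsilon_S/\sqrt{\epsilon_B}$ plotted in such curves can be computed directly from classifier scores by scanning cut thresholds. The Gaussian toy scores here are hypothetical stand-ins for a real discriminant:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical discriminant scores: signal (quark) scores sit higher
# than background (gluon) scores.
sig = rng.normal(1.0, 1.0, 20000)
bkg = rng.normal(-1.0, 1.0, 20000)

# Scan cut thresholds to trace out (eps_S, eps_B), then form the
# significance improvement SIC = eps_S / sqrt(eps_B).
thresholds = np.linspace(-4, 4, 201)
eps_s = np.array([(sig > t).mean() for t in thresholds])
eps_b = np.array([(bkg > t).mean() for t in thresholds])
mask = eps_b > 0
sic = eps_s[mask] / np.sqrt(eps_b[mask])
```

A useless (flat) discriminant gives SIC = 1 everywhere; a discriminating variable gives a curve with a nontrivial maximum above 1, which serves as a single-number performance measure.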

Page 23

Convolutional neural networks for quark/gluon jet discrimination

Preprocess:
• Center
• Crop
• Normalize
• Zero
• Standardize

NN inputs: three input layers (a color image)
• Red = energy of charged particles
• Green = energy of neutral particles
• Blue = number of charged particles

Komiske, Metodiev, MDS (arXiv:1612.01551)
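A minimal sketch of the centering and normalization steps, assuming a single-channel toy image (real jet images use the three color channels above, plus cropping, zero-centering, and standardizing over the training set):

```python
import numpy as np

def preprocess(image):
    """Center the image on its energy-weighted centroid (to whole-pixel
    precision), then normalize so the pixel intensities sum to one."""
    ny, nx = image.shape
    total = image.sum()
    # energy-weighted centroid
    ys, xs = np.indices(image.shape)
    cy = int(round((ys * image).sum() / total))
    cx = int(round((xs * image).sum() / total))
    # shift the centroid to the middle pixel
    centered = np.roll(image, (ny // 2 - cy, nx // 2 - cx), axis=(0, 1))
    return centered / total

# Toy 33x33 image with two hypothetical energy deposits.
img = np.zeros((33, 33))
img[5, 7] = 3.0
img[6, 8] = 1.0
out = preprocess(img)
```

After preprocessing, the hardest deposit sits at the central pixel and the image is unit-normalized, so the network sees only the jet's internal structure, not its absolute position or energy.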

Page 24

Figure 5: (top) ROC and (bottom) SIC curves of the FLD and the deep convolutional network trained on (left) 200 GeV and (right) 1000 GeV Pythia jet images with and without color, compared to baseline jet observables and a BDT of the five jet observables.

…in signal over background discrimination power in a collider physics application, and also exhibits a nontrivial maximum (at some $\epsilon_q$) which gives an unbiased measure of the relative performance of different discriminants [6].

The ROC and SIC curves of the jet variables and the deep convolutional network on 200 GeV and 1000 GeV Pythia jets are shown in Figure 5. The performance at 50% quark jet classification efficiency for each of the jet variables and the CNN is listed in Table 1. To combine the jet variables into more sophisticated discriminants, a boosted decision tree (BDT) is implemented with scikit-learn. The convolutional network outperforms the traditional variables and matches or exceeds the performance of the BDT of all of the jet variables. The performance of the networks trained on images with and without color is shown in Figure 6.

Quark/Gluon CNN results

[Figure: SIC curves, $\epsilon_Q/\sqrt{\epsilon_G}$; the deep color NN and deep grayscale NN sit above the BDT of the top 5 variables, which sits above the single variables]

Komiske, Metodiev, MDS (arXiv:1612.01551)
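The BDT combination of jet observables mentioned above was implemented with scikit-learn in the cited paper; a sketch of that step is below. The toy multiplicity and width distributions are invented stand-ins for the five real observables:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 5000
# Toy stand-ins for two of the jet observables: gluon jets have higher
# multiplicity and larger width on average.
quark = np.column_stack([rng.poisson(10, n), rng.normal(0.05, 0.02, n)])
gluon = np.column_stack([rng.poisson(18, n), rng.normal(0.09, 0.03, n)])
X = np.vstack([quark, gluon])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = quark, 1 = gluon

# Boosted decision tree combining the observables into one discriminant.
bdt = GradientBoostingClassifier(n_estimators=100, max_depth=3)
bdt.fit(X, y)
auc = roc_auc_score(y, bdt.predict_proba(X)[:, 1])
```

In practice one would evaluate the AUC on a held-out test set and feed the BDT output into the same ROC/SIC machinery as any single variable.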

Page 25

TOP TAGGING

Page 26

Top tagging

Hypothetical new heavy particles often decay to top quarks:

[Diagram: $t\bar{t}$ production, with top → W b and anti-top → W b, each W → u d; looks like 6 jets]

Page 27

Tops are often boosted

For heavy KK gluons (> 1000 GeV) the tops are ultrarelativistic (boosted): a KK gluon of m = 1 TeV gives a top and an anti-top with E = 500 GeV each. Now it looks like 2 jets! (Backgrounds: W + dijets, SM $t\bar{t}$.)

Page 28

[Event displays: typical background jets vs. typical top jets at large boost ($p_T$ = 1500 GeV); top jets show subjets within a jet]

Page 29

Top tagging (Hopkins top-tagger, Kaplan et al., arXiv:0806.0848):
1. Look for big jets (R = 1.2)
2. with subjets within the jet
3. Analyze the subjets: W mass peak, top mass peak, and helicity angle

$\epsilon_s = 0.4$, $\epsilon_b = 0.006$; requiring two tagged tops pushes the background down by a factor of 20,000

Page 30

Many ML hammers hit the top-tagging nail

[Figure (arXiv:1902.09914): ROC comparison of many ML taggers against the Hopkins top-tagger]

• Factor of 10 better background rejection than traditional taggers

Page 31

Apples-to-apples top-tagging comparison (arXiv:1902.09914)

• Uses the same samples: 800k training, 200k test
• Tagger classes: image based, 4-vector based, theory inspired, particle cloud
• So good, limited by sample size
• GoaT: 200k/1368 = 146 background events survive

Page 32

ParticleNet uses a point cloud approach (Qu and Gouskos, arXiv:1902.08570)

• Respects permutation symmetry (other approaches include energy flow polynomials, arXiv:1712.07124)
• Uses EdgeConv
• Angular distance metric
• k-nearest neighbors

Page 33

CMS top tagging: the DeepAK8 ML algorithm

[Figure (CMS Simulation Preliminary, 13 TeV): background efficiency vs. signal efficiency for top quark vs. QCD multijet, with $1000 < p_T^{\rm truth} < 1500$ GeV, $|\eta^{\rm truth}| < 2.4$, and mass windows $105 < m_{\rm SD}^{\rm AK8} < 210$ GeV, $110 < m_{\rm SD}^{\rm CA15} < 210$ GeV, $140 < m^{\rm HOTVR} < 220$ GeV; algorithms compared: DeepAK8, DeepAK8-MD, ImageTop, ImageTop-MD, $\tau_{32} + m_{\rm SD} + b$, $\tau_{32} + m_{\rm SD}$, BEST, HOTVR, $N_3$-BDT (CA15)]

[Second figure: DeepAK8 input variations. Particle (Kinematics): 4-momenta only; Particle (w/o Flavour): + charge, particle id; Particle (Full) + SV: + track info + heavy flavor tags]

• Inputs particles/tracks to a ResNet architecture

Page 34

ATLAS top tagging: topoDNN

• Inputs raw topoclusters (rather than high-level inputs) to a deep neural network
• Also finds better performance with modern machine learning

Page 35

PILEUP REMOVAL

Page 36

Pileup

• The LHC collides protons in bunches: $10^{11}$ protons/bunch, up to 200 collisions per bunch crossing
• The tracking system can resolve the primary collision from secondary "pileup" collisions
• Only charged particles can be seen this way

[Event display: a $Z^0 \to \mu\mu$ candidate event with NPV = 25 from 2012 (S. Menke, MPP München, "Pile-Up in Jets in ATLAS", BOOST, 12-16 Aug 2013, Flagstaff, AZ)]

[Figure (M. Low, UChicago, "Jet Cleansing", Aug 12, 2013): correlation between truth and corrected dijet mass with 140 pileup events; uncorrected: 40.0% correlated; area subtraction: 78.8% correlated; subtraction alone does not perform as well]

• Can we use machine learning to remove the pileup radiation?

Page 37

Pileup removal as a regression problem

Can measure:
1. Leading vertex charged particles
2. Pileup charged particles
3. Total neutral particles

Unknown:

Leading vertex neutral particles?
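One simple non-ML baseline for this regression, assumed here purely for illustration (it is not the trained network), estimates the neutral leading-vertex energy by scaling the total neutral energy in each pixel by the locally measured charged leading-vertex fraction:

```python
import numpy as np

def charged_fraction_baseline(charged_lv, charged_pu, neutral_total, eps=1e-9):
    """Estimate the neutral leading-vertex image by scaling the total
    neutral energy in each pixel by the local charged LV fraction."""
    frac = charged_lv / (charged_lv + charged_pu + eps)
    return neutral_total * frac

# Toy 2x2 "images" of transverse energy (hypothetical values).
charged_lv = np.array([[4.0, 0.0], [1.0, 2.0]])
charged_pu = np.array([[0.0, 3.0], [1.0, 2.0]])
neutral_total = np.array([[2.0, 2.0], [2.0, 4.0]])
est = charged_fraction_baseline(charged_lv, charged_pu, neutral_total)
```

A learned regression can beat this baseline because the charged fraction fluctuates pixel by pixel, while a network can exploit correlations across the whole image.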

Page 38

Convnets for Pileup Removal

• Separate observable energy deposits into 3 images
• Input to a CNN and train

Komiske, Metodiev, Nachman, MDS (arXiv:1707.08600)

Page 39

PileUp Mitigation with Machine Learning (PUMML)

Figure 3: Depictions of three randomly chosen leading jets. Shown from left to right are the neutral leading vertex particles, with pileup added, with PUMML applied, with PUPPI applied, and with SoftKiller applied. From examining these events, it appears that PUMML has learned an effective pileup mitigation strategy.

• Energy Correlation Functions, ECF$_N^{(\beta)}$ [41]: Specifically, we consider the logarithm of the two- and three-point ECFs with $\beta$ = 4.

Fig. 4 illustrates the distributions of several of these jet observables after applying the different pileup subtraction methods. While these plots are standard, they do not give a per-event indication of performance. A more useful comparison is to show the distributions of the per-event percent error in reconstructing the true values of the observables, which are shown in Fig. 5. To numerically explore the event-by-event effectiveness, we can look at the Pearson linear correlation coefficient between the true and corrected values or the interquartile range (IQR) of the percent errors. Table 1 summarizes the event-by-event correlation coefficients of the distributions shown in Fig. 4. Table 2 summarizes the IQRs of the distributions shown in Fig. 5. PUMML outperforms the other pileup mitigation techniques on both of these metrics, with improvements for jet substructure observables such as the jet mass and the energy correlation functions.

It is important to verify that PUMML learns a pileup mitigation function which is not overly sensitive to the NPU distribution of its training sample. Robustness to the NPU on which it is trained would indicate that PUMML is learning a universal subtraction strategy.

With 140 pileup events:
• Excellent leading vertex (truth) reconstruction
• Excellent observable reconstruction
• Excellent stability for variable pileup number, compared to traditional approaches

Komiske, Metodiev, Nachman, MDS (arXiv:1707.08600)

Page 40

ATLAS uses conv nets to measure MET (in simulation)

Conv net image regression

ATL-PHYS-PUB-2019-028

Page 41

PUPPIML: Graphnet approach

[Figure: performance of PUPPIML vs. PUMML]

• Good stability
• Comparable to PUMML

Martinez et al. (arXiv:1810.07988)

Page 42

UNSUPERVISED LEARNING

Page 43

JUNIPR

[Clustering tree with nodes labeled by energy: 500 GeV, 20 GeV, 1 GeV]

• Unsupervised approach: learn the probability distribution $dP_q(x) = \frac{d^n\sigma_q}{dp_1 \cdots dp_n}$ for each sample
• Represent data as a clustering tree
• Can be used to classify or generate
• Probability written as a product; each term interpretable

Andreassen, Feige, Frye, MDS (arXiv:1804.09720, 1906.10137)
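The product structure can be sketched as follows: the jet log-likelihood is a sum of per-node log-probabilities, and the log-likelihood ratio between two trained models acts as a classifier. The per-node probabilities below are made-up numbers, not outputs of a real JUNIPR model:

```python
import numpy as np

def jet_log_likelihood(node_probs):
    """JUNIPR-style factorization: the probability of a jet is a product
    of branching probabilities, one per node of its clustering tree, so
    the log-likelihood is a sum of interpretable per-node terms."""
    return float(np.sum(np.log(np.asarray(node_probs, dtype=float))))

# Hypothetical per-node probabilities assigned by two trained models
# (e.g. one trained on top jets, one on QCD jets) to the same jet.
probs_model_a = [0.5, 0.25, 0.8]   # product = 0.1
probs_model_b = [0.4, 0.10, 0.5]   # product = 0.02

# Classification: a positive log-likelihood ratio means model A
# explains this jet better.
log_ratio = jet_log_likelihood(probs_model_a) - jet_log_likelihood(probs_model_b)
```

Because each term is tied to one node of the tree, one can inspect which branchings drive the discrimination, which is what the node-by-node visualizations on the next slides exploit.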

Page 44

JUNIPR: visualizing discrimination power. Boosted top or QCD jet?
• Nodes labeled with the probability that the jet is a top, given only the info at that node

Page 45

JUNIPR: what is different?

Quark vs. gluon; Pythia vs. Herwig:
• Length of shower important
• Angular distribution early on
• Energy sharing different early on

Page 46

DCTR: use relative weights for tuning

• Includes simulation parameters $\theta$ in the truth data
• Learns the relative weight $P(x, \theta)/P(x, \theta_0)$
• Reweights the $\alpha_s = 0.1365$ distribution back to $\alpha_s = 0.1600$; the $\chi^2$ is minimal at $\alpha_s = 0.1600$
• Could be a very efficient way to tune simulations, or to reweight simulations to data

Andreassen and Nachman (arXiv:1907.08209)
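A sketch of the reweighting idea, using the analytically optimal classifier for two Gaussian "simulations" in place of a trained network; the shift in the mean stands in for a parameter like $\alpha_s$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two simulations differing by one parameter (here, the mean).
# The optimal classifier output is p(x) = P1(x) / (P0(x) + P1(x)),
# and the odds ratio w(x) = p / (1 - p) = P1(x) / P0(x) is the
# per-event weight that maps sample 0 onto distribution 1.
def gauss(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

mu0, mu1 = 0.0, 0.5
x0 = rng.normal(mu0, 1.0, 200000)                        # events from simulation P0
p = gauss(x0, mu1) / (gauss(x0, mu0) + gauss(x0, mu1))   # ideal classifier output
w = p / (1 - p)                                          # likelihood-ratio weights

# The weighted sample-0 events now reproduce distribution 1.
reweighted_mean = np.average(x0, weights=w)
```

In DCTR the classifier is a neural network trained on the full event, so the same odds-ratio trick reweights high-dimensional distributions where analytic densities are unavailable.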

Page 47

OmniFold: unfold events

• Uses ML to learn the mapping from generator to detector
• Can then unfold any distribution to truth level
• Previous unfolding techniques are observable-by-observable
• Examples given used Herwig as "truth" + Delphes → "data", and Pythia as "sim" + Delphes → "gen"
• Truth and OmniFold agree
• Trying it out on actual data is work in progress

Andreassen et al. (arXiv:1911.09107)

Page 48

Weak supervisionSupervised learning: pure samples of quark and gluons usedWeak learning: mixed samples of quarks or gluons usedUnsupervised learning: no labels at all, just find patterns

results that mention quark/gluon tagging, but there many more analyses that would benefit from atagger if a robust technique existed.

The weakly supervised classification strategy is particularly useful for quark/gluon tagging becausethe fraction of quark jets for a particular set of events is well-known from parton distribution functionsand matrix element calculations while useful discriminating features have not been computed to highaccuracy and simulations often mis-model the data. To illustrate this concrete example, quark andgluon jets are simulated and a weakly supervised classifier is trained on the generated event sample.Unlike real data, in the simulated sample, we also know per-event labels which are used to additionallytrain a fully supervised classifier. Events with 2 ! 2 quark-gluon scattering (dijet events) are simulatedusing the Pythia 8.18 [16] event generator. Jets are clustered using the anti-kt algorithm [17] withdistance parameter R = 0.4 via the FastJet 3.1.3 [18] package. Jets are classified as quark- or gluon-initiated by considering the type of the highest energy quark or gluon in the full generator eventrecord that is inside a 0.3 radius of the jet axis. For simplicity, one transverse momentum range isconsidered: 45 GeV < pT < 55 GeV. Additionally, there is a pseudo-rapidity requirement that mimicsthe usual detector acceptance for charged particle tracking: |⌘| < 2.1. Heuristically, gluons have twiceas much strong-force charge as quark jets, resulting in more constituents and a broader radiationpattern. Therefore, the following variables are useful for quark/gluon discrimination: the number ofjet constituents n, the first radial moment in pT (jet width) w, and the fraction of the jet pT carriedby the leading anti-kT R = 0.1 subjet f0. The constituents i considered for computing n and w arethe hadrons in the jet with pT > 500 MeV.

[Figure 3, panels (a) and (b): ROC curves]

Figure 3: Comparison of ROC curves for quark/gluon jet discrimination using a fully supervised classifier or a weakly supervised classifier. In (a) the fully and weakly supervised classifiers are trained on identical simulated data and evaluated on a test sample drawn from the same population. The weakly supervised classifier matches the performance of the fully supervised one. The curves corresponding to the three input observables used as discriminant are shown as reference. In (b), the fully supervised classifier (blue line) is trained on a labeled simulated training sample. The weakly supervised classifier (red line) is trained on an unlabeled pseudo-data training sample. In both cases, the performance is evaluated on the same pseudo-data test sample. The ratios to the performance of a fully supervised classifier trained on a labeled pseudo-data sample are shown in the bottom pad.

A weakly supervised classifier with one hidden layer of size 30 is trained by considering 12 bins of the distribution of the absolute difference in pseudorapidity between the two jets [19].
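The idea of training on mixed samples with known fractions can be sketched in a few lines: fit a classifier so that its mean output over each mixed batch matches that batch's known quark fraction, with no per-jet labels. This is a deliberately minimal numpy version (logistic model, squared proportion loss, toy two-feature jets); real implementations use deep networks and the actual PDF-derived fractions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mixed_batch(n, quark_frac):
    """Toy jets with features (multiplicity, width); gluon jets are
    busier and wider, mimicking the physics described above."""
    is_quark = rng.random(n) < quark_frac
    X = np.where(is_quark[:, None],
                 rng.normal([10.0, 0.10], [3.0, 0.03], (n, 2)),  # quark
                 rng.normal([20.0, 0.20], [5.0, 0.05], (n, 2)))  # gluon
    return (X - [15.0, 0.15]) / [5.0, 0.06], is_quark  # standardize

def train_weak(batches, fracs, lr=0.5, epochs=300):
    """Weak supervision: only batch-level quark fractions are used."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        for X, f in zip(batches, fracs):
            p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # per-jet quark score
            err = p.mean() - f                       # proportion mismatch
            g = p * (1.0 - p)                        # sigmoid derivative
            w -= lr * 2.0 * err * (g[:, None] * X).mean(axis=0)
            b -= lr * 2.0 * err * g.mean()
    return w, b

fracs = [0.2, 0.5, 0.8]             # known from PDFs / matrix elements
data = [make_mixed_batch(2000, f) for f in fracs]
w, b = train_weak([X for X, _ in data], fracs)

# despite never seeing a per-jet label, the model separates jets
X_test, y_test = make_mixed_batch(2000, 0.5)
acc = np.mean(((X_test @ w + b) > 0) == y_test)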


Multiplicity, jet width, hard subjet

• Weak learning works with three Q/G discriminants

Weakly supervised works as well as fully supervised


Dery at al. (arXiv:1702.00414)

Page 49: Modern Machine Learning and the Large Hadron Collider

Matthew Schwartz

Jet images + weak supervision

• Works as well as with full supervision
• Labels not needed even for complex inputs

MDS, Komiske, Metodiev, Nachman (arXiv:1801.10158)

Fully supervised

Weak supervision (mixed samples)

Weak supervision is a breakthrough for particle physics: Can learn complex discrimination directly from data


Page 50: Modern Machine Learning and the Large Hadron Collider

DECORRELATION

Page 51: Modern Machine Learning and the Large Hadron Collider


Adversarial networks
Louppe, Kagan, Cranmer (arXiv:1611.01046)
Shimmin et al. (arXiv:1703.03507)

…on the jet pT, which shows some small pT-dependent effects, but no large features. As an alternative strategy, we trained a network using an adversarial strategy with respect to log(m/pT), which more closely mimics the approach used in Ref. [9]; the training succeeded in finding a network with a flat response in log(m/pT), but the distortion in jet mass was much more significant. In principle, it is possible to use the adversary to enforce a two-dimensional decorrelation, but since the pT-dependence is not severe here, we leave this for future study.

[Figure: background rejection (1/efficiency) vs. signal efficiency for the discriminants listed in the caption.]

FIG. 4. Signal efficiency and background rejection (1/efficiency) for varying thresholds on the outputs of several jet-tagging discriminants: traditional networks trained to optimize classification, networks trained with an adversarial strategy to optimize classification while minimizing impact on jet mass, the unmodified τ21, and the two DDT-modified variables τ′21 and τ′′21. The signal samples have mZ′ = 100 GeV for this example. Generalization to other masses is shown in Sec. VII.

V. STATISTICAL INTERPRETATION

The ability to discriminate jets due to the hadronic decay of a boosted object from those due to a quark or gluon is an important feature of a jet substructure tagging tool, but as discussed above it is not the only requirement. Due to the necessity of accurately modeling the background, it is desirable that the jet tagger avoid distortion of the background distribution. Simpler background shapes are especially preferred because they allow for robust estimates that

[Figure: four panels of NN output and N-subjettiness variables vs. jet invariant mass.]

FIG. 5. Top left, relationship between jet mass and neural network output in background events for a network trained to optimize classification compared to an adversarial network trained to optimize classification while minimizing dependence on jet mass. Top right, relationship between jet mass and jet substructure variable τ21 and the DDT-modified τ′21 and τ′′21 which attempt to minimize dependence on jet mass. Bottom left, profile of neural network output versus jet mass for the adversarial trained network with varying jet pT thresholds. Bottom right, contour plot of neural network output versus jet mass in background events for the adversarially-trained network. The signal sample used in training has mZ′ = 100 GeV; generalization to other masses is shown in Sec. VII.

are constrained by the sidebands; backgrounds that can be modeled with fewer parameters and inflections avoid degeneracy with signal features, such as a peak.

Fig. 5 shows qualitatively that the adversarial network's response is not strongly dependent on jet mass. But a quantitative assessment is more difficult. Mass-independence is not in itself the goal; instead, we seek reduced dependence on knowledge of the background shape and reduced sensitivity to the systematic uncertainties that tend to dilute the statistical significance of a discovery.

However, our lack of knowledge of the true background model in general also makes it non-trivial to rigorously define and estimate the background uncertainty. In practice, experimentalists use an assumed functional form, with parameters constrained by background-dominated sidebands to predict the


• Train network on signal and background
• Can sculpt background to look like signal

No longer trust the bump → discovery significance reduced

Adversarial network setup:
• Network 1: tries to separate signal from background
• Network 2 (adversary): tries to guess the mass from network 1's output
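The two-network game can be written compactly. Following Louppe, Kagan, Cranmer (arXiv:1611.01046), let the classifier (network 1) have parameters θf with classification loss L_clf, and let the adversary (network 2) have parameters θr with loss L_adv for guessing the mass from network 1's output; λ sets the decorrelation strength. The notation below is a schematic paraphrase, not a quote from the paper:

```latex
E_\lambda(\theta_f, \theta_r)
  = \mathcal{L}_{\mathrm{clf}}(\theta_f)
  - \lambda\, \mathcal{L}_{\mathrm{adv}}(\theta_f, \theta_r),
\qquad
\hat{\theta}_f = \arg\min_{\theta_f}\, \max_{\theta_r}\, E_\lambda(\theta_f, \theta_r).
```

Maximizing over θr makes the adversary as good as possible at inferring the mass; minimizing over θf then forces the classifier to give up any discrimination power that leaks mass information, with larger λ enforcing stronger decorrelation.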

[Figure: NN output vs. jet invariant mass for Z′ mass hypotheses of 20, 35, 50, 100, 200, and 300 GeV; adversarially trained NN (top) and traditional NN (bottom).]

FIG. 11. Profile of the parameterized NN responses to background versus jet mass, where the parameterized network was evaluated at different Z′ mass hypotheses. Top shows the response of the adversarially-trained classifier, which minimizes correlation with jet mass; bottom shows the response of a network trained in the traditional manner, to optimize classification accuracy.

…the observable of interest, the jet mass. This allows the classifier to enhance signal to noise ratio while minimizing the tendency of the background distribution to morph into a shape which is degenerate with the observable signal. When the background cannot be reliably predicted a priori, as is often the case, it is important to be able to constrain its rate in sidebands surrounding the signal region. Therefore, avoiding such degeneracy is critical to performing successful measurements.

We note that, from Fig. 8, it is clear that applying sufficiently tight cuts to the adversarial classifier causes significant background morphing, particularly when compared to the τ21-based discriminants. However, the solid lines of Fig. 9 illustrate the case where the background rate is uncertain and hence benefits from sideband constraints. We see that the optimal significance is realized for the adversarial classifier at a relatively high signal efficiency of roughly 90%, where the background morphing is quite limited (Fig. 7). Hence, the adversarial classifier achieves its goal of optimizing the trade-off between correlation and discrimination power. We also note that the decorrelation could poten-

[Figure: AUC vs. Z′ mass for the discriminants listed in the caption.]

FIG. 12. The AUC metric (Area Under the Curve) for NNs parameterized in mZ′ and tested at several values (both traditional and adversarial training techniques), compared to the discrimination of the individual features τ21, τ′21, and τ′′21.

[Figure: best discovery significance [σ] vs. Z′ mass for the same discriminants and for no cut.]

FIG. 13. Discovery significance for a hypothetical signal after optimizing thresholds on the output of networks parameterized in mZ′ trained with adversarial or traditional approaches, compared to thresholds on τ21, τ′21, and τ′′21, or to placing no threshold. Significance is evaluated for the case of 50% background uncertainty.


Decorrelates NN output from mass

Discovery significance improved!

Penalty in loss function

Page 52: Modern Machine Learning and the Large Hadron Collider


ABCDisCo: ML the ABCD method
Kasieczka, Nachman, MDS, Shih (to appear)

ABCD method:
• Standard experimental sideband technique
• Estimate background in region A via N_A = N_B N_C / N_D
• Requires two features f and g to be uncorrelated
• E.g. f = mass and g = rapidity
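The closure of the estimate is easy to check numerically. A toy sketch with hypothetical features (an exponential mass-like f and a uniform rapidity-like g, drawn independently so the method's assumption holds by construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy background sample with two *independent* features
f = rng.exponential(50.0, size=100_000)   # mass-like variable
g = rng.random(size=100_000)              # rapidity-like variable

cut_f, cut_g = 80.0, 0.7                  # signal region: f > cut_f and g > cut_g
A = np.sum((f > cut_f) & (g > cut_g))     # signal region (to be predicted)
B = np.sum((f > cut_f) & (g <= cut_g))    # sidebands
C = np.sum((f <= cut_f) & (g > cut_g))
D = np.sum((f <= cut_f) & (g <= cut_g))

N_A_pred = B * C / D                      # ABCD estimate: N_A = N_B N_C / N_D
```

If f and g were correlated, N_A_pred would be biased; that is exactly why ABCDisCo trains the learned features to be statistically independent.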

• Single DisCo: f is fixed (e.g. mass), g is learned
• Double DisCo: f and g are learned

Top tagging RPV squark search

Works great!
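The penalty behind both variants is the distance correlation of Székely et al., which vanishes (in the large-sample limit) if and only if the two quantities are statistically independent, unlike Pearson correlation, which misses nonlinear dependence. A minimal numpy version of the sample statistic (the DisCo approach adds a differentiable batch version of this to the classifier loss; this sketch is just the statistic itself):

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation: ~0 when x and y are independent,
    positive under any dependence, including nonlinear."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                # squared distance covariance
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
dcorr_indep = distance_correlation(x, rng.normal(size=500))  # small
dcorr_nonlin = distance_correlation(x, x ** 2)               # sizable,
# even though the Pearson correlation of x and x**2 is near zero
```

Adding λ·dCorr² between the two learned features (or between one feature and the mass) to the training loss pushes them toward the independence the ABCD method requires.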

Page 53: Modern Machine Learning and the Large Hadron Collider


Conclusions
• Modern machine learning is growing rapidly
• "Traditional" collider physics is dead

• Much progress on standard nails
– top tagging
– quark/gluon
– anomaly detection

• State-of-the-art methodology
– New observables (ML top tagging …)
– New data representations (JUNIPR, point clouds …)
– Improving experimental analyses (OmniFold, ABCDisCo …)

• See https://iml-wg.github.io/HEPML-LivingReview for a comprehensive list of papers

• Past: apply hammers to nails
• Future: learn some new physics