TRANSCRIPT
Jesse Thaler — The Future is Open: Jet Substructure with CMS Public Data
CMS Week, CERN — June 25, 2018
CMS and CERN are pioneering the release of research-grade public collider data
[Timeline, 2014 to 2021: CMS public data releases of Run 2010B, Run 2011A, and Run 2012B/C]
Exposing the QCD Splitting Function with CMS Open Data
Andrew Larkoski,1 Simone Marzani,2 Jesse Thaler,3 Aashish Tripathee,3 and Wei Xue3
1 Physics Department, Reed College, Portland, Oregon 97202, USA
Physical Review Letters 119, 132003 (2017), week ending 29 September 2017
[Timeline, 2014 to 2021: releases of Run 2010B, Run 2011A, and Run 2012B/C, followed by the first publication, from MOD (MIT Open Data)]
A Milestone for Public Collider Data
A Milestone for Jet Physics
An Opportunity/Challenge for our Community
Viability of public collider data depends on interest/enthusiasm of particle physics community
Goals of this talk:
Highlight the opportunities
Expose the challenges
Inspire you to help
Outline
Using the CMS Open Data: Recent progress processing the 2011 dataset
Jet Substructure and QCD Splittings: Highlights from our 2010 publications
The Future of Public Collider Data: Back to the future with ALEPH data
[Figure: (1/σ) dσ/dzg vs. track zg; CMS 2010 Open Data compared with Theory (MLL; all), Pythia 8.219, Herwig 7.0.3, and Sherpa 2.2.1; AK5, |η| < 2.4, jet pT > 150 GeV, PFC pT > 1.0 GeV; SD: β = 0, zcut = 0.1; lower panel: ratio to Pythia]
Using the CMS Open Data
November 2014: Run 2010B, 7 TeV, 32 pb⁻¹; >20 TB, no MC (First publication: QCD)
April 2016: Run 2011A, 7 TeV, 2.5 fb⁻¹; >100 TB, with MC (In the pipeline: BSM, ML)
December 2017: Run 2012B/C, 8 TeV, 11.6 fb⁻¹; >1 PB, with MC
opendata.cern.ch/research/CMS
Kati Lassila-Perini, Achim Geiser, ...
The MIT Open Data Team
2010 QCD: Aashish Tripathee, Wei Xue, Simone Marzani, Andrew Larkoski
Summer intern: Alexis Romero
CMS advice: Sal Rappoccio
2011 BSM: Wei Xue, Matt Strassler, Yotam Soreq, Cari Cesarotti, Raffaele D'Agnolo
2011 ML: Radha Mastandrea, Preksha Naik, …
Patrick Komiske
Today: very preliminary 2011 results (only 16%, still debugging)
Key Challenge: Initial Data Processing
CernVM + CMSSW 5.3.32 (for 2011)
AOD Format (CMS Root): RAW → RECO → “Analysis Object Data”
Access via XRootD, write custom EDAnalyzer
For novices, very steep learning curve for using Root and understanding overall data structure (esp. triggers)
Jet Primary Dataset: 4.7 TB for 3 million AOD events
Daunting amount of information!
Our Strategy: Simplified Analysis Framework
MODProducer + MODAnalyzer + FastJet 3.3.1
MOD Format (ASCII)
For 2010: Cross-check with flat Root n-tuples
Access via External Hard Drive
Text files as educational tool and debugging strategy
Access only essential information, sacrifice flexibility
Jet Primary Dataset: ~500 GB for 3 million MOD events
New for 2011: Separate processing for triggers/luminosity
Example MOD Metadata (Simplified)
New for 2011: Effective luminosity information per trigger
BeginFile Version 6 CMS_2011A Data Jet
# File Filename TotalEvents ValidEvents IntLumiDel IntLumiRec
  File A850-02163E008D77 30284 29057 895036.1 884409.2
# Block RunNum LumiBlock Events Valid? IntLumiDel IntLumiRec
  Block 160578 366 53 1 28.6 27.5
  Block 160578 367 58 1 28.6 27.5
  Block 160578 368 38 1 28.6 27.5
  Block 160578 369 40 1 28.6 27.5
  Block 160578 370 57 1 28.6 27.5
  Block 160578 371 46 1 28.6 27.5
  Block 160578 372 37 1 28.6 27.5
  ...
# Trig Name Present Valid Fired EffLumiDel EffLumiRec AvePrescale
  Trig HLT_Jet240_v2 16530 9 1 27228.1 27026.7 3.0
  Trig HLT_DiJetAve140U_v4 13754 5 1 11705.3 11582.4 1.0
  Trig HLT_Jet240_v1 13754 5 1 18387.4 18205.5 1.0
  Trig HLT_Jet150_v2 16530 9 2 2722.8 2702.6 30.0
  Trig HLT_Jet190_v2 16530 9 2 8168.4 8108.0 10.0
  Trig HLT_Jet190_v1 13754 5 2 6939.9 6872.4 3.0
  Trig HLT_DiJetAve15U_v4 13754 5 1 1.0 1.0 7500.0
  Trig HLT_DiJetAve110_v2 11772 3 1 358.3 356.0 75.0
  ...
EndFile
Effective Luminosity per Trigger
Testing Trigger Consistency
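The metadata layout above is simple enough to read with a few lines of Python. What follows is a minimal sketch, not the MOD team's own tooling: the helper `parse_mod_metadata` is hypothetical, the sample text is a shortened copy of the slide's listing, and the check at the end assumes that effective luminosity is roughly the recorded luminosity divided by the average prescale, so that the product of the two should agree for triggers present in the same luminosity blocks.

```python
# Shortened copy of the simplified metadata listing from the slide.
SAMPLE = """\
BeginFile Version 6 CMS_2011A Data Jet
# File Filename TotalEvents ValidEvents IntLumiDel IntLumiRec
File A850-02163E008D77 30284 29057 895036.1 884409.2
# Block RunNum LumiBlock Events Valid? IntLumiDel IntLumiRec
Block 160578 366 53 1 28.6 27.5
Block 160578 367 58 1 28.6 27.5
# Trig Name Present Valid Fired EffLumiDel EffLumiRec AvePrescale
Trig HLT_Jet240_v2 16530 9 1 27228.1 27026.7 3.0
Trig HLT_Jet190_v2 16530 9 2 8168.4 8108.0 10.0
Trig HLT_Jet150_v2 16530 9 2 2722.8 2702.6 30.0
EndFile
"""


def parse_mod_metadata(text):
    """Hypothetical helper: group rows by record type (File, Block, Trig).

    Column names come from the preceding '#' header line, so each row
    becomes a dict of strings keyed by those names.
    """
    headers, records = {}, {}
    for line in text.splitlines():
        if line.startswith("#"):
            # "# Trig Name Present ..." or "#Block RunNum ..." both work here.
            tokens = line[1:].split()
            if tokens:
                headers[tokens[0]] = tokens[1:]
            continue
        tokens = line.split()
        if tokens and tokens[0] in headers:
            row = dict(zip(headers[tokens[0]], tokens[1:]))
            records.setdefault(tokens[0], []).append(row)
    return records


records = parse_mod_metadata(SAMPLE)

# Assumption: EffLumiRec * AvePrescale should be about the same for
# triggers that were present in the same set of luminosity blocks.
lumi_times_prescale = {
    trig["Name"]: float(trig["EffLumiRec"]) * float(trig["AvePrescale"])
    for trig in records["Trig"]
}
for name, value in lumi_times_prescale.items():
    print(f"{name}: {value:.1f}")
```

For the three `_v2` triggers copied from the slide, the products agree to within a few parts in 10⁵, which is the kind of cross-check a per-trigger luminosity table makes possible.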
Example MOD Event (Simplified)
BeginEvent Version 6 CMS_2011A Data Jet
# /Run2011A/Jet/MOD/12Oct2013-v1/20000/000D4260-D23E-E311-A850-02163E008D77.mod
# Cond RunNum EventNum LumiBlock NPV Timestamp msOffset
  Cond 160578 38142433 366 4 1300254008 84656
# Trig Name Prescale_1 Prescale_2 Fired?
  Trig HLT_DiJetAve30U_v4 1 15 0
  Trig HLT_DiJetAve50U_v4 1 3 1
  Trig HLT_Jet110_v1 1 1 1
  ...
# AK5 px py pz energy jec area no_of_const neu_had_frac ...
  AK5 -48.53 91.23 922.46 928.25 1.15 0.77 3 0.17 ...
  AK5 27.14 -27.95 -176.24 180.60 1.11 0.71 14 0.11 ...
  AK5 6.87 -27.39 -127.71 130.89 1.13 0.59 10 0.14 ...
  ...
# PFC px py pz energy pdgId
  PFC 3.05 -2.27 -18.08 18.48 211
  PFC 3.51 -3.48 -21.66 22.22 211
  PFC 2.83 -3.01 -20.00 20.42 -211
  PFC 2.89 -2.37 -18.40 18.77 211
  PFC 1.21 -1.31 -7.58 7.79 -211
  PFC 1.62 -2.72 -12.17 12.58 -211
  PFC 7.15 -7.56 -46.86 48.01 22
  ...
EndEvent
Crucial: JEC factors, jet quality criteria, and particle flow candidates
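To illustrate why the JEC factor and the constituent list matter, here is a minimal Python sketch that reads a shortened copy of the event above and computes the corrected transverse momentum and pseudorapidity of each AK5 jet. The helper names are hypothetical, the AK5 header is truncated to the columns shown on the slide, and treating `jec` as a multiplicative factor on the jet momentum is an assumption rather than something the slide states.

```python
import math

# Shortened copy of the simplified event listing from the slide.
SAMPLE_EVENT = """\
BeginEvent Version 6 CMS_2011A Data Jet
# AK5 px py pz energy jec area no_of_const neu_had_frac
AK5 -48.53 91.23 922.46 928.25 1.15 0.77 3 0.17
AK5 27.14 -27.95 -176.24 180.60 1.11 0.71 14 0.11
# PFC px py pz energy pdgId
PFC 3.05 -2.27 -18.08 18.48 211
PFC 7.15 -7.56 -46.86 48.01 22
EndEvent
"""


def parse_event(text):
    """Hypothetical helper: return {record_type: [row dicts of floats]}."""
    headers, rows = {}, {}
    for line in text.splitlines():
        if line.startswith("#"):
            tokens = line[1:].split()
            if tokens:
                headers[tokens[0]] = tokens[1:]
            continue
        tokens = line.split()
        if tokens and tokens[0] in headers:
            values = [float(token) for token in tokens[1:]]
            rows.setdefault(tokens[0], []).append(dict(zip(headers[tokens[0]], values)))
    return rows


def transverse_momentum(p):
    return math.hypot(p["px"], p["py"])


def pseudorapidity(p):
    # eta = asinh(pz / pT), equivalent to -ln tan(theta / 2)
    return math.asinh(p["pz"] / transverse_momentum(p))


event = parse_event(SAMPLE_EVENT)
jets = event["AK5"]

# Assumption: the jet energy correction rescales the jet momentum.
corrected_pts = [jet["jec"] * transverse_momentum(jet) for jet in jets]
etas = [pseudorapidity(jet) for jet in jets]

for pt, eta in zip(corrected_pts, etas):
    print(f"corrected pT = {pt:.1f} GeV, eta = {eta:.2f}, passes |eta| < 2.4: {abs(eta) < 2.4}")
```

In this sample the first jet falls outside |η| < 2.4 while the second falls inside, which shows how the acceptance cut quoted on the plots can be applied directly to the text records.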
Trigger Turn-on Behavior
[Figure: cross section ratio vs. trigger jet pT [GeV] for consecutive triggers (Jet60 / Jet30, Jet80 / Jet60, Jet110 / Jet80, Jet150 / Jet110, Jet190 / Jet150, Jet240 / Jet190, Jet300 / Jet240, Jet370 / Jet300), with labels at 90, 110, 150, 210, 270, 310, 390, and 480 GeV; AK5, |η| < 2.4; CMS 2011 Open Data, Preliminary (16%)]
Hardest Jet pT
New for 2011: Detector Simulated Data (!)
[Figure: differential cross section [pb/GeV] vs. pT [GeV]; CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 90 GeV; lower panel: ratio to simulated; Preliminary (16%)]
Basic Jet Properties
Pseudorapidity
[Figure: differential cross section [nb] vs. η; CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, jet pT > 270 GeV; lower panel: ratio to simulated; Preliminary (16%)]
Jet Mass (without JMC)
[Figure: differential cross section [pb/GeV] vs. mass [GeV]; CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 270 GeV; lower panel: ratio to simulated; Preliminary (16%)]
Track Multiplicity
Expected mismatch with truth from strange hadrons (cτ0 ∈ [10,1000] mm)
Mismatch with simulated under study
[Figure: differential cross section [pb] vs. track constituent multiplicity; CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 270 GeV; lower panel: ratio to simulated; Preliminary (16%)]
Preliminary 2011 Processing: 2 students, ~6 months
(First collider physics project, previous experience in Python/C++, using template from 2010 analysis)
Opportunity to streamline, since steps common to almost every collider data project
Jet Substructure and QCD Splittings
CMS 2010 Run B: 7 TeV pp, 31.8 pb⁻¹
Jet Primary Dataset: 768,687 events (from 20 million)
HLT_Jet70U, HLT_Jet100U, HLT_Jet140U
Anti-kt, R = 0.5; |η| < 2.4; pT > 20 GeV; “Loose” JQC
Anti-kt, R = 0.5; |η| < 2.4; hardest pT > 150 GeV; “Loose” JQC; PFC pT > 1 GeV
Jet Grooming: mMDT/Soft Drop β = 0
[Larkoski, Marzani, Soyez, JDT, 1402.2657; Dasgupta, Fregoso, Marzani, Salam, 1307.0007; see also Butterworth, Davison, Rubin, Salam, 0802.2470]
Drop radiation until: $z_g > z_{\rm cut}\,\theta_g^{\beta}$
[Diagram: groomed jet splitting into two branches with momentum fractions $z_g$ and $1 - z_g$, separated by angle $\theta_g$]
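The "drop radiation until" condition can be sketched in a few lines of Python. This is a simplified illustration, not the FastJet implementation used in the analysis: it assumes the jet has already been reclustered and that the successive declusterings of the harder branch are supplied as (pT of one branch, pT of the other branch, angular separation ΔR) tuples, ordered from the widest angle inward. It also assumes θg is the separation divided by the jet radius R = 0.5. The function name and the toy numbers are hypothetical.

```python
def soft_drop(declusterings, z_cut=0.1, beta=0.0, jet_radius=0.5):
    """Return (z_g, theta_g) for the first splitting that passes soft drop.

    declusterings: iterable of (pt_a, pt_b, delta_r) tuples, ordered from
    the widest-angle splitting inward along the harder branch (assumed input).
    Returns None if no splitting satisfies z > z_cut * theta**beta.
    """
    for pt_a, pt_b, delta_r in declusterings:
        z = min(pt_a, pt_b) / (pt_a + pt_b)   # momentum fraction of the softer branch
        theta = delta_r / jet_radius          # angle in units of the jet radius
        if z > z_cut * theta ** beta:
            return z, theta                   # this splitting defines z_g and theta_g
        # Otherwise the softer branch is dropped and the next entry
        # describes the splitting of the harder branch.
    return None


# Toy example: a soft wide-angle emission followed by a harder splitting.
example = [(190.0, 10.0, 0.45), (150.0, 40.0, 0.20)]
result = soft_drop(example)
print(result)
```

With β = 0 the angular factor is 1, so the condition reduces to z > zcut, which is the mMDT limit named in the slide title.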
Soft/Collinear Behavior
Drop radiation until: $z_g > z_{\rm cut}\,\theta_g^{\beta}$
$dP_{i \to ig} \simeq \frac{2\alpha_s}{\pi}\, C_i\, \frac{d\theta}{\theta}\, \frac{dz}{z}$, with $C_q = 4/3$ and $C_g = 3$
Collinear: $d\theta/\theta$; Soft: $dz/z$
$z \approx z_g$ (!)
[Larkoski, Marzani, JDT, 1502.01719; see also Larkoski, JDT, 1307.1699]
[Diagram: splitting into branches with momentum fractions $z$ and $1 - z_g$ at angle $\theta$]
[Figure: two-dimensional distribution of θg (0 to 1) vs. zg (0 to 0.5) in CMS 2010 Open Data; AK5, |η| < 2.4, jet pT > 150 GeV, PFC pT > 1.0 GeV; mMDT / SD β=0, zcut = 0.1; color scale from 0 to 20]
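As a rough worked example of what the soft factor dz/z implies for the groomed momentum fraction, the Python sketch below normalizes a pure 1/zg shape over zcut < zg < 1/2 for β = 0. This is only the leading soft approximation with fixed coupling, not the MLL calculation shown in the plots; the function name is hypothetical.

```python
import math


def zg_density_soft_limit(zg, z_cut=0.1):
    """Normalized 1/zg shape on (z_cut, 1/2); zero outside that range.

    The normalization is the integral of dz/z from z_cut to 1/2,
    which equals ln(1 / (2 * z_cut)).
    """
    if not (z_cut < zg < 0.5):
        return 0.0
    return 1.0 / (zg * math.log(1.0 / (2.0 * z_cut)))


# Midpoint-rule check that the density integrates to one.
n_steps = 10000
z_min, z_max = 0.1, 0.5
width = (z_max - z_min) / n_steps
total = sum(
    zg_density_soft_limit(z_min + (i + 0.5) * width) * width
    for i in range(n_steps)
)
print(f"integral = {total:.6f}")
print(f"density at zg = 0.2: {zg_density_soft_limit(0.2):.3f}")
```

Under these assumptions the density rises steeply toward zg = zcut, the same qualitative behavior that the measured zg distributions are used to expose.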
Perfect application of CMS Open Data
Benefits from low trigger thresholds and low pileup
2010 data ⇒ 2014 release ⇒ 2015 idea ⇒ 2017 analysis
Exposing the QCD Splitting Function
[Larkoski, Marzani, JDT, Tripathee, Xue, 1704.05066]
2010 Analysis
[Figure: (1/σ) dσ/dzg vs. zg; CMS 2010 Open Data compared with Theory (MLL), Pythia 8.219, Herwig 7.0.3, and Sherpa 2.2.1; AK5, |η| < 2.4, jet pT > 150 GeV, PFC pT > 1.0 GeV; mMDT / SD β=0, zcut = 0.1; lower panel: ratio to Pythia]
Initial 2011 Results
[Figure: (1/σ) dσ/dzg vs. zg; CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 270 GeV; mMDT / SD β=0, zcut = 0.1; lower panel: ratio to simulated; Preliminary (16%)]
Clear impact from detector (N.B.: no PFC pT cut imposed)
The Future of Public Collider Data
V. ADVICE TO THE COMMUNITY
A. Challenges
B. Recommendations
VI. CONCLUSION
As the LHC explores the frontiers of scientific knowledge, its primary legacy will be the measurements and discoveries made by the LHC detector collaborations. But there is another potential legacy from the LHC that could be just as important: granting future generations of physicists access to unique high-quality data sets from proton-proton collisions at 7, 8, 13, and 14 TeV.
In our view, the best way to build a legacy data set is to …
Jet substructure studies with CMS open data
Aashish Tripathee, Wei Xue, Andrew Larkoski, Simone Marzani, and Jesse Thaler
Physical Review D 96, 074003 (2017)
E.g. ALEPH Confronts the CMS Ridge
[Badea, Baty, Chang, Innocenti, Yen-Jie Lee, Maggi, McGinn, Peters, Sheng, JDT, appeared at Quark Matter 2018]
pp, pPb, PbPb vs. e+e–
1990–95 e+e– data
2010 pp surprise! ⇒ 2018 e+e– analysis
Different Options for “Public Data”
[Diagram: options for public data arranged by Audience and Timing (Instant, Quarter, Year, Few Years, Decade); labels: Event Displays, Outreach, Education, Research, Archival]
Balance openness and priority
Competes with needs of the collaboration
Less impact on the target audience
Data preservation (and outside analyses) require significant resources:
People, time, ideas, and money
To help justify these resources, let me address three common concerns about public collider data raised by our work
[Figure: (θg/σ) dσ/dθg vs. θg (log scale, 0.01 to 1.00); CMS 2010 Open Data compared with Theory (MLL), Pythia 8.219, Herwig 7.0.3, and Sherpa 2.2.1; AK5, |η| < 2.4, jet pT > 150 GeV, PFC pT > 1.0 GeV; mMDT / SD β=0, zcut = 0.1; lower panel: ratio to Pythia]
[Figure: (θg/σ) dσ/dθg vs. θg (log scale, 0.01 to 1.00); CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 270 GeV; mMDT / SD β=0, zcut = 0.1; lower panel: ratio to simulated; Preliminary (16%)]
Balance between Sophistication and Exploration
“There is no way you can do an external analysis with the same degree of sophistication as within the collaboration”
Agreed (mostly)!
But with unexpected theoretical/experimental issues at play, value in exploratory studies
[Tripathee, Xue, Larkoski, Marzani, JDT, 1704.05842]
Synergy between Internal and External Efforts
“This work competes with ongoing collaboration analyses without the scrutiny of internal review”
Agreed, but fine line between “compete” and “complement”
Important to have robust peer review (e.g. dual referees)
[CMS, 1708.09429 ⇒ Phys. Rev. Lett. 120:142302]
[Figure from CMS: (1/N_jet) dN/dz_g vs. z_g (0.1 to 0.5) for PbPb and smeared pp in four centrality classes (50-80%, 30-50%, 10-30%, 0-10%); √s_NN = 5.02 TeV, pp 27.4 pb⁻¹, PbPb 404 μb⁻¹]
Value of Open-Ended Investigations
“If you really wanted to do this jet substructure measurement, you should have joined CMS as an associate member”
Agreed, but what I really wanted to do is figure out the answer to this question (curiosity-driven research)
My View
The CMS Open Data is a fantastic resource, with many exciting applications
Stress-testing archival data strategies
Enabling exploratory/proof-of-principle studies
Facilitating dialogue between theory and experiment
Educating future scientists
Researching physics in and beyond the standard model
These are only possible with sustained investment in public data initiatives
Summary
Using the CMS Open Data: Unique collider data set, ideal for exploratory studies
Jet Substructure and QCD Splittings: Exposing the universal singularity structure of gauge theories
The Future of Public Collider Data: Sustained investment from outreach to research to archives
Backup Slides
Additional 2011 Plots
Constituent Multiplicity
[Figure: differential cross section [pb] vs. constituent multiplicity (0 to 90); CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 270 GeV; lower panel: ratio to simulated; Preliminary (16%)]
Track Mass
[Figure: differential cross section [pb/GeV] vs. track mass [GeV] (0 to 60); CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 270 GeV; lower panel: ratio to simulated; Preliminary (16%)]
Azimuth
[Figure: differential cross section [nb] vs. φ (0 to 2π); CMS 2011 Open Data compared with Pythia 6 Tune Z2 (Simulated) and Pythia 6 Tune Z2 (Truth); AK5, |η| < 2.4, jet pT > 270 GeV; lower panel: ratio to simulated; Preliminary (16%)]
Sewing Together Simulated Data Samples
[Figure, two panels: cross section [pb] vs. hardest jet pT [GeV] for Pythia 6 Tune Z2 (Simulated) samples generated in separate generator-level pT ranges; AK5, |η| < 2.4; CMS 2011 Open Data, Preliminary (16%)]
First panel (0 to 700 GeV): generator pT ∈ [15,30], [30,50], [50,80], [80,120], [120,170], [170,300], [300,470] GeV
Second panel (0 to 3000 GeV): generator pT ∈ [470,600], [600,800], [800,1000], [1000,1400], [1400,1800] GeV, and ≥ 1800 GeV
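One common way to sew such samples together, sketched below in Python, is to weight every event in a sample by that sample's cross section divided by its number of events and then fill a single histogram. Whether this matches the exact procedure behind the plots is an assumption; the function name, the cross sections, and the jet pT values are toy numbers chosen for illustration, not values from the CMS samples.

```python
from bisect import bisect_right


def sew_samples(samples, bin_edges):
    """Combine samples into one cross-section histogram (in pb per bin).

    samples: list of dicts with 'cross_section_pb' and 'hardest_jet_pts'
    (hypothetical field names). Each event gets weight sigma / N for its sample.
    """
    histogram = [0.0] * (len(bin_edges) - 1)
    for sample in samples:
        pts = sample["hardest_jet_pts"]
        if not pts:
            continue  # nothing to add from an empty sample
        weight = sample["cross_section_pb"] / len(pts)
        for pt in pts:
            index = bisect_right(bin_edges, pt) - 1
            if 0 <= index < len(histogram):
                histogram[index] += weight
    return histogram


# Toy inputs: two generator-level pT ranges with very different cross sections.
toy_samples = [
    {"cross_section_pb": 1000.0, "hardest_jet_pts": [100.0, 120.0, 140.0, 180.0]},
    {"cross_section_pb": 10.0, "hardest_jet_pts": [310.0, 450.0]},
]
toy_edges = [0.0, 150.0, 300.0, 500.0]

combined = sew_samples(toy_samples, toy_edges)
print(combined)
```

Because each sample's weights sum to its cross section, the combined histogram integrates to the total cross section of the samples whenever all events fall inside the binning.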
Textbook QCD: Universal Collinear Limit
Splitting Function: $dP_{i \to ig} \simeq \frac{2\alpha_s}{\pi}\, C_i\, \frac{d\theta}{\theta}\, \frac{dz}{z}$, with $C_q = 4/3$ and $C_g = 3$
Collinear singularity: $d\theta/\theta$
Soft singularity: $dz/z$
[Diagram: squared 2→n amplitude ≈ squared 2→n–1 amplitude × squared 1→2 splitting, with momentum fractions z and 1–z and opening angle θ]
A Quad-Jet Puzzle in Archival ALEPH Data
[Kile, von Wimmersperg-Toeller, 1706.02242, 1706.02255, 1706.02269]
Sym-Dijet Mass Average
[Figure (a): events / 1.00 GeV/c² vs. Σ (GeV/c²), 10 to 100; LEP2 ALEPH archived data compared with Sherpa MEPS@LO (4f WW, 4f ZZ, other); fit: Nfit = 223.28 ± 61.41, Mean = 53.97 ± 1.24, Width = 3.55 ± 1.14; lower panel: ratio from 0.6 to 1.4]
Sym-Dijet Mass Difference
[Figure: events / 5 GeV/c² vs. Δ (GeV/c²), −100 to 100, for 45 GeV < Σ < 61 GeV; LEP2 ALEPH archived data compared with Sherpa MEPS@LO (4f WW, 4f ZZ, other); lower panel: ratio from 0.6 to 1.4]
(?!)
Example of stress-testing archival data strategies
New: Public neutrino data!
Derived data from 2008-2012 ⇒ Released May 2018