2
Deep Learning in High Energy Physics
Examples from the LHC
Sofia Vallecorsa – November 6th, 2019
3
Outline
Introduction
Generative Models
Applications in simulation, real-time selection and pattern recognition
Graph Neural Networks
Pattern recognition (tracking)
Conclusions
4
CERN
International organisation close to Geneva, straddling Swiss-French border, founded 1954
Facilities for fundamental research in particle physics
23 member states, 1.1 B CHF budget
3’197 staff, fellows, apprentices, …
13’128 associates
“Science for peace”
1954: 12 Member States
Members: Austria, Belgium, Bulgaria, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Israel, Italy, Netherlands, Norway, Poland, Portugal, Slovak Republic, Spain, Serbia, Sweden, Switzerland, United Kingdom
Candidates for membership: Cyprus, Slovenia
Associate members: India, Lithuania, Pakistan, Turkey, Ukraine
Observers: EC, Japan, JINR, Russia, UNESCO, United States of America
Numerous non-member states with collaboration agreements
Staff: 2’531 staff members, 645 fellows, 21 apprentices
Associates: 7’000 from member states, 1’800 USA, 900 Russia, 270 Japan, …
5
The Large Hadron Collider (LHC)
https://visit.cern/
6
The Higgs Boson
7
The Higgs Boson completes the Standard Model, but the Model explains only about 5% of our Universe
What is the other 95% of the Universe made of?
How does gravity really work?
Why is there no antimatter in nature?
8
Some background
[Figure: timeline of machine learning methods. Labels: Perceptron; Neural Networks; Decision Trees; Random Forests; BDT; SVM; Modern Deep Learning; Deep Learning at the LHC (~now); NN @LEP; BDT @SLAC]
Image from “Deep Learning”, I. Goodfellow, MIT Press book
9
Why? …Big Data
Accelerators infrastructure
9600 magnets for Beam Control
1232 superconducting dipoles for bending
Experiments (detectors & physics data)
330 PB of collision data stored by end of 2018
The computing infrastructure
LHC data is multi-structured, hybrid
LHC is entering the Big Data era
Next generation colliders will require larger, highly granular detectors that will generate huge particle data rates, O(100 TB/s)
10
Deep Learning for HEP
DL can recognize patterns in large complicated data sets
Re-cast physics problems as “DL problems”
Adopting “new” computing models
Robust performance studies
Model interpretability and systematics
Domain knowledge
Physics-based validation
B. Hooberman, S.V. et al. (NIPS 2017)
DNNs find hidden patterns in raw data
No need for high level features
11
Deep Learning for HEP (II)
Analysis
Optimisation
Raw data processing
Monitoring and Control Systems
Real-time filtering
Simulation
12
Deep Generative Models
Shallow models learn simple internal representations
→ Deep Generative Models
Allow higher levels of abstraction
Improve generalisation and transfer
→ Multiple applications
Discovery
Anomaly Detection
Planning
Transfer Learning
→ Different Models
Generative Adversarial Networks
(Variational) AutoEncoders
13
Detector response as images
Monte Carlo simulation
Monte Carlo simulation of detector response is extremely demanding in terms of computing resources
→ 50% of LHC Computing Grid resources today
Interpret detector output as images
Read-out channels become pixels in an image
Use computer vision techniques to interpret results
Replace Monte Carlo approach with Generative Models
Pixelized 3D image
14
Physics simulation with GAN
GAN generated energy shower
Method                                                  Time to generate one particle
Classical Monte Carlo, 2S Intel® Xeon® Platinum 8180    17000 ms
3D GAN, 2S Intel® Xeon® Platinum 8160                   0.85 ms (20000x)
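The quoted 20000x factor follows directly from the two timings above. A minimal check, using only the numbers on the slide (the throughput line is a derived illustration, not a figure from the talk):

```python
# Timings quoted on the slide, in milliseconds per particle.
monte_carlo_ms = 17000.0  # classical Monte Carlo, 2S Xeon Platinum 8180
gan_ms = 0.85             # 3D GAN inference, 2S Xeon Platinum 8160

# Speedup is the ratio of the two per-particle times.
speedup = monte_carlo_ms / gan_ms

# Throughput implied by the GAN timing (derived here, not quoted in the talk).
particles_per_second = 1000.0 / gan_ms

print(f"speedup ~ {speedup:.0f}x, about {particles_per_second:.0f} particles/s")
```

Note that the two timings were taken on different CPU models, so the ratio is indicative rather than a like-for-like benchmark.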
[Figure: single cell response]
[Figure: High Energy Physics: 3D GANs Training Speedup Performance. Intel 2S Xeon(R) 8160 on Stampede2/TACC, OPA Fabric; TensorFlow 1.9 + MKL-DNN + Horovod, Intel MPI, Core Aff. BKMs, 4 Workers/Node]
Intel(R) 2S Xeon(R) nodes   Speedup   Scaling efficiency
1                           1.0       100%
2                           2.0       100%
4                           3.9       98%
8                           7.8       97%
16                          15.5      97%
32                          31        96%
64                          61        95%
128                         120       94%
128-node performance: 148 s/epoch
[Figure labels: Geant4, GAN generated]
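Scaling efficiency in the plot above is speedup divided by node count. A short sketch that recomputes it from the speedup values read off the slide; since those speedups are themselves rounded, the recomputed percentages can differ from the plotted ones by about one point:

```python
# Node counts and speedups as read from the slide's scaling plot.
nodes = [1, 2, 4, 8, 16, 32, 64, 128]
speedups = [1.0, 2.0, 3.9, 7.8, 15.5, 31, 61, 120]

def scaling_efficiency(speedup, n_nodes):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_nodes

for n, s in zip(nodes, speedups):
    eff = scaling_efficiency(s, n)
    print(f"{n:4d} nodes: speedup {s:6.1f}, efficiency {100 * eff:5.1f}%")
```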
15
Real-time event selection
We can process only a minimal fraction of collider data
Keep only the interesting events
Sophisticated studies to optimise selection for specific physics processes
We don’t know what unknown physics looks like!
16
Physics Mining as anomaly detection
Classical strategy uses a very loose selection
1M Standard Model (“known physics”) events per day
O. Cerri, ACAT2019
Train Variational Auto Encoders on known physics
Monte Carlo data
Real detector data
Run it in real time and store only “anomalies”
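The "store only anomalies" step can be sketched as a threshold on a per-event anomaly score. This is a minimal illustration, not the selection used in the cited work: the score is assumed to be something like the VAE loss, the function names and the keep fraction are hypothetical, and random numbers stand in for real scores.

```python
import random

def calibrate_threshold(reference_scores, keep_fraction):
    """Score above which roughly `keep_fraction` of known-physics events fall."""
    ranked = sorted(reference_scores, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[n_keep - 1]

def select_anomalies(event_scores, threshold):
    """Indices of events whose anomaly score exceeds the threshold."""
    return [i for i, score in enumerate(event_scores) if score > threshold]

# Stand-in scores for known physics (assumption: in practice these would come
# from a VAE trained on Standard Model events, which is not implemented here).
random.seed(0)
reference = [random.gauss(1.0, 0.2) for _ in range(10000)]
threshold = calibrate_threshold(reference, keep_fraction=0.001)

# A toy stream in which the third event is poorly described by the model.
stream = [1.0, 0.9, 5.0, 1.1]
print(select_anomalies(stream, threshold))
```

Fixing the keep fraction on known physics controls the stored event rate regardless of what the anomalies look like.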
17
Selecting the unknown!
Alternative and robust strategy
Create a dataset of anomalous events
Probe large range of processes
Might open new physics directions
VAE as model-independent new physics selection tool
18
Pattern recognition in HEP
Particle Trajectory Reconstruction
J-R. Vlimant, CERN-DS seminar
Particle trajectories are bent in a solenoid magnetic field
Need curvature to measure momentum
Particles ionize silicon sensors arranged in concentric layers
Thousands of sparse hits
Many hits are uninteresting
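The link between curvature and momentum is the standard relation for a charged particle in a uniform field: pT [GeV/c] ≈ 0.3 · |q| · B [T] · R [m]. A small sketch follows; the field value in the example is an assumed, illustrative number and is not taken from the slides.

```python
# Speed of light factor that converts T*m to GeV/c for a unit charge.
GEV_PER_TESLA_METRE = 0.299792458

def transverse_momentum_gev(b_field_tesla, radius_m, charge=1):
    """Transverse momentum from the bending radius in a uniform solenoid field."""
    return GEV_PER_TESLA_METRE * abs(charge) * b_field_tesla * radius_m

def bending_radius_m(pt_gev, b_field_tesla, charge=1):
    """Inverse relation: radius of curvature for a given transverse momentum."""
    return pt_gev / (GEV_PER_TESLA_METRE * abs(charge) * b_field_tesla)

# Illustrative values only (assumed): a 3.8 T field and a 1 m bending radius.
pt = transverse_momentum_gev(3.8, 1.0)
print(f"pT ~ {pt:.3f} GeV/c")
```

High-momentum tracks are nearly straight, which is why the measurement needs precise hit positions over a long lever arm.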
19
Image-based approach:
CNN used for activity segmentation and detector signal classification
AutoEncoders for trajectory reconstruction
20
Graph Neural Networks
Next generation colliders will present challenges to image-based methods
Space-point representation enables use of GNN
Structure data as a (directed) graph of connected hits
Connect plausibly-related hits using geometric constraints
Full event embedding requires large graphs (~10^5 nodes)
Sparse matrix implementation
Identify disjoint sub-graphs; distributed learning of large graphs
HepTrkX GNN is a cascade of Input, Edge and Node Networks
[Figure: HepTrkX GNN]
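Building the hit graph can be sketched as follows: hits on adjacent layers are connected by a directed edge when they pass simple geometric cuts, and the result is kept as a sparse edge list. The hit format, the cut values and the function names are assumptions for illustration; this is not the HepTrkX code.

```python
import math

def delta_phi(phi_a, phi_b):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi_b - phi_a + math.pi) % (2.0 * math.pi) - math.pi

def build_hit_graph(hits, max_dphi=0.1, max_dz=0.2):
    """Connect plausibly related hits on adjacent layers.

    hits: list of (layer, phi, z) tuples (hypothetical format).
    Returns a sparse, directed edge list of (inner_hit, outer_hit) indices.
    """
    by_layer = {}
    for idx, (layer, _, _) in enumerate(hits):
        by_layer.setdefault(layer, []).append(idx)

    edges = []
    for layer, inner_hits in sorted(by_layer.items()):
        for i in inner_hits:
            for j in by_layer.get(layer + 1, []):
                _, phi_i, z_i = hits[i]
                _, phi_j, z_j = hits[j]
                close_in_phi = abs(delta_phi(phi_i, phi_j)) < max_dphi
                close_in_z = abs(z_j - z_i) < max_dz
                if close_in_phi and close_in_z:
                    edges.append((i, j))
    return edges

# Toy event: two track candidates, one of them crossing the phi = ±pi boundary.
hits = [
    (0, 0.00, 0.00),
    (0, 3.10, 0.10),
    (1, 0.05, 0.05),
    (1, -3.13, 0.15),
    (2, 0.09, 0.10),
]
edges = build_hit_graph(hits)
print(edges)
```

The nested loop is quadratic per layer pair; for ~10^5 hits one would bin hits in phi first, which is omitted here for brevity.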
21
Beyond High Energy Physics
CERN openlab is a science – industry partnership to drive R&D and innovation
Collaboration with UNOSAT on Deep Learning for satellite imagery
22
Counting shelters in refugee camps
Manually scanning million-pixel satellite photos for disaster relief:
Evolution of refugee camps
Natural disasters
Building damage
High precision is required (> 95%)
UN decisions depend on this data
23
CNN for counting refugee shelters
Transfer learning from Region-based Convolutional NN model
Average precision is 82%
Speedup is 200x
https://indico.cern.ch/event/727274/contributions/3100369/
24
Generating Synthetic Satellite Data
Data availability is a problem: interesting events are “rare”
Use Generative Adversarial Networks to increase dataset size
25
Scaling up to larger sizes
Work in progress
26
Summary & Conclusions
Deep Learning applications in all fields of High Energy Physics
Development is accelerated by a diversified community (industry and society, applied and fundamental science)
Results are very promising
We are moving from the prototyping stage to getting ready for production
Interpretability of the models, systematic study of their performance
Integration in our software frameworks
Detailed studies on computing resources
Training and inference will likely become important workflows for large experiments
Modify our computing model and infrastructure (accelerators, HPC resources, ...)
28
Bonus Tracks
Quantum Machine Learning
Distributed Training
29
A quantum advantage for ML?
Quantum linear algebra is generally faster than its classical counterpart
Some standard ML techniques estimate the ground state of Hamiltonians
ML algorithms have some tolerance to errors
Specific quantum techniques can be exploited to bring further improvement
Biamonte et al., arXiv:1611.09347
30
Quantum Support Vector Machine
Quantum SVM for ttH (H → 𝜸𝜸) classification
QSVM is simulated on IBM Qiskit simulator
Entanglement is used to encode relationships between features
Apply PCA to input data features
Reduced from 45 to 8, 10 or 20 (limited by number of qubits)
Running full training with quantum simulators requires large computing resources
Memory increases with number of qubits, training events and complexity
Quantum Machine Learning algorithms are among the first applications to be implemented on near-term devices
31
Quantum GAN
Generative Adversarial Networks are among the most interesting models in classical machine learning
Quantum GAN would have more representational power than classical GAN
Different hybrid classical-quantum algorithms for generative models exist
e.g. quantum Variational Auto-Encoders on D-Wave annealer
Train a quantum GAN to generate few-pixel images
Currently investigating two possible approaches:
A hybrid schema with a quantum generator learning the target PDF using either a classical network or a variational quantum circuit as a discriminator (Variational Quantum Generator)
Full quantum adversarial implementation (quGAN)
quGAN
32
Distributed training
• Frameworks
  • Horovod
  • mpi_learn
• Hardware resources
  • Cloud (HNSciCloud)
  • HPC centers (Oak Ridge, TACC)
• Cloud deployment via Docker + Kubernetes/Kubeflow (R. Rocha, CERN IT-CM)
[Figure panels: T-Systems, P100, CSCS]
[Figure: High Energy Physics: 3D GANs Training Speedup Performance. Intel 2S Xeon(R) 8160 on Stampede2/TACC, OPA Fabric; TensorFlow 1.9 + MKL-DNN + Horovod, Intel MPI, Core Aff. BKMs, 4 Workers/Node. Speedup of 120x on 128 nodes (94% scaling efficiency), 148 s/epoch]
33
Deployment @LRZ
Re-packaged 3DGAN using Charliecloud
Submit to SuperMUC-NG via slurm
Scale up to 512 nodes
Distributed Tensorflow using Charliecloud containers
Nodes   Time per epoch (s)   Linear scaling (s)
4       3806                 3806
8       1910                 1903
16      1001                 952
32      504                  476
64      253                  238
128     124                  119
256     61                   60
512     33                   30
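The "Linear" column is the 4-node time divided by the increase in node count, and the ratio of linear to measured time gives the scaling efficiency. A short sketch using only the measured times from the table:

```python
# Measured seconds per epoch on SuperMUC-NG, keyed by node count (from the table).
measured = {4: 3806, 8: 1910, 16: 1001, 32: 504, 64: 253, 128: 124, 256: 61, 512: 33}

BASE_NODES = 4  # smallest configuration, used as the reference point

def linear_time(n_nodes):
    """Epoch time expected under perfect (linear) scaling from the 4-node run."""
    return measured[BASE_NODES] * BASE_NODES / n_nodes

def efficiency(n_nodes):
    """Ratio of ideal to measured epoch time."""
    return linear_time(n_nodes) / measured[n_nodes]

for n in sorted(measured):
    print(f"{n:4d} nodes: linear {linear_time(n):7.1f} s, "
          f"measured {measured[n]:5d} s, efficiency {100 * efficiency(n):5.1f}%")
```

At 512 nodes the efficiency is still about 90% relative to the 4-node run.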
34
Physics performance
Some performance degradation
Mostly at low energy
Network optimised for the 100-200 GeV central region
Applied warmup and scaling of initial learning rate
Further investigation ongoing
[Figure: fraction of particle energy deposited in the calorimeter. Curves: Monte Carlo, BatchSize=1024, BatchSize=4096, BatchSize=10240]
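The slide does not give the learning-rate schedule. One common recipe for large-batch distributed training, assumed here for illustration, scales the learning rate linearly with the number of workers and ramps up to that value over the first few epochs. All names and default values below are hypothetical.

```python
def scaled_learning_rate(base_lr, n_workers):
    """Linear scaling rule: the global batch grows with n_workers, so does the LR."""
    return base_lr * n_workers

def warmup_learning_rate(epoch, base_lr, n_workers, warmup_epochs=5):
    """Ramp linearly from base_lr to the scaled LR during the warmup epochs."""
    target = scaled_learning_rate(base_lr, n_workers)
    if epoch >= warmup_epochs:
        return target
    return base_lr + (target - base_lr) * epoch / warmup_epochs

# Example: single-worker LR of 0.001 (assumed value) scaled to 8 workers.
for epoch in range(7):
    print(epoch, round(warmup_learning_rate(epoch, 0.001, 8), 6))
```

Starting at the single-worker rate avoids the instability that a large initial step can cause before the weights have settled.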
35
Additional Information
36
Why? …New Challenges
CMS simulation
37
Generative Adversarial Networks
Two networks competing with each other
Generator generates data from random noise
Generator learning is supervised by the discriminator network
arXiv:1406.2661
The forger/detective case
Forger shows its Mona Lisa to the detective
Detective says it is fake
Forger makes a new Mona Lisa based on feedback
Iterate until detective is fooled
arXiv:1701.00160
38
Generative adversarial training
Assume a deterministic generator:
A prior over latent space:
Define a discriminator:
A learnable loss function from the min-max game
Equilibrium when Jensen-Shannon divergence between real and generated samples is minimized
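The formulas for this slide did not survive in the transcript. In the notation of the original GAN paper (arXiv:1406.2661), which is assumed here to match what the slide showed, the generator, the latent prior, the discriminator and the min-max objective read:

```latex
% Deterministic generator, prior over latent space, and discriminator
x = G(z;\theta_g), \qquad z \sim p_z(z), \qquad D(x;\theta_d) \in [0,1]

% Min-max game between G and D
\min_G \max_D V(D,G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% For the optimal discriminator the generator cost reduces to
C(G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\mathrm{data}} \,\|\, p_g\big)
```

The last line is why the equilibrium corresponds to a minimal Jensen-Shannon divergence: C(G) reaches its minimum, -log 4, exactly when the generated distribution equals the data distribution.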
39
Generative adversarial training
Generator is trained to maximize the probability of Discriminator making a mistake
arXiv:1406.2661v1
D is not an accurate classifier
D is trained to discriminate samples from data
D gradient guides G to regions more likely to be classified as data
G and D don’t improve anymore. D is unable to differentiate
40
Wasserstein GAN
Arjovsky et al. ’17, arXiv:1701.07875
Standard GAN loss formulation can lead to
• vanishing gradients when the discriminator is too powerful
• mode collapse (generating only a subset of the target distribution)
Alternative Wasserstein metric (from Earth Mover’s distance)
• solution may not be optimal due to biased gradients
γ «optimal transport plan»
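The formula that γ belongs to is missing from the transcript. The Earth Mover's (Wasserstein-1) distance as defined in arXiv:1701.07875, assumed here to be what the slide showed, together with the dual form used in practice to train the critic:

```latex
% Earth Mover's distance: infimum over all transport plans gamma
% whose marginals are the real and generated distributions
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)}
  \mathbb{E}_{(x,y) \sim \gamma}\big[\, \lVert x - y \rVert \,\big]

% Kantorovich-Rubinstein dual: supremum over 1-Lipschitz critics f
W(P_r, P_g) = \sup_{\lVert f \rVert_L \le 1}
  \mathbb{E}_{x \sim P_r}\big[f(x)\big] - \mathbb{E}_{x \sim P_g}\big[f(x)\big]
```

Unlike the Jensen-Shannon divergence, this distance stays finite and informative when the two distributions have little or no overlap, which is what addresses the vanishing-gradient problem.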
Luke Metz, Ben Poole, David Pfau, and Jascha Sohl-Dickstein. 2016
Reformulate the loss using Cramer distance → remove bias on gradients → Cramer GAN
41
Cramer GAN
42
Other GAN flavors
Original GAN was based on MLP in 2014
Deep Convolutional GAN in 2015
Conditional GAN
Extended to learn a parameterized generator p_model(x|θ)
Useful to obtain a single generator object for all θ configurations
Interpolate between distributions
Auxiliary Classifier GAN
D can assign a class to the image
Progressive growing GAN
Stack GAN
BiGAN, …
arXiv:1610.0958
arXiv:1411.1784
43
Fast simulation in High Energy Physics
Monte Carlo simulation is a major workload in terms of computing resources.
Generative Models are a generic approach to replace expensive calculations
Inference is faster than Monte Carlo approach
Industry building highly optimized software, hardware, and cloud services.
Numerous R&D activities (LHC and beyond)
[Figure: WLCG wall clock time for the ATLAS experiment; label: MC-related]
Time to create an electron shower
Method                     Machine                              Time/shower (ms)
Full simulation (Geant4)   Intel Xeon Platinum 8180             17000
3DGAN (batch size 128)     Intel Xeon Platinum 8160 (TF 1.12)   1
S.V., ACAT 2019
44
GANs for ATLAS LAr calorimeter
Train the generator against two critics
Wasserstein GAN model reproduces the mean energy distribution but not its width
At convergence the critic can’t see the difference between real and fake images anymore.
ATL-SOFT-PUB-2018-001
45
LHCb RICH fast simulation
Maevskyi, ACAT2019
PID information encoded in log-likelihood differences (DLL) between particle type hypotheses
A fully connected GAN simulates DLLs
Input: track parameters and total number of tracks
Cramer distance to avoid gradient bias
Train on real data using sPlot* technique to extract signal distributions
*Pivk, Muriel et al. Nucl.Instrum.Meth.A 555 (2005) 356-369
AUC differences within a few sigma
46
CMS HGCAL prototype
Wasserstein conditional GAN (convolutions)
Train a generator against a critic and 2 constrainer networks reconstructing energy and impact point coordinates
Good agreement with Geant4
Some problems at low energy
High Granularity and Hexagonal cells prototype for CMS upgrade
arXiv:1807.01954
47
RICH DLLs
48
Condition training on input particle energy and incident angle, Custom losses
Auxiliary regression tasks assigned to the discriminator
3D convolutional GAN
Generator:
Discriminator:
49
3DGAN generated events
GAN generated electron shower
[Figure: Geant4 vs. GAN showers for Ep=147 GeV, α=88°; Ep=189 GeV, α=63°; Ep=111 GeV, α=115°]
50
Sample diversity: Structural Similarity Index
Structural Similarity Index (SSIM) [4] is used to assess similarity between images
Typically used in denoising applications
Measure diversity in GAN generated images
[Figure: SSIM as training progresses]
L      G4 vs G4   GAN vs GAN
1      0.94       0.95
1e-2   0.21       0.25
1e-4   0.045      0.061
1e-6   0.045      0.051
E=150 GeV, orthogonal incident angle
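For reference, a minimal sketch of the SSIM formula computed over a whole image as a single window. The standard definition averages this quantity over local sliding windows, which is omitted here, so the numbers will not reproduce the table. The constants k1 and k2 are the conventional defaults; treating the table's L column as the dynamic-range parameter is an assumption, since the slide does not say so.

```python
def ssim_global(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized, flattened images using one global window."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)

    # Stabilising constants; they depend on the assumed dynamic range L.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Toy flattened "images" (made-up pixel values).
image_a = [0.0, 0.2, 0.8, 1.0, 0.5, 0.1]
image_b = [1.0, 0.7, 0.1, 0.0, 0.4, 0.9]

print(ssim_global(image_a, image_a))  # identical images give 1
print(ssim_global(image_a, image_b))  # dissimilar images give a lower value
```

For diversity studies a low SSIM between pairs of generated showers is the desired outcome: values close to the Geant4 pairs indicate the GAN is not collapsing onto a few templates.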
51
AutoEncoders & Variational AutoEncoders
AEs learn how to describe training dataset in latent space
Data compression, dimensionality reduction (PCA) and de-noising
Variational AEs have added constraints on the encoded representations
Learn latent model then sample from it
Many applications at the LHC
52
Other applications in fast simulation
Generative models for ALICE TPC simulation (ACAT2019)
Conditional Wasserstein GANs for fast simulation of electromagnetic showers in a CMS HGCAL prototype (IML WG 04/18)
Variational AutoEncoders to simulate ATLAS LAr calorimeter (PASC18)
Wasserstein GANs to generate high-level physics variables based on Monte Carlo ttH (superfast-simulation) (IML WG 04/18)
Particle-GAN for Full Event Simulation at the LHC (ACAT2019)
Refining Detector Simulation using Adversarial Networks (IML WG 04/18)
Model-Assisted GANs for the optimisation of simulation parameters (IML WG 04/19)
53
HEP.TrkX : https://heptrkx.github.io/
Image-based approach
Image “segmentation” with LSTM+CNN
Image “captioning” with CNN+LSTM
CNN vertex finder to constrain seeding
Space-point representation
RNN hit predictor model
Graph Neural Networks
Quantum annealing!
Quadratic Unconstrained Binary Optimisation (QUBO) can be mapped to an Ising Hamiltonian with change of variable {0,1} → {-1,1}
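The change of variable is x = (s + 1) / 2. A small sketch of the resulting mapping, with a brute-force check that both forms give the same energy for every assignment. The dictionary layout and the example coefficients are illustrative assumptions and are not tied to any particular annealer API.

```python
from itertools import product

def qubo_to_ising(Q):
    """Map QUBO coefficients {(i, j): w}, i <= j, to Ising (h, J, offset)."""
    h, J, offset = {}, {}, 0.0
    for (i, j), w in Q.items():
        if i == j:
            # x_i = (s_i + 1) / 2
            h[i] = h.get(i, 0.0) + w / 2.0
            offset += w / 2.0
        else:
            # x_i x_j = (s_i s_j + s_i + s_j + 1) / 4
            J[(i, j)] = J.get((i, j), 0.0) + w / 4.0
            h[i] = h.get(i, 0.0) + w / 4.0
            h[j] = h.get(j, 0.0) + w / 4.0
            offset += w / 4.0
    return h, J, offset

def qubo_energy(Q, x):
    """Energy of a binary assignment x with entries in {0, 1}."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())

def ising_energy(h, J, offset, s):
    """Energy of a spin assignment s with entries in {-1, +1}."""
    linear = sum(h_i * s[i] for i, h_i in h.items())
    quadratic = sum(w * s[i] * s[j] for (i, j), w in J.items())
    return linear + quadratic + offset

# Made-up 3-variable problem for illustration.
Q = {(0, 0): -1.0, (1, 1): -1.0, (2, 2): 2.0, (0, 1): 2.0, (1, 2): -3.0}
h, J, offset = qubo_to_ising(Q)

for x in product((0, 1), repeat=3):
    s = [2 * bit - 1 for bit in x]
    print(x, qubo_energy(Q, x), ising_energy(h, J, offset, s))
```

The constant offset does not affect which assignment is optimal, so it can be dropped when submitting the problem and added back when reporting energies.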
A major challenge after LHC upgrade
More examples
TrackML challenge on Kaggle
J. R. Vlimant, ACAT2019
54
CERN openlab
Evaluate state-of-the-art technologies in a challenging environment and improve them
Test in a research environment today what will be used in many business sectors tomorrow
Training
Dissemination and outreach
A science – industry partnership to drive R&D and innovation
openlab.cern