deep learning in high - copernicus...4 cern international organisation close to geneva, straddling...


Deep Learning in High Energy Physics

Examples from the LHC

Sofia Vallecorsa – November 6th, 2019


Outline

Introduction

Generative Models: applications in simulation, real-time selection and pattern recognition

Graph Neural Network: pattern recognition (tracking)

Conclusions


CERN

International organisation near Geneva, straddling the Swiss-French border, founded in 1954

Facilities for fundamental research in particle physics

23 member states, 1.1 B CHF budget

3’197 staff, fellows, apprentices, …

13’128 associates


“Science for peace”

1954: 12 Member States

Members: Austria, Belgium, Bulgaria, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Israel, Italy, Netherlands, Norway, Poland, Portugal, Romania, Slovak Republic, Spain, Serbia, Sweden, Switzerland, United Kingdom

Candidate for membership: Cyprus, Slovenia

Associate members: India, Lithuania, Pakistan, Turkey, Ukraine

Observers: EC, Japan, JINR, Russia, UNESCO, United States of America

Numerous non-member states with collaboration agreements

2’531 staff members, 645 fellows, 21 apprentices

7’000 from member states, 1’800 from the USA, 900 from Russia, 270 from Japan, …


The Large Hadron Collider (LHC)


https://visit.cern/


The Higgs Boson


The Higgs Boson completes the Standard Model, but the Model explains only about 5% of our Universe

What is the other 95% of the Universe made of?

How does gravity really work?

Why is there no antimatter in nature?

Some background

Timeline (figure): Perceptron, Neural Networks, Decision Trees, Random Forests, BDT, SVM, Modern Deep Learning, Deep Learning at the LHC (~now); annotations: NN @LEP, BDT @SLAC

Image from “Deep Learning”, I. Goodfellow et al., MIT Press

Why? … Big Data

Accelerator infrastructure

9600 magnets for Beam Control

1232 superconducting dipoles for bending

Experiments (detectors & physics data)

330 PB of collision data stored by the end of 2018

The computing infrastructure

LHC data is multi-structured, hybrid

LHC is entering the Big Data era

Next-generation colliders will require larger, highly granular detectors that will generate huge particle data rates, O(100 TB/s)

Deep Learning for HEP

DL can recognize patterns in large complicated data sets

Re-cast physics problems as “DL problems”

Adopting “new” computing models

Robust performance studies

Model interpretability and systematics

Domain knowledge

Physics-based validation

B. Hooberman, S.V. et al. (NIPS 2017)

DNNs find hidden patterns in raw data

No need for high-level features

Deep Learning for HEP (II)

Analysis

Optimisation

Raw data processing

Monitoring and Control Systems

Real-time filtering

Simulation


Deep Generative Models

Shallow models learn simple internal representations

→ Deep Generative Models

Allow higher levels of abstraction

Improve generalisation and transfer

→ Multiple applications

Discovery

Anomaly Detection

Planning

Transfer Learning

→Different Models

Generative Adversarial Networks

(Variational) AutoEncoders


Detector response as images

Monte Carlo simulation

Monte Carlo simulation of detector response is extremely demanding in terms of computing resources

→ 50% of LHC Computing Grid resources today

Interpret detector output as images

Read-out channels become pixels in an image

Use computer vision techniques to interpret results

Replace Monte Carlo approach with Generative Models

Pixelized 3D image
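As an illustration of the image analogy, the sketch below bins a list of calorimeter cell hits into a dense 3D array, one voxel per read-out channel. The hit format (integer cell indices plus deposited energy) and the 25×25×25 grid are assumptions chosen for illustration, not the detector geometry used in this work.

```python
import numpy as np

# Hypothetical grid size; real calorimeter geometries differ.
GRID = (25, 25, 25)

def hits_to_image(hits, grid=GRID):
    """Accumulate (ix, iy, iz, energy) hits into a dense 3D energy image.

    Each read-out cell becomes one voxel; repeated hits in the same cell
    are summed and hits falling outside the grid are dropped.
    """
    image = np.zeros(grid, dtype=np.float32)
    hits = np.asarray(hits, dtype=float).reshape(-1, 4)
    idx = hits[:, :3].astype(int)
    energy = hits[:, 3].astype(np.float32)
    inside = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    # np.add.at sums correctly over repeated indices (plain fancy-index += would not)
    np.add.at(image, tuple(idx[inside].T), energy[inside])
    return image

img = hits_to_image([(12, 12, 0, 1.5), (12, 12, 0, 0.5), (13, 12, 1, 2.0)])
print(img.shape, img[12, 12, 0], img.sum())
```

Once showers are arrays like this, the convolutional building blocks of computer vision apply to them directly.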


Physics simulation with GAN


GAN generated energy shower

Time to generate one particle:

Classical Monte Carlo, 2S Intel® Xeon® Platinum 8180: 17000 ms

3D GAN, 2S Intel® Xeon® Platinum 8160: 0.85 ms (20000x faster)

Figure: single cell response, Geant4 vs GAN generated

Figure: High Energy Physics: 3D GANs training speedup performance, Intel 2S Xeon 8160 nodes on Stampede2/TACC, OPA fabric (TensorFlow 1.9 + MKL-DNN + Horovod, Intel MPI, core affinity BKMs, 4 workers/node). Speedup on 1, 2, 4, 8, 16, 32, 64, 128 nodes: 1.0, 2.0, 3.9, 7.8, 15.5, 31, 61, 120; scaling efficiency: 100%, 100%, 98%, 97%, 97%, 96%, 95%, 94%. 128-node performance: 148 secs/epoch.

Real-time event selection

We can process only a minimal fraction of collider data: keep only the interesting events

Sophisticated studies to optimise the selection for specific physics processes

We don’t know what unknown physics looks like!

Physics Mining as anomaly detection

Classical strategy uses a very loose selection: 1M Standard Model (“known physics”) events per day

O. Cerri, ACAT2019

Train Variational Auto Encoders on known physics

Monte Carlo data

Real detector data

Run it in real time and store only “anomalies”
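Running the VAE in real time comes down to computing an anomaly score per event (for instance its VAE loss) and keeping only events above a threshold. One simple way to set that threshold, sketched below under the assumption that scores for known-physics events are available, is to take the quantile that matches a target acceptance rate. The exponential score distribution and the rates are placeholder numbers, not values from this study.

```python
import numpy as np

def threshold_for_rate(sm_scores, input_rate_hz, target_rate_hz):
    """Score threshold keeping about target_rate_hz out of input_rate_hz events,
    estimated on known-physics (Standard Model) scores."""
    keep_fraction = target_rate_hz / input_rate_hz
    return float(np.quantile(sm_scores, 1.0 - keep_fraction))

def select_anomalies(scores, threshold):
    """Boolean mask of events flagged as anomalous (score above threshold)."""
    return np.asarray(scores) > threshold

rng = np.random.default_rng(0)
# Placeholder anomaly scores; in practice they come from the trained VAE.
sm_scores = rng.exponential(scale=1.0, size=100_000)
thr = threshold_for_rate(sm_scores, input_rate_hz=1000.0, target_rate_hz=10.0)
mask = select_anomalies(sm_scores, thr)
print(f"threshold={thr:.2f}, kept fraction={mask.mean():.4f}")
```

Because the threshold is fixed on known physics only, no model of the signal is needed, which is what makes the selection model-independent.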


Selecting the unknown!

Alternative and robust strategy

Create a dataset of anomalous events

Probe large range of processes

Might open new physics directions

VAE as model-independent new physics selection tool


Pattern recognition in HEP: Particle Trajectory Reconstruction

J-R. Vlimant, CERN-DS seminar

Particle trajectories are bent in a solenoid magnetic field: the curvature is needed to measure momentum

Particles ionize silicon sensors arranged in concentric layers: thousands of sparse hits

Many hits are uninteresting

https://indico.cern.ch/event/795045/attachments/1824081/2991454/vlimant_CERN_Tracking_April19.pdf


Image-based approach:

CNN used for activity segmentation and detector signal classification

AutoEncoders for trajectory reconstruction


Graph Neural Networks

Next-generation colliders will present challenges to image-based methods

Space-point representation enables use of GNN

Structure data as a (directed) graph of connected hits

Connect plausibly-related hits using geometric constraints

Full event embedding requires large graphs (~10⁵ nodes)

Sparse matrix implementation

Identify disjoint sub-graphs; distributed learning of large graphs
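A minimal sketch of the graph-building step: hits are nodes, and candidate edges connect hits on consecutive layers only when simple geometric constraints hold. The two cuts used here, a maximum azimuthal slope Δφ/Δr and a maximum z-intercept at the beam line, are similar in spirit to common tracking preprocessing, but the specific variables and cut values are assumptions for illustration.

```python
import numpy as np

def build_edges(r, phi, z, layer, phi_slope_max=6e-4, z0_max=200.0):
    """Return (sender, receiver) index arrays for hit pairs on adjacent layers.

    r, phi, z: cylindrical hit coordinates (mm, rad, mm); layer: integer layer id.
    Assumes consecutive layers sit at distinct radii. Cut values are hypothetical.
    """
    r, phi, z, layer = map(np.asarray, (r, phi, z, layer))
    senders, receivers = [], []
    for lay in np.unique(layer)[:-1]:
        inner = np.flatnonzero(layer == lay)
        outer = np.flatnonzero(layer == lay + 1)
        if inner.size == 0 or outer.size == 0:
            continue
        # Every inner x outer hit pair on the two layers is a candidate edge
        i, o = np.meshgrid(inner, outer, indexing="ij")
        i, o = i.ravel(), o.ravel()
        dr = r[o] - r[i]
        # Wrap the azimuthal difference into [-pi, pi)
        dphi = (phi[o] - phi[i] + np.pi) % (2 * np.pi) - np.pi
        # Extrapolate the segment back to r = 0 to get its z-intercept
        z0 = z[i] - r[i] * (z[o] - z[i]) / dr
        ok = (np.abs(dphi / dr) < phi_slope_max) & (np.abs(z0) < z0_max)
        senders.append(i[ok])
        receivers.append(o[ok])
    if not senders:
        return np.array([], dtype=int), np.array([], dtype=int)
    return np.concatenate(senders), np.concatenate(receivers)

# A straight track from the origin (hits 0, 1, 2) plus one noise hit (3)
s, rcv = build_edges([30, 60, 90, 60], [0.1, 0.1, 0.1, 2.0], [10, 20, 30, 20], [0, 1, 2, 1])
print(list(zip(s, rcv)))
```

Only the two true track segments survive the cuts; the noise hit is left without edges, which keeps the graph sparse.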

HepTrkX GNN is a cascade of Input, Edge and Node Networks

HepTrkX GNN
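The cascade can be sketched as a forward pass: an input network embeds hit features, an edge network scores each edge from the features of its two endpoints, and a node network updates each hit from the score-weighted features of its neighbours, with the edge and node steps repeated a few times. Layer sizes, tanh/sigmoid activations, the number of iterations and the random, untrained weights are illustrative assumptions rather than the published HepTrkX configuration; real inputs would also be normalised first.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_mlp(sizes):
    """Random, untrained (weight, bias) pairs for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def run_mlp(layers, x, final_tanh=True):
    """tanh on hidden layers; the last layer is linear unless final_tanh is True."""
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1 or final_tanh:
            x = np.tanh(x)
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gnn_forward(hits, senders, receivers, hidden=8, n_iters=3):
    """Edge scores in (0, 1): how likely each edge links two hits of one track."""
    n_feat = hits.shape[1]
    d = hidden + n_feat                       # embedding concatenated with raw features
    input_net = init_mlp([n_feat, hidden])
    edge_net = init_mlp([2 * d, hidden, 1])   # shared across iterations
    node_net = init_mlp([3 * d, hidden])      # shared across iterations

    def edge_scores(x):
        pair = np.concatenate([x[senders], x[receivers]], axis=1)
        return sigmoid(run_mlp(edge_net, pair, final_tanh=False))[:, 0]

    h = run_mlp(input_net, hits)
    for _ in range(n_iters):
        x = np.concatenate([h, hits], axis=1)
        e = edge_scores(x)
        # Sum edge-weighted features arriving from inner and outer neighbours
        inward = np.zeros_like(x)
        outward = np.zeros_like(x)
        np.add.at(inward, receivers, e[:, None] * x[senders])
        np.add.at(outward, senders, e[:, None] * x[receivers])
        h = run_mlp(node_net, np.concatenate([inward, outward, x], axis=1))
    return edge_scores(np.concatenate([h, hits], axis=1))

hits = np.array([[0.3, 0.1, 0.1], [0.6, 0.1, 0.2], [0.9, 0.1, 0.3]])
print(gnn_forward(hits, np.array([0, 1]), np.array([1, 2])))
```

Training would fit the shared weights with a binary cross-entropy on the final edge scores, using true/fake segment labels from simulation.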


Beyond High Energy Physics

CERN openlab is a science – industry partnership to drive R&D and innovation

Collaboration with UNOSAT on Deep Learning for satellite imagery


Counting shelters in refugee camps

Manually scan million-pixel satellite photos for disaster relief:

Evolution of refugee camps

Natural disasters

Buildings damage

High precision is required (> 95%)

UN decisions depend on this data


CNN for counting refugee shelters

Transfer learning from a Region-based Convolutional NN model

Average precision is 82%

Speedup is 200x

https://indico.cern.ch/event/727274/contributions/3100369/

Retrain


Generating Synthetic Satellite Data

Data availability is a problem: interesting events are “rare”

Use Generative Adversarial Networks to increase dataset size


Scaling up to larger sizes

Work in progress


Summary & Conclusions

Deep Learning applications in all fields of High Energy Physics

Development is accelerated by a diversified community (industry and society, applied and fundamental science)

Results are very promising

We are moving from the prototyping stage to getting ready for production

Interpretability of the models, systematic study of their performance

Integration in our software frameworks

Detailed studies on computing resources

Training and inference will likely become important workflows for large experiments

Modify our computing model and infrastructure (accelerators, HPC resources, ...)


Thanks!

Questions?

https://openlab.cern


Bonus Tracks

Quantum Machine Learning

Distributed Training


A quantum advantage for ML?

Quantum linear algebra can be (in some cases exponentially) faster than its classical counterpart

Some standard ML techniques estimate the ground state of Hamiltonians

ML algorithms have some tolerance to errors

Specific quantum techniques can be exploited to bring further improvement

Biamonte et al., arXiv:1611.09347

Quantum Support Vector Machine

Quantum SVM for ttH (H → 𝜸𝜸) classification

QSVM is simulated on the IBM Qiskit simulator

Entanglement is used to encode relationships between features

Apply PCA to input data features

Features reduced from 45 to 8, 10 or 20 (limited by the number of qubits)
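The dimensionality reduction can be sketched as a plain PCA through the singular value decomposition: features are standardised and projected onto the leading principal components. The random matrix stands in for the 45 ttH input features, and the idea of encoding one reduced feature per qubit is an assumption about how the reduced features feed the quantum circuit.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project standardised features onto the top n_components principal axes.

    Returns the projected data and the explained-variance ratio of each kept component.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Xs @ vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 45))          # placeholder for the 45 input features
for k in (8, 10, 20):                    # one reduced feature per qubit (assumed)
    Z, ev = pca_reduce(X, k)
    print(k, Z.shape, f"explained variance: {ev.sum():.2f}")
```

The explained-variance ratio gives a quick check of how much information each qubit budget retains.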

Running full training with quantum simulators requires large computing resources

Memory increases with the number of qubits, the number of training events and the model complexity
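A back-of-the-envelope estimate of why memory grows so fast: a dense statevector simulator stores 2ⁿ complex amplitudes for n qubits (16 bytes each in double precision), and a kernel-based QSVM also needs an N×N kernel matrix for N training events. The sketch assumes those two terms dominate and ignores every other simulator overhead, so real figures will be higher.

```python
def qsvm_memory_bytes(n_qubits, n_train, amplitude_bytes=16, kernel_entry_bytes=8):
    """Rough lower bound: one dense statevector plus a dense training kernel matrix."""
    statevector = amplitude_bytes * 2**n_qubits
    kernel = kernel_entry_bytes * n_train**2
    return statevector + kernel

for n_qubits in (8, 10, 20):
    for n_train in (1_000, 10_000):
        gib = qsvm_memory_bytes(n_qubits, n_train) / 2**30
        print(f"{n_qubits:2d} qubits, {n_train:6d} events: {gib:.3f} GiB")
```

The statevector term doubles with each extra qubit, while the kernel term grows quadratically with the training set.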

Quantum Machine Learning algorithms are among the first applications to be implemented on near-term devices

Quantum GAN

Generative Adversarial Networks are among the most interesting models in classical machine learning

Quantum GANs could have more representational power than classical GANs

Different hybrid classical-quantum algorithms for generative models exist

e.g. quantum Variational Auto-Encoders on the D-Wave annealer

Train a quantum GAN to generate few-pixel images

Currently investigating two possible approaches:

A hybrid schema with a quantum generator learning the target PDF, using either a classical network or a variational quantum circuit as the discriminator (Variational Quantum Generator)

Full quantum adversarial implementation (quGAN)


quGAN

https://arxiv.org/pdf/1901.00848.pdf




Distributed training

• Frameworks

  • Horovod

  • mpi_learn

• Hardware resources

  • Cloud (HNSciCloud)

  • HPC centers (Oak Ridge, TACC)

T-Systems, P100 CSCS

Figure: High Energy Physics: 3D GANs training speedup performance on Intel 2S Xeon 8160 nodes (Stampede2/TACC, OPA fabric, TensorFlow 1.9 + MKL-DNN + Horovod, 4 workers/node): 120x speedup on 128 nodes (94% scaling efficiency), 148 secs/epoch

• Cloud deployment via Docker + Kubernetes/Kubeflow (R. Rocha, CERN IT-CM)

https://www.hnscicloud.eu/


Deployment @LRZ

Re-packaged 3DGAN using Charliecloud

Submit to SuperMUC-NG via slurm

Scale up to 512 nodes

Distributed TensorFlow using Charliecloud containers

Nodes   Time per epoch (s)   Linear (s)
4       3806                 3806
8       1910                 1903
16      1001                 952
32      504                  476
64      253                  238
128     124                  119
256     61                   60
512     33                   30
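The "Linear" column is the ideal time per epoch extrapolated from the 4-node run, t_linear(n) = t(4) · 4 / n, and the ratio of the two columns gives the parallel efficiency. The sketch recomputes both from the measured times in the table; the slide's rounding convention is not stated, so recomputed ideal values can differ slightly (59.5 s rather than 60 s at 256 nodes).

```python
# Measured seconds per epoch on SuperMUC-NG, copied from the table above
times = {4: 3806, 8: 1910, 16: 1001, 32: 504, 64: 253, 128: 124, 256: 61, 512: 33}

def scaling_report(times):
    """Rows of (nodes, measured s, ideal s, efficiency) relative to the smallest run."""
    base_nodes = min(times)
    base_time = times[base_nodes]
    rows = []
    for nodes, t in sorted(times.items()):
        ideal = base_time * base_nodes / nodes      # perfect linear scaling
        rows.append((nodes, t, ideal, ideal / t))   # efficiency = ideal / measured
    return rows

for nodes, t, ideal, eff in scaling_report(times):
    print(f"{nodes:4d} nodes: {t:5d} s/epoch, linear {ideal:7.1f} s, efficiency {eff:.0%}")
```

Efficiency stays above 90% up to 512 nodes, dropping to roughly 90% at the largest scale.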


Some performance degradation

Mostly at low energy

Network optimised for the 100–200 GeV central region

Applied warmup and scaling of the initial learning rate

Further investigation ongoing
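The warmup and learning-rate scaling can be sketched with the common linear scaling rule: multiply a learning rate tuned at a reference batch size by the ratio of global batch sizes, and ramp up to it linearly over the first few epochs. The reference learning rate, reference batch size and warmup length below are hypothetical, and the slide does not state which exact schedule was applied.

```python
def learning_rate(epoch, global_batch, base_lr=1e-3, base_batch=128, warmup_epochs=5):
    """Linearly scaled learning rate with linear warmup (epoch may be fractional).

    base_lr, base_batch and warmup_epochs are illustrative defaults, not tuned values.
    """
    target_lr = base_lr * global_batch / base_batch
    if epoch < warmup_epochs:
        # Ramp from base_lr up to target_lr during the warmup phase
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    return target_lr

for batch in (1024, 4096, 10240):
    print(batch, [round(learning_rate(e, batch), 5) for e in (0, 2.5, 5, 10)])
```

The warmup avoids applying a very large step size to randomly initialised weights, one of the usual causes of degraded results at large batch sizes.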

Physics performance

Figure: fraction of particle energy deposited in the calorimeter; curves: Monte Carlo, BatchSize=1024, BatchSize=4096, BatchSize=10240


Additional Information


Why? …New Challenges

CMS simulation


Generator generates data from random noise

Generator learning is supervised by the discriminator network


Two networks competing with each other

Generative Adversarial Networks

arXiv:1406.2661

Image source

The forger/detective case

The forger shows their Mona Lisa to the detective

The detective says it is fake

The forger makes a new Mona Lisa based on the feedback

Iterate until the detective is fooled

arXiv:1701.00160

https://arxiv.org/abs/1406.2661v1


https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f


Generative adversarial training

Assume a deterministic generator:

A prior over latent space:

Define a discriminator:

A learnable loss function from the min-max game

Equilibrium when Jensen-Shannon divergence between real and generated samples is minimized
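The formulas for these definitions did not survive extraction; they most likely follow the standard formulation of arXiv:1406.2661, reproduced here for reference:

```latex
\begin{aligned}
& G:\ z \mapsto x = G(z;\theta_G), \qquad z \sim p_z(z), \qquad D:\ x \mapsto D(x;\theta_D) \in [0,1] \\
& \min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right] \\
& D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}, \qquad
  V(D^*_G, G) = -\log 4 + 2\,\mathrm{JSD}\!\left(p_{\text{data}} \,\|\, p_g\right)
\end{aligned}
```

The global minimum over G is therefore reached when p_g = p_data, where the optimal discriminator outputs 1/2 everywhere and V = −log 4.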


Generative adversarial training

Generator is trained to maximize the probability of the Discriminator making a mistake

arXiv:1406.2661v1

G and D don’t improve anymore: D is unable to differentiate

D is not an accurate classifier

D is trained to discriminate samples from data

D gradient guides G to regions more likely to be classified as data
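In code, both players' objectives are cross-entropies on the discriminator outputs. The sketch evaluates them from arrays of D outputs and uses the non-saturating generator loss suggested in arXiv:1406.2661 (maximise log D(G(z)) instead of minimising log(1 − D(G(z)))); the clipping constant is an implementation assumption to keep the logarithms finite.

```python
import numpy as np

EPS = 1e-7  # keeps log() finite when D outputs exactly 0 or 1

def value_function(d_real, d_fake):
    """Batch estimate of V(D, G): D maximises it, G minimises it."""
    d_real = np.clip(d_real, EPS, 1 - EPS)
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy with label 1 for data and 0 for generated samples
    return -value_function(d_real, d_fake)

def generator_loss_nonsaturating(d_fake):
    # Non-saturating variant: minimise -log D(G(z))
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return float(-np.mean(np.log(d_fake)))

# At equilibrium D outputs 1/2 everywhere and V = -log 4
half = np.full(16, 0.5)
print(value_function(half, half), -np.log(4))
```

The non-saturating form gives the generator strong gradients early in training, when D rejects generated samples with high confidence.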




Wasserstein GAN

Arjovsky et al. ’17, arXiv:1701.07875

Standard GAN loss formulation can lead to

• vanishing gradients when discriminator too powerful

• mode collapse (generating only a subset of the target distribution)

Alternative Wasserstein metric (also known as Earth Mover’s distance)

• solution may not be optimal due to biased gradients

γ: optimal transport plan

Luke Metz, Ben Poole, David Pfau, and Jascha Sohl-Dickstein. 2016

Reformulate the loss using Cramer distance → remove bias on gradients → Cramer GAN
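For one-dimensional samples both distances have simple estimators. With equal sample sizes the Wasserstein-1 distance is the mean absolute difference between sorted samples; the energy distance 2E|X−Y| − E|X−X'| − E|Y−Y'|, on which the Cramér GAN builds and which equals twice the squared Cramér distance in 1D, is computed from pairwise differences. This is a toy comparison on empirical samples, not the GAN training loss itself.

```python
import numpy as np

def wasserstein1_1d(x, y):
    """W1 between two equal-size 1D empirical samples: match sorted values."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    if x.shape != y.shape:
        raise ValueError("this simple estimator needs equal sample sizes")
    return float(np.mean(np.abs(x - y)))

def energy_distance_1d(x, y):
    """Plug-in (V-statistic) estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'|."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return float(2 * xy - xx - yy)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 1000)
fake = rng.normal(0.5, 1.0, 1000)   # a "generator" with a shifted mean
print(wasserstein1_1d(real, fake), energy_distance_1d(real, fake))
```

Both quantities shrink smoothly as the generated distribution moves towards the real one, which is why they give more useful gradients than a saturated classifier.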


Cramer GAN


Other GAN flavors

Original GAN was based on MLP in 2014

Deep Convolutional GAN in 2015

Conditional GAN

Extended to learn a parameterized generator p_model(x|θ)

Useful to obtain a single generator object for all θ configurations

Interpolate between distributions

Auxiliary Classifier GAN

D can assign a class to the image

Progressive growing GAN

Stack GAN

BiGAN ..

arXiv:1610.0958, arXiv:1411.1784

https://arxiv.org/abs/1511.06434




Fast simulation in High Energy Physics

Monte Carlo simulation is a major workload in terms of computing resources.

Generative Models are a generic approach to replace expensive calculations

Inference is faster than the Monte Carlo approach

Industry is building highly optimized software, hardware, and cloud services.

Numerous R&D activities (LHC and beyond)


MC - related

WLCG Wall Clock time for the ATLAS experiment


Time to create an electron shower (ms per shower):

Full Simulation (Geant4), Intel Xeon Platinum 8180: 17000

3DGAN (batch size 128), Intel Xeon Platinum 8160 (TF 1.12): 1

S.V., ACAT 2019


GANs for ATLAS LAr calorimeter

Train the generator against two critics

Wasserstein GAN model reproduces the mean energy distribution but not its width

At convergence the critic can’t see the difference between real and fake images anymore.

ATL-SOFT-PUB-2018-001


LHCb RICH fast simulation (Maevskyi, ACAT2019)

PID information encoded in log-likelihood differences (DLL) between particle type hypotheses

A GAN built from fully connected layers simulates DLLs

Input: track parameters and total number of tracks

Cramer distance to avoid gradients bias

Train on real data using sPlot* technique to extract signal distributions

*Pivk, Muriel et al. Nucl.Instrum.Meth.A 555 (2005) 356-369

AUC differences within a few sigmas

CMS HGCAL prototype

Wasserstein conditional GAN (convolutions)

Train a generator against a critic and 2 constrainer networks reconstructing energy and impact point coordinates

Good agreement with Geant4

Some problems at low energy

High Granularity and Hexagonal cells prototype for CMS upgrade

arXiv:1807.01954

RICH DLLs


Condition training on input particle energy and incident angle; custom losses

Auxiliary regression tasks assigned to the discriminator

3D convolutional GAN

Generator:

Discriminator:


3DGAN generated events

Figure: Geant4 vs GAN showers for Ep=147 GeV, α=88°; Ep=189 GeV, α=63°; Ep=111 GeV, α=115°

GAN generated electron shower

Structural Similarity Index (SSIM) [4] is used to assess similarity between images

Typically used in denoising applications

Measure diversity in GAN generated images

Sample diversity: Structural Similarity Index

SSIM as training progresses

L       G4 vs G4   GAN vs GAN
1       0.94       0.95
1e-2    0.21       0.25
1e-4    0.045      0.061
1e-6    0.045      0.051

E=150 GeV, orthogonal incident angle
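SSIM compares two images through their means, variances and covariance, with stabilising constants C1 = (k1·L)² and C2 = (k2·L)² that depend on a dynamic-range parameter L, presumably the L varied in the table. The sketch computes a single global SSIM over whole images instead of the usual sliding-window average and uses the customary k1 = 0.01, k2 = 0.03. Both choices are assumptions, so its values are not directly comparable with the table.

```python
import numpy as np

def global_ssim(a, b, L=1.0, k1=0.01, k2=0.03):
    """SSIM computed once over whole images (no sliding window)."""
    a, b = np.asarray(a, float).ravel(), np.asarray(b, float).ravel()
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = np.mean((a - mu_a) * (b - mu_b))
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                 / ((mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)))

def mean_pairwise_ssim(images, L=1.0):
    """Average SSIM over all distinct pairs: lower values mean a more diverse sample."""
    n = len(images)
    vals = [global_ssim(images[i], images[j], L) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(vals))

rng = np.random.default_rng(0)
shower = rng.random((25, 25, 25))      # placeholder 3D "shower" images
other = rng.random((25, 25, 25))
print(global_ssim(shower, shower), global_ssim(shower, other))
```

Applied to pairs of generated showers, a mean pairwise SSIM close to 1 would flag mode collapse; comparing it with the same statistic on Geant4 pairs is what the table does.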


AutoEncoders & Variational AutoEncoders

AEs learn how to describe the training dataset in latent space

Data compression, dimensionality reduction (PCA) and de-noising

Variational AEs have added constraints on the encoded representations

Learn latent model then sample from it

Many applications at the LHC
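"Learn latent model then sample from it" rests on two ingredients: the reparameterisation trick, which writes a latent sample as z = μ + σ·ε with ε ~ N(0, 1) so gradients can flow through μ and σ, and a KL term pulling the encoded distribution towards the standard normal prior. A minimal numpy sketch of both, assuming a diagonal Gaussian encoder that outputs μ and log σ², and a squared-error reconstruction term (a Gaussian likelihood up to constants); the encoder and decoder networks are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(logvar / 2)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over latent dimensions, per event."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def vae_loss(x, x_rec, mu, logvar):
    """Per-event loss: squared-error reconstruction term + KL regulariser."""
    rec = np.sum((x - x_rec) ** 2, axis=-1)
    return rec + kl_to_standard_normal(mu, logvar)

# Generating new samples: draw z from the prior N(0, I) and pass it to the decoder
z_prior = rng.standard_normal((4, 2))
print(z_prior.shape, kl_to_standard_normal(np.zeros((1, 2)), np.zeros((1, 2))))
```

The same per-event loss can double as an anomaly score, as in the real-time selection use case.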


Other applications in fast simulation

Generative models for ALICE TPC simulation (ACAT2019)

Conditional Wasserstein GANs for fast simulation of electromagnetic showers in a CMS HGCAL prototype (IML WG 04/18)

Variational AutoEncoders to simulate ATLAS LAr calorimeter (PASC18)

Wasserstein GANs to generate high-level physics variables based on Monte Carlo ttH (superfast-simulation) (IML WG 04/18)

Particle-GAN for Full Event Simulation at the LHC (ACAT2019)

Refining Detector Simulation using Adversarial Networks (IML WG 04/18)

Model-Assisted GANs for the optimisation of simulation parameters (IML WG 04/19)


HEP.TrkX : https://heptrkx.github.io/

Image-based approach

Image “segmentation” with LSTM+CNN

Image “captioning” with CNN+LSTM

CNN vertex finder to constrain seeding

Space-point representation

RNN hit predictor model

Graph Neural Networks

Quantum annealing!

Quadratic Unconstrained Binary Optimisation (QUBO) can be mapped to an Ising Hamiltonian with the change of variable {0,1} → {-1,1}
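The change of variable x = (1 + s)/2 maps each binary x ∈ {0,1} to a spin s ∈ {−1, +1}; substituting it into the QUBO energy xᵀQx yields local fields h, couplings J and a constant offset. The sketch performs the substitution and checks it by brute force on a small, made-up Q; it is a generic illustration, not the tracking-specific QUBO.

```python
import itertools
import numpy as np

def qubo_to_ising(Q):
    """Rewrite E(x) = x^T Q x, x in {0,1}^n, as sum h_i s_i + sum_{i<j} J_ij s_i s_j + offset."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    h = np.zeros(n)
    J = np.zeros((n, n))          # upper triangle only
    offset = 0.0
    for i in range(n):
        # Diagonal term: x_i^2 = x_i = (1 + s_i) / 2
        h[i] += Q[i, i] / 2
        offset += Q[i, i] / 2
        for j in range(n):
            if i == j:
                continue
            # Off-diagonal term: x_i x_j = (1 + s_i + s_j + s_i s_j) / 4
            q = Q[i, j] / 4
            offset += q
            h[i] += q
            h[j] += q
            J[min(i, j), max(i, j)] += q
    return h, J, offset

def qubo_energy(Q, x):
    x = np.asarray(x, dtype=float)
    return float(x @ np.asarray(Q, dtype=float) @ x)

def ising_energy(h, J, offset, s):
    s = np.asarray(s, dtype=float)
    return float(h @ s + s @ J @ s + offset)

Q = np.array([[-1.0, 2.0, 0.0],
              [0.0, -1.0, 2.0],
              [0.0, 0.0, -1.0]])   # made-up 3-variable QUBO
h, J, offset = qubo_to_ising(Q)
for x in itertools.product([0, 1], repeat=len(Q)):
    s = 2 * np.array(x) - 1                     # x = (1 + s) / 2  <=>  s = 2x - 1
    assert np.isclose(qubo_energy(Q, x), ising_energy(h, J, offset, s))
print("h =", h, "offset =", offset)
```

The constant offset does not change which configuration minimises the energy, so an annealer only needs h and J.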

A major challenge after LHC upgrade

More examples

TrackML challenge on Kaggle

J. R. Vlimant, ACAT2019


CERN openlab


Evaluate state-of-the-art technologies in a challenging environment and improve them

Test in a research environment today what will be used in many business sectors tomorrow

Training

Dissemination and outreach

A science – industry partnership to drive R&D and innovation

openlab.cern

