report on vector prototype

13
Report on Vector Prototype J.Apostolakis, R.Brun, F.Carminati, A. Gheata 10 September 2012

Upload: kira

Post on 22-Mar-2016

45 views

Category:

Documents


1 download

DESCRIPTION

Report on Vector Prototype. J.Apostolakis , R.Brun , F.Carminati , A. Gheata 10 September 2012. Simple observation: HEP transport is mostly local !. Locality not exploited by the classical transportation approach Existing code very inefficient (0.6-0.8 IPC) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Report on Vector Prototype

Report on Vector Prototype

J.Apostolakis, R.Brun, F.Carminati, A. Gheata

10 September 2012

Page 2: Report on Vector Prototype

Simple observation: HEP transport is mostly local !

ATLAS volumes sorted by transport time. The same behavior is observed for most HEP geometries.

50 per cent of the time spent in 50/7100 volumes

• Locality not exploited by the classical transportation approach • Existing code very inefficient (0.6-0.8 IPC)• Cache misses due to fragmented code

Page 3: Report on Vector Prototype

A playground for new ideas• Simulation prototype in the attempt to explore parallelism and

efficiency issues– Basic idea: simple physics, realistic geometry: can we implement a parallel

transport model on threads exploiting data locality and vectorisation?– Clean re-design of data structures and steering to easily exploit parallel

architectures and allow for sharing all the main data structures among threads

– Can we make it fully non-blocking from generation to digitization and I/O ?• Events and primary tracks are independent

– Transport together a vector of tracks coming from many events– Study how does scattering/gathering of vectors of tracks and hits impact on

the simulation data flow• Start with toy physics to develop the appropriate data structures and

steering code– Keep in mind that the application should be eventually tuned based on

realistic numbers

Page 4: Report on Vector Prototype

Volume-oriented transport model

• We implemented a model where all particles traversing a given geometry volume are transported together as a vector until the volume gets empty– Same volume -> local (vs. global) geometry navigation, same material

and same cross sections– Load balancing: distribute all particles from a volume type into smaller

work units called baskets, give a basket to a transport thread at a time– Steering methods working with vectors, allowing for future auto-

vectorisation• Particles exiting a volume are distributed to baskets of the neighbor

volumes until exiting the setup or disappearing– Like a champagne cascade, but lower glasses can also fill top ones…– No direct communication between threads to avoid synchronization issues

Page 5: Report on Vector Prototype

Event injection

Realistic geometry + event generator

Inject event in the volume containing the IPMore events better to cut event tails and fill better the pipeline !

Page 6: Report on Vector Prototype

Transport model

Physics processes

Geometry transport

n

Event factory

n

1 1

0 0

nv nv

Track baskets (tracks in a single volume type)

Inject events into a track container taken by a

scheduler thread

Track containers (any volume)

The scheduler holds a track “basket” for each

logical volume. Tracks go to the right basket.

Baskets are filled up to a threshold, then injected

in a work queue

Transport threads pick baskets “first in first out”

Physics processes and geometry transport work

with vectors

Tracks are transported to boundaries, then dumped in a track

collection per thread

Track collections are pushed to a queue and picked by the scheduler

Sche

dule

r

Page 7: Report on Vector Prototype

Preliminary benchmarks

HT mode

Excellent CPU usage

Benchmarking 10+1 threads on a 12 core Xeon

Locks and waits: some overhead due to transitions coming from exchanging baskets via concurrent queues

Event re-injection will improve the speed-up

Page 8: Report on Vector Prototype

Evolution of populationsFlush events

0-4 5-9 95-99Transporting initial buffer

of events

"Priority" regime

Excellent concurrency

allover

Vectors "suffer" in this regime

Page 9: Report on Vector Prototype

Prototype implementation

transport

pick-upbaskets

transportable baskets

recycled baskets

full track collections

recycled track collections

Wor

ker t

hrea

ds

Disp

atch

& g

arba

ge

colle

ct th

read

Crossing tracks (itrack, ivolume)

Push/replacecollection

Main scheduler

0

1

2

3

4

5

6

7

8

n

Inject priority baskets

recyclebasket

ivolume

loop tracks and push to baskets

0

1

2

3

4

5

6

7

8

n

Stepping(tid, &tracks)

Digi

tize

& I/

O th

read

Priority baskets

Generate(Nevents)

Hits

Hits

Digitize(iev)

Disk

Inject/replace baskets

deque

deque

gene

rate

flush

Page 10: Report on Vector Prototype

Buffered events & re-injectionReusage of event

slots for re-injected events keeps memory

under control

Depletion regime with sparse tracks just few % of the

job

Concurrency still excellent

Vectors preserved

much better

Page 11: Report on Vector Prototype

Next steps• Include hits, digitization and I/O in the prototype

– Factories allowing contiguous pre-allocation and re-usage of user structures• MyHit *hit = HitFactory(MyHit::Class())->NextHit();• hit->SetP(particle->P());

• Introduce realistic EM physics– Tuning model parameters, better estimate memory requirements

• Re-design transport models from a “plug-in” perspective– E.g. ability to use fast simulation on per-track basis

• Look further to auto-vectorization options– New compiler features, Intel Cilk array notation, ...– Check impact of vector-friendly data restructuring

• Vector of objects -> object with vectors

• Push vectors lower level– Geometry and physics models as main candidates

• GPU is a great challenge for simulation code– Localize hot-spots with high CPU usage and low data transfer requirements– Test performance on data flow machines

Page 12: Report on Vector Prototype

Outlook• Existing simulation approaches perform badly on the new

computing infrastructures– Projecting this in future looks much worse...– The technology trends motivate serious R&D in the field

• We have started looking at the problem from different angles– Data locality and contiguity, fine-grain parallelism, data flow,

vectorization– Preliminary results confirming that the direction is good

• We need to understand what can be gained and how, what is the impact on the existing code, what are the challenges and effort to migrate to a new system…

Page 13: Report on Vector Prototype

Thank you!