[email protected], DIANE Project
Geant4 Workshop, CERN Oct 2002
Distributed Simulation with Geant4
Preliminary results of the LowE / DIANE joint project
Jakub T. Mościcki, CERN/IT; credits also to: Alfonso Mantero, INFN Genova
History
Parallelization of Geant4 simulation is a joint project between Geant4, DIANE and Anaphe
DIANE is an R&D project in IT/API to study distributed analysis and simulation and to create a prototype
initiated in early 2001 with very limited resources
Anaphe is an analysis project supported by IT
provides the analysis framework for HEP
The pilot programme includes a G4 simulation that produces AIDA/Anaphe histograms
Collaboration started in late spring 2002
Sequential Geant4 Simulation
the goal of the simulation:
optimize the detectors used for X-ray fluorescence emission from Mercury's crust in the context of the Hermes, BepiColombo ESA mission
requires high statistics (many events)
20 Mio events ~ 3 hours
up to 100 Mio events might be useful
estimated time ~16 hours
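A quick check of the figures above, assuming run time grows linearly with the number of events (a simplification that ignores the constant initialization phase). The function name `estimate_hours` is hypothetical; the reference point (20 Mio events in ~3 hours) is taken from the slide.

```python
def estimate_hours(n_events: float,
                   ref_events: float = 20e6,
                   ref_hours: float = 3.0) -> float:
    """Linearly extrapolate wall-clock time from a reference run.

    Assumption: time is proportional to the number of events, so the
    constant initialization cost is folded into the reference point.
    """
    return ref_hours * n_events / ref_events


if __name__ == "__main__":
    for n in (20e6, 50e6, 100e6):
        print(f"{n / 1e6:.0f} Mio events -> ~{estimate_hours(n):.1f} h")
```

Linear extrapolation gives about 15 hours for 100 Mio events, the same order as the ~16 hours quoted above.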
Parallel Geant4 Simulation
increase performance
shift from batch to semi-interactive simulation
speed up the analysis cycle
generate more events, debug the simulation faster
from sequential to parallel simulation
preserve reproducibility of the results
minimize deployment overhead
when moving from sequential to parallel simulation
both in terms of time and the amount of code/expertise one must invest
Benchmarking environment
parallel cluster configuration
lxplus: 70 Red Hat 6.1 nodes
7 Intel STL2 (2 x PIII 1 GHz, 512 MB)
31 ASUS P2B-D (2 x PIII 600 MHz, 512 MB)
15 Celsius 620 (2 x PIII 550 MHz, 512 MB)
the rest: Kayak (2 x PIII 450 MHz, 128 MB)
reference sequential machine
pcgeant2 (2 x Xeon 1700 MHz, 1 GB)
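The inventory above can be tallied to get a feel for the cluster's size relative to the reference machine. This is a crude sketch: it assumes throughput scales with nominal clock speed and ignores architecture (PIII vs Xeon), memory (the Kayak nodes have only 128 MB) and background load. The Kayak count of 17 is derived as 70 - 7 - 31 - 15; `NodeType` and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    count: int        # number of nodes of this type
    cpus: int         # CPUs per node
    clock_ghz: float  # nominal clock per CPU


# lxplus cluster as listed on the slide; "the rest" = 70 - 7 - 31 - 15 = 17 Kayak nodes
CLUSTER = [
    NodeType("Intel STL2", 7, 2, 1.0),
    NodeType("ASUS P2B-D", 31, 2, 0.6),
    NodeType("Celsius 620", 15, 2, 0.55),
    NodeType("Kayak", 17, 2, 0.45),
]
REFERENCE = NodeType("pcgeant2", 1, 2, 1.7)


def total_cpus(nodes):
    return sum(n.count * n.cpus for n in nodes)


def nominal_ghz(nodes):
    """Sum of nominal clock speeds: a rough capacity proxy only."""
    return sum(n.count * n.cpus * n.clock_ghz for n in nodes)


if __name__ == "__main__":
    ratio = nominal_ghz(CLUSTER) / nominal_ghz([REFERENCE])
    print(f"{sum(n.count for n in CLUSTER)} nodes, {total_cpus(CLUSTER)} CPUs")
    print(f"nominal capacity ~{ratio:.0f}x the reference machine")
```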
Benchmarking Caveat
non-exclusive access to interactive machines
'load noise' background, unpredictable load peaks
different CPU and RAM on nodes
AFS used to fetch physics configuration data
try to remove the noise:
repeat simulations many times to get a reliable mean
work at night and off-peak hours (what about US users of CERN computing facilities?)
etc.
conclusion:
results should be taken with caution and are approximate
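Repeating a run many times, as suggested above, only helps if the spread is reported too. A minimal sketch, assuming each repetition yields one wall-clock timing; the function name `summarize` is hypothetical. The median is returned as well because it is less sensitive to occasional load peaks than the mean.

```python
import statistics


def summarize(times):
    """Mean, standard error and median of repeated timings of one job.

    The standard error shrinks as 1/sqrt(n), so more repetitions reduce
    (but never remove) the effect of background load on the mean.
    """
    if len(times) < 2:
        raise ValueError("need at least two repetitions")
    mean = statistics.mean(times)
    stderr = statistics.stdev(times) / len(times) ** 0.5
    return mean, stderr, statistics.median(times)
```

For example, `summarize([61.0, 58.5, 73.2, 60.1])` on hypothetical timings in minutes returns the mean, its standard error and the median, which can then be quoted together.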
Structure of the simulation
initialization phase (constant)
load ~10-15 MB of physics tables, configuration data etc.
reference sequential machine: ~4 minutes (user time)
cluster nodes: ~5-6 minutes
beamOn: time ~ f(number of events)
small job: 1-5 Mio events
medium job: 20-40 Mio events
big job: > 50 Mio events
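The two phases above suggest a simple cost model (an assumption for illustration, not a fit to the measurements): a job of N events takes a constant initialization time plus a per-event cost, and when the events are split evenly over W identical workers each worker still pays the full initialization.

```latex
\begin{aligned}
T_{\mathrm{seq}}(N) &= t_{\mathrm{init}} + N\,t_{\mathrm{ev}}, \\
T_{\mathrm{par}}(N, W) &= t_{\mathrm{init}} + \frac{N}{W}\,t_{\mathrm{ev}}, \\
S(N, W) &= \frac{T_{\mathrm{seq}}(N)}{T_{\mathrm{par}}(N, W)}
         = \frac{t_{\mathrm{init}} + N\,t_{\mathrm{ev}}}{t_{\mathrm{init}} + \frac{N}{W}\,t_{\mathrm{ev}}}
         \xrightarrow{\;N \to \infty\;} W .
\end{aligned}
```

For small N the constant term dominates both numerator and denominator and S stays close to 1; the speedup approaches W only for big jobs. The model ignores communication and the fact that the cluster nodes initialize more slowly (5-6 minutes) than the reference machine.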
Benchmarking (comments)
results are approximate
scaling factors for different CPU speeds
but they seem to agree with expectations
moving from batch to semi-interactive simulation is feasible
small jobs do not gain as much because of the large constant initialization time
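The effect of the constant initialization time can be sketched numerically. Assumptions: t_init = 4 min (the reference-machine figure from the slides), a per-event cost derived from "20 Mio events ~ 3 hours" assuming those 3 hours include initialization, and identical workers (the real cluster nodes are slower). The function names and the choice of 70 workers are hypothetical.

```python
T_INIT_MIN = 4.0                          # initialization on the reference machine (slide)
T_EV_MIN = (180.0 - T_INIT_MIN) / 20e6    # per-event cost, assuming 3 h includes init


def job_minutes(n_events: float, n_workers: int = 1) -> float:
    """Wall time when events are split evenly over identical workers.

    Every worker pays the full initialization cost before simulating.
    """
    return T_INIT_MIN + (n_events / n_workers) * T_EV_MIN


def speedup(n_events: float, n_workers: int) -> float:
    return job_minutes(n_events) / job_minutes(n_events, n_workers)


if __name__ == "__main__":
    for n in (1e6, 20e6, 100e6):
        print(f"{n / 1e6:>5.0f} Mio events, 70 workers: speedup ~{speedup(n, 70):.1f}")
```

Under these assumptions the model gives a speedup of roughly 3 for a 1 Mio event job but roughly 50 for a 100 Mio event job on 70 workers, the qualitative pattern noted above.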
Problems & solutions
job execution time = time of the slowest machine...
...or of the most loaded one at the moment
we often had to wait a long time for the last worker to finish
possible solution:
use a larger number of smaller workers
fast machines then run workers sequentially many times, but...
the constant initialization time becomes rather important
initialize once, beamOn many times... to be checked
if this problem is solved we may move towards more interactive simulation
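The "many smaller workers" idea can be compared with a static split in a small simulation. This is a generic greedy pull scheduler, not DIANE's actual scheduler; it assumes each worker initializes once and then repeatedly takes the next task when it becomes free (the "initialize once, beamOn many times" option, which the slide marks as still to be checked). Worker speeds are hypothetical events per minute; all names are hypothetical.

```python
import heapq


def static_makespan(n_events, speeds, t_init):
    """One big task per worker: the job ends when the slowest worker ends."""
    share = n_events / len(speeds)
    return max(t_init + share / s for s in speeds)


def dynamic_makespan(n_events, speeds, t_init, n_tasks):
    """Many small tasks pulled from a queue by whichever worker is free.

    Each worker pays the initialization time once, then processes tasks.
    """
    task_size = n_events / n_tasks
    # heap of (time at which the worker becomes free, worker index)
    free_at = [(t_init, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)
        t += task_size / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


if __name__ == "__main__":
    speeds = [1.0] * 9 + [0.25]   # nine normal nodes, one heavily loaded
    print(static_makespan(1000, speeds, t_init=5.0))   # 405.0
    print(dynamic_makespan(1000, speeds, t_init=5.0, n_tasks=100))
```

With a static split the loaded node dictates the job time; with small tasks the fast nodes absorb its share. The finer the split, the better the balance, which only pays off if initialization is not repeated per task.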
From sequential to parallel simulation
Reproducibility
initial seed of the random engine
make sure that every parallel simulation starts with a seed uniquely determined by the job's initial seed
the number of times the engine is used depends on the initial seed
make sure that correlations between the workers' seeds are avoided
our solution:
use two uncorrelated random engines
one to generate a table of initial seeds (one seed for each worker)
another for the simulation inside the worker
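The two-engine scheme above can be sketched as follows. In Geant4 the engines would come from CLHEP; here, as stand-ins only, Python's `random.Random` (Mersenne Twister) builds the seed table and a hand-written Park-Miller "minimal standard" LCG plays the algorithmically different engine inside each worker. The "simulation" is a toy sum of uniform deviates; all names are hypothetical and these are not necessarily the engines used in the project.

```python
import random

MINSTD_MODULUS = 2**31 - 1   # Park-Miller minimal standard LCG
MINSTD_MULTIPLIER = 16807


class MinStdEngine:
    """Engine #2 (stand-in): the generator used inside one worker."""

    def __init__(self, seed: int):
        # the LCG state must lie in [1, modulus - 1]
        self.state = seed % MINSTD_MODULUS or 1

    def flat(self) -> float:
        """Next uniform deviate in (0, 1)."""
        self.state = (MINSTD_MULTIPLIER * self.state) % MINSTD_MODULUS
        return self.state / MINSTD_MODULUS


def seed_table(job_seed: int, n_workers: int) -> list[int]:
    """Engine #1: derive distinct worker seeds from the job's initial seed."""
    table_engine = random.Random(job_seed)
    # sample() without replacement guarantees every worker gets a unique seed
    return table_engine.sample(range(1, MINSTD_MODULUS), n_workers)


def run_worker(worker_seed: int, n_events: int) -> float:
    """Toy 'simulation': one uniform deviate per event, summed."""
    engine = MinStdEngine(worker_seed)
    return sum(engine.flat() for _ in range(n_events))


def run_job(job_seed: int, n_workers: int, events_per_worker: int) -> list[float]:
    seeds = seed_table(job_seed, n_workers)
    return [run_worker(s, events_per_worker) for s in seeds]
```

Running `run_job(42, 4, 1000)` twice gives identical results. Note that splitting the same 4000 events as `run_job(42, 2, 2000)` gives different results, because the seed table and the per-worker streams change with the number of workers.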
Reproducibility
parameters that must be fixed to reproduce the simulation:
total number of events
initial seed
... but also:
number of workers
number of events per worker
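One way to make sure all four parameters are recorded together is to keep them in a single immutable record stored with the results. A minimal sketch; `RunConfig` and `even_split` are hypothetical names, and the consistency check (per-worker events must add up to the total) is an assumption about how a job would be split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Everything needed to replay a parallel run (hypothetical record)."""
    total_events: int
    initial_seed: int
    events_per_worker: tuple[int, ...]   # one entry per worker, in worker order

    def __post_init__(self):
        if sum(self.events_per_worker) != self.total_events:
            raise ValueError("events per worker must add up to total_events")

    @property
    def n_workers(self) -> int:
        return len(self.events_per_worker)

    @classmethod
    def even_split(cls, total_events: int, initial_seed: int, n_workers: int):
        if n_workers < 1:
            raise ValueError("need at least one worker")
        base, extra = divmod(total_events, n_workers)
        split = tuple(base + (1 if i < extra else 0) for i in range(n_workers))
        return cls(total_events, initial_seed, split)
```

Two runs with the same total and seed but a different split compare unequal, which mirrors the point above that the number of workers and events per worker are part of the reproducibility key.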
Ease of use
user-friendliness
a G4 simulation developer should not have to fight irrelevant technical problems when moving from sequential to parallel G4 simulation
as non-intrusive as possible
minimize necessary code changes in original simulation
good separation of the subsystems
G4 simulation does not need to know that it runs in parallel...
the distributed framework (DIANE) does not need to care about what is actually being simulated (see the Master/Worker model slide)
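The separation described above can be sketched with abstract interfaces: the framework only calls a handful of methods, and the application never learns whether it runs in parallel. DIANE's real interfaces are CORBA/C++ based; the classes and method names below are hypothetical Python stand-ins, and tasks run in-process rather than on remote workers.

```python
from abc import ABC, abstractmethod


class WorkerApplication(ABC):
    """What the framework needs from the worker side (hypothetical)."""

    @abstractmethod
    def initialize(self) -> None:
        """One-time setup, e.g. loading physics tables."""

    @abstractmethod
    def run(self, task: dict) -> dict:
        """Process one task and return a partial result."""


class MasterApplication(ABC):
    """What the framework needs from the master side (hypothetical)."""

    @abstractmethod
    def split(self) -> list:
        """Cut the job into independent tasks."""

    @abstractmethod
    def merge(self, results: list):
        """Combine partial results, e.g. add up histograms."""


def run_distributed(master: MasterApplication, worker: WorkerApplication):
    """Framework side: calls only the abstract methods and never looks
    inside tasks or results. A real framework would ship tasks to remote
    workers; here they run in-process."""
    worker.initialize()
    return master.merge([worker.run(task) for task in master.split()])


# --- toy application: it does not know it may run in parallel ---
class EventCountMaster(MasterApplication):
    def __init__(self, total_events: int, n_tasks: int):
        self.total_events = total_events
        self.n_tasks = n_tasks

    def split(self) -> list:
        base, extra = divmod(self.total_events, self.n_tasks)
        return [{"events": base + (1 if i < extra else 0)}
                for i in range(self.n_tasks)]

    def merge(self, results: list) -> int:
        return sum(r["processed"] for r in results)


class EventCountWorker(WorkerApplication):
    def initialize(self) -> None:
        self.ready = True

    def run(self, task: dict) -> dict:
        # a Geant4 worker would simulate task["events"] events here (e.g. via BeamOn)
        return {"processed": task["events"]}
```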
What is DIANE?
R&D project in IT/API
semi-interactive parallel analysis for LHC
middleware technology evaluation & choice
CORBA, MPI, Condor, LSF...
also see how to integrate API products with GRID
prototyping (focus on ntuple analysis)
time scale and resources:
Jan 2001: start (< 1 FTE)
June 2002: running prototype exists
sample Ntuple analysis with Anaphe
event-level parallel Geant4 simulation
What is DIANE?
framework for parallel cluster computation
application-oriented
master-worker model common in HEP applications
application-independent
apps dynamically loaded in a plugin style
callbacks to applications via abstract interfaces
component-based
subsystems and services packaged into component libraries
core architecture uses CORBA and CCM (CORBA Component Model)
integration layer between applications and the GRID
environment and deployment tools
Master/Worker model
applications share the same computation model
so they also share a big part of the framework code
but have different non-functional requirements
CPU- vs IO-intensive
semi-interactive vs batch etc.
What DIANE is not
DIANE is not:
a replacement for a GRID and its services
a hardwired analysis toolkit
DIANE and GRID
DIANE as a GRID computing element...
via a gateway that understands Grid/JDL
...Grid/JDL must be able to describe parallel jobs/tasks
DIANE as a user of (low-level) Grid services...
authentication, security, load balancing...
and profit from existing 3rd-party implementations
the Python environment is a rapid-prototyping platform and may provide a convenient connection between DIANE and the Globus Toolkit via the pyGlobus API
Architecture Overview
layering: abstract middleware interfaces and components
plugin-style application loading
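Plugin-style loading means the framework resolves an application from a name in its configuration at run time, instead of linking against it. DIANE's core uses CORBA/CCM component libraries; the Python sketch below only illustrates the idea with `importlib.import_module`, and the `"module:ClassName"` spec format and the `load_plugin` name are hypothetical.

```python
import importlib


def load_plugin(spec: str):
    """Resolve a 'package.module:ClassName' string to an object at run time.

    The framework only handles the string from its configuration, so new
    applications can be plugged in without changing framework code.
    """
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

For instance, `load_plugin("fractions:Fraction")` returns the standard-library `Fraction` class; a real deployment would point the spec at the application's own worker class.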
Conclusions
prototype deployment of G4-DIANE
significant performance improvement possible
scalability tests:
140 Mio Events
70 nodes in the cluster
1 hour total parallel execution
putting together DIANE and G4 is fairly easy
done in several days...
DIANE may bridge G4 to the GRID world
without necessarily waiting for a fully-fledged GRID infrastructure to become available