[email protected], DIANE Project, Seminar on Innovative Detectors, Siena, Oct 2002
Distributed Computing in Physics
Parallel Geant4 Simulation in Medical and Space Science Applications
Jakub T. Moscicki, CERN/IT
Maria G. Pia, INFN Genova
Alfonso Mantero, INFN Genova
Susanna Guatelli, INFN Genova
Applications of Distributed Technology and GRID
Examples of interdisciplinary applications
Geant4 simulation and analysis
speed-up factor ~ 30 times
DIANE R&D Project: application-oriented gateway to GRID
developed for LHC
CERN IT/API – INFN Geant4/LowEnergy collaboration
cern.ch/diane
LHC: ntuple analysis and simulation
radiotherapy: brachytherapy, IMRT
space missions: ESA Bepi Colombo, LISA
Why Distributed Computing?
share limited hardware resources
lend when not needed, borrow when needed
optimize load of CPUs
avoid redundancy: save common disk space
distributed collaborations e.g. LHC community
share and manage access to distributed data
replication, security, consistency
move processing close to available resources
e.g. data
process in parallel
What is GRID ?
global, unified resource access system
à la WWW: easy and universal access
virtual organisations across administrative boundaries
black box: submit here, run anywhere
a world of virtual happiness, but...
in practice, to work efficiently and correctly, every generic system must be customized to match a specific experiment's needs and configuration
technology in constant evolution
mature and universally accessible GRID still to come
DIstributed ANalysis Environment
parallel cluster processing
make fine tuning and customization easy
transparently using GRID technology
accessible via a Wide Area Network
application independent
DIstributed ANalysis Environment
hide complex details of the underlying technology
easy to use
dedicated to the master-worker model, which covers most typical jobs: ntuple analysis, event-level distributed simulation
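A minimal Python sketch of the master-worker pattern, assuming a job is simply a number of events that can be cut into independent ranges. `Task`, `split_job`, `run_task` and `run_job` are hypothetical names, not DIANE's API, and threads stand in for remote workers only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int
    first_event: int
    n_events: int


def split_job(total_events: int, events_per_task: int) -> list[Task]:
    """Master side: cut the job into independent, contiguous event ranges."""
    tasks = []
    first, task_id = 0, 0
    while first < total_events:
        n = min(events_per_task, total_events - first)
        tasks.append(Task(task_id, first, n))
        first += n
        task_id += 1
    return tasks


def run_task(task: Task) -> int:
    """Worker side: placeholder for 'simulate task.n_events events';
    here it only reports how many events it processed."""
    return task.n_events


def run_job(total_events: int, events_per_task: int, n_workers: int) -> int:
    """Master hands tasks to a pool of workers and merges partial results
    (a plain sum here; a real job would merge histograms)."""
    tasks = split_job(total_events, events_per_task)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(run_task, tasks))


print(split_job(1_000_000, 300_000)[-1])  # last task holds the remainder
```

The master only knows about event ranges and how to merge results; what happens inside `run_task` is entirely up to the application.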
Preliminary Benchmark Results
Standard Geant4 Simulation
goal of the simulation:
study the experimental configuration and the physics reach for the ESA Bepi Colombo mission to Mercury
requires high statistics, i.e. many events
20 Mio events ~ 3 hours
up to 100 Mio events might be useful
estimated time ~16 hours
analysis implemented with AIDA/Anaphe
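Assuming run time grows linearly with the number of events (the constant initialization phase is negligible at this scale), the figures above are mutually consistent:

```latex
t_{\mathrm{ev}} \approx \frac{3\,\mathrm{h}}{20\times 10^{6}\ \mathrm{events}}
  = \frac{10\,800\,\mathrm{s}}{2\times 10^{7}} = 0.54\ \mathrm{ms/event},
\qquad
T(100\times 10^{6}) \approx 10^{8}\times 0.54\ \mathrm{ms} = 54\,000\ \mathrm{s} = 15\ \mathrm{h}
```

which is close to the ~16 hours estimated for 100 Mio events.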
Distributed Geant4 Simulation
main goals:
increase performance
shift from batch to semi-interactive simulation
users can study the results of the simulation faster and more often
generate more events, debug the simulation faster
correctness and ease of use
preserve reproducibility of the results
a parallel run should look like a local one to users
Benchmarking Environment
parallel cluster configuration:
70 Red Hat 6.1 nodes
7 Intel STL2 (2 x PIII 1 GHz, 512 MB)
31 ASUS P2B-D (2 x PIII 600 MHz, 512 MB)
15 Celsius 620 (2 x PIII 550 MHz, 512 MB)
the remaining 17: Kayak (2 x PIII 450 MHz, 128 MB)
reference sequential machine:
pcgeant2 (2 x Xeon 1700 MHz, 1 GB)
note the different CPU speeds and memory sizes
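Because the nodes differ, counting CPUs overstates the cluster's capacity relative to the reference machine. A crude way to quantify this is to express the cluster in units of one reference CPU. The sketch assumes throughput scales linearly with clock speed, which ignores the PIII/Xeon architecture difference, memory and background load, so the result is at best a rough upper bound; `NODES`, `total_cpus` and `reference_equivalents` are hypothetical names.

```python
# (node count, CPUs per node, clock in MHz) from the node list above;
# "the rest" of the 70 nodes is 70 - 7 - 31 - 15 = 17 Kayak nodes.
NODES = {
    "Intel STL2": (7, 2, 1000),
    "ASUS P2B-D": (31, 2, 600),
    "Celsius 620": (15, 2, 550),
    "Kayak": (17, 2, 450),
}
REFERENCE_MHZ = 1700  # one Xeon CPU of pcgeant2


def total_cpus(nodes: dict) -> int:
    return sum(count * cpus for count, cpus, _ in nodes.values())


def reference_equivalents(nodes: dict, reference_mhz: float) -> float:
    """Cluster capacity in reference-CPU units, assuming (crudely) that
    throughput is proportional to clock speed."""
    total_mhz = sum(count * cpus * mhz for count, cpus, mhz in nodes.values())
    return total_mhz / reference_mhz


print(total_cpus(NODES))                                   # 140
print(round(reference_equivalents(NODES, REFERENCE_MHZ), 1))  # 48.8
```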
Scalability Test – Job Time
non-normalized execution time: average gain ~15 times
Normalized Efficiency
normalized efficiency: average real gain ~30 times
Benchmarking Commentary
non-exclusive access to interactive machines:
'load noise' background, unpredictable load peaks
different CPU and RAM on nodes
AFS used to fetch physics configuration data
try to remove the noise:
repeat simulations many times to get a reliable mean
work at night and during off-peak hours (but what about US users of CERN computing facilities?)
etc.
interpretation of results:
scaling factors for different CPU speeds
results agree with expectations
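Repeating each measurement only pays off if the spread is reported together with the mean. A minimal standard-library sketch; `mean_and_stderr` is a hypothetical helper and the timings in the usage line are made-up values for illustration.

```python
import math
import statistics


def mean_and_stderr(times_s: list[float]) -> tuple[float, float]:
    """Mean run time and its standard error over repeated runs.
    Averaging reduces random 'load noise'; it cannot remove a systematic
    bias such as every repetition running at a busy hour."""
    if len(times_s) < 2:
        raise ValueError("need at least two repetitions")
    mean = statistics.fmean(times_s)
    stderr = statistics.stdev(times_s) / math.sqrt(len(times_s))
    return mean, stderr


mean, err = mean_and_stderr([3600, 3720, 3660, 3900])  # hypothetical timings (s)
print(mean, round(err, 1))  # 3720.0 64.8
```

When single load peaks dominate, `statistics.median` is a more robust summary than the mean.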
Summary
Scalability Tests
prototype deployment of Geant4-DIANE
showed a significant performance improvement
scalability tests:
140 Mio events
70 nodes in the cluster
1 hour total parallel execution
putting together DIANE and Geant4 is fairly easy
done in a few days...
Easy to use
user-friendliness:
the application developer (e.g. of a Geant4 simulation) is shielded from the complexity of the underlying technology
the original application code is not affected
the standalone and distributed cases use the same code
good separation of the subsystems:
the application does not need to know that it runs in a distributed environment...
the distributed framework (DIANE) does not need to care about what the application does internally
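The separation of subsystems can be sketched as a small contract between framework and application. In this hypothetical Python sketch (`Application`, `CountingApp`, `run_standalone` and `run_distributed` are illustrative names, not the DIANE or Geant4 API), the same application class is driven either locally or by a pool of workers, and neither side looks inside the other.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Application(ABC):
    """Hypothetical contract between the framework and an application."""

    @abstractmethod
    def initialize(self) -> None:
        """Load configuration, physics tables, etc."""

    @abstractmethod
    def process(self, first_event: int, n_events: int) -> dict:
        """Process a range of events and return a partial result."""

    @abstractmethod
    def merge(self, partials: list) -> dict:
        """Combine partial results into the final one."""


class CountingApp(Application):
    """Stand-in for a simulation: 'histograms' event numbers by parity."""

    def initialize(self) -> None:
        self.ready = True

    def process(self, first_event: int, n_events: int) -> dict:
        even = sum(1 for e in range(first_event, first_event + n_events) if e % 2 == 0)
        return {"even": even, "odd": n_events - even}

    def merge(self, partials: list) -> dict:
        return {key: sum(p[key] for p in partials) for key in ("even", "odd")}


def run_standalone(app: Application, total_events: int) -> dict:
    app.initialize()
    return app.merge([app.process(0, total_events)])


def run_distributed(make_app, total_events: int, n_workers: int) -> dict:
    chunk = max(1, -(-total_events // n_workers))  # ceiling division
    ranges = [(start, min(chunk, total_events - start))
              for start in range(0, total_events, chunk)]

    def work(event_range):
        app = make_app()  # each worker owns its own application instance
        app.initialize()
        return app.process(*event_range)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, ranges))
    return make_app().merge(partials)


print(run_standalone(CountingApp(), 1001) == run_distributed(CountingApp, 1001, 4))  # True
```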
Universally Applicable
DIANE is application independent
easy to customize and use in applications other than Geant4
e.g. it was originally developed for ntuple analysis
DIANE may bridge applications to the GRID world
without necessarily waiting for a fully-fledged GRID infrastructure to become available
with a smooth transition to GRID technologies as they become available
DIANE and distributed computing technology may be applied in a variety of other scientific and research domains
In progress: Optimization
job execution time = time of the slowest machine...
...or of the most loaded one at that moment
we often had to wait a long time for the last worker to finish
example of customization:
exploit dual-processor mode
use a larger number of smaller workers
fast machines run workers sequentially many times
benchmark in a dedicated cluster
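Why smaller workers help can be shown with a toy scheduling model: with one equal task per worker the job ends when the slowest worker does, whereas with many small tasks handed to whichever worker becomes free first, fast machines simply run more tasks. The speeds are hypothetical (chosen proportional to the clock rates in the node list) and load noise is ignored; the function names are illustrative only.

```python
import heapq


def static_makespan(total_events: int, speeds: list[float]) -> float:
    """One equal-size task per worker: the job ends with the slowest worker."""
    per_worker = total_events / len(speeds)
    return max(per_worker / s for s in speeds)


def dynamic_makespan(total_events: int, speeds: list[float], events_per_task: int) -> float:
    """Many small tasks; the first worker to become free takes the next one,
    so fast machines end up running more tasks one after another."""
    free_at = [(0.0, w) for w in range(len(speeds))]  # (time worker is free, worker id)
    heapq.heapify(free_at)
    remaining, finish = total_events, 0.0
    while remaining > 0:
        n = min(events_per_task, remaining)
        t, w = heapq.heappop(free_at)
        t += n / speeds[w]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, w))
        remaining -= n
    return finish


speeds = [1000, 600, 550, 450]  # hypothetical events/s per worker
print(static_makespan(1_000_000, speeds))           # set by the slowest worker
print(dynamic_makespan(1_000_000, speeds, 10_000))  # close to 1e6 / sum(speeds)
```

The cost of finer granularity is more per-task overhead, which in the real system includes the constant initialization phase; this model leaves that out.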
In progress: Medical Applications
plan to run Geant4 simulation for radiotherapy in a couple of days
new possibilities:
precise MC-based treatment planning, FAST
small hospitals may access distributed resources worldwide
References
more information:
cern.ch/diane
www.ge.infn.it/geant4/techtransf
aida.freehep.org
cern.ch/anaphe
The end
From sequential to parallel simulation
Structure of the simulation
initialization phase (constant):
load ~10-15 MB of physics tables, configuration data etc.
reference sequential machine: ~4 minutes (user time)
cluster nodes: ~5-6 minutes
beamOn ~ f(number of events):
small job: 1-5 Mio events
medium job: 20-40 Mio events
big job: > 50 Mio events
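A simple cost model for this structure shows why small jobs gain little from parallelism: the constant initialization is paid by every worker and does not shrink as workers are added. The sketch assumes identical workers and no load noise; `job_time` and `speedup` are hypothetical helpers, `t_init = 300 s` is roughly the 5-6 minutes quoted for cluster nodes, and `t_event = 0.54 ms` is the reference-machine rate implied by 20 Mio events in ~3 hours. Cluster nodes are slower than the reference machine, so the absolute speed-ups are optimistic; only the trend with job size is meaningful.

```python
def job_time(n_events: int, n_workers: int, t_init: float, t_event: float) -> float:
    """Wall-clock time: constant initialization (paid in parallel by every
    worker) plus the event loop divided evenly among n_workers."""
    return t_init + n_events * t_event / n_workers


def speedup(n_events: int, n_workers: int,
            t_init: float = 300.0, t_event: float = 0.54e-3) -> float:
    """Sequential time over parallel time under the model above."""
    return (job_time(n_events, 1, t_init, t_event)
            / job_time(n_events, n_workers, t_init, t_event))


for label, n in [("small", 1_000_000), ("medium", 20_000_000), ("big", 100_000_000)]:
    print(f"{label:6s} {n:>11,d} events: speed-up with 140 workers = {speedup(n, 140):.1f}")
```

With these inputs the model gives roughly 2.8, 29.4 and 79.2 for the three job sizes: initialization dominates small jobs, while big jobs approach (but never reach) the worker count.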
Reproducibility
initial seed of the random engine:
make sure that every parallel simulation starts with a seed uniquely determined by the job's initial seed
the number of times the engine is used depends on the initial seed, so one sequential random stream cannot simply be cut into fixed pieces, one per worker
make sure that correlations between the workers' seeds are avoided
our solution:
use two uncorrelated random engines
one to generate a table of initial seeds (one seed per worker)
another for the simulation inside each worker
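A minimal sketch of the two-engine scheme in Python: a SplitMix64 generator (the "seed engine") expands the job's initial seed into a table of per-worker seeds, and each worker seeds its own Mersenne Twister (`random.Random`) for the simulation. The slides do not say which engines were actually used; this pair is only an example of two different algorithms, and `seed_table` and `worker_stream` are hypothetical names.

```python
import random

MASK64 = (1 << 64) - 1


def seed_table(job_seed: int, n_workers: int) -> list[int]:
    """Seed engine (SplitMix64): one 64-bit seed per worker,
    fully determined by the job's initial seed."""
    state = job_seed & MASK64
    seeds = []
    for _ in range(n_workers):
        state = (state + 0x9E3779B97F4A7C15) & MASK64
        z = state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        seeds.append(z ^ (z >> 31))
    return seeds


def worker_stream(worker_seed: int, n: int) -> list[float]:
    """Simulation engine (Mersenne Twister) inside one worker; here it just
    draws n uniform numbers as a stand-in for the event loop."""
    engine = random.Random(worker_seed)
    return [engine.random() for _ in range(n)]


table = seed_table(job_seed=12345, n_workers=4)
print(worker_stream(table[0], 3) == worker_stream(table[0], 3))  # True: worker 0 is reproducible
```

Since each worker's seed depends only on the job seed and the worker index, any single worker can be rerun in isolation; the flip side is that the overall result depends on how the job was divided among workers.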
DIANE – G4 prototype
Parallelization of Geant4 simulation is a joint project of Geant4, DIANE and Anaphe
DIANE is an R&D project in IT/API to study distributed analysis and simulation and to create a prototype
initiated in early 2001 with very limited resources
Anaphe is an analysis project supported by IT
it provides the analysis framework for HEP
the pilot programme includes a G4 simulation which produces AIDA/Anaphe histograms
collaboration started in late spring 2002
Reproducibility
parameters which need to be fixed to reproduce the simulation:
total number of events
initial seed
... but also:
number of workers
number of events per worker
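One way to keep these four parameters together is to store them as a single record and derive a fingerprint from it, so two runs can be checked for an identical configuration. `RunSpec`, `even_split` and `fingerprint` are hypothetical names in this Python sketch; the hash only identifies the configuration and does not by itself make a simulation reproducible.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunSpec:
    """Everything needed to repeat a parallel simulation run."""
    total_events: int
    initial_seed: int
    n_workers: int
    events_per_worker: tuple  # one event count per worker

    def __post_init__(self):
        if len(self.events_per_worker) != self.n_workers:
            raise ValueError("need exactly one event count per worker")
        if sum(self.events_per_worker) != self.total_events:
            raise ValueError("event counts must add up to total_events")

    def fingerprint(self) -> str:
        """Short hash of all four parameters (keys sorted so it is stable)."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def even_split(total_events: int, n_workers: int) -> tuple:
    """Give every worker the same share; the first ones absorb the remainder."""
    base, extra = divmod(total_events, n_workers)
    return tuple(base + (1 if i < extra else 0) for i in range(n_workers))


spec = RunSpec(1_000_000, 12345, 3, even_split(1_000_000, 3))
print(spec.events_per_worker)  # (333334, 333333, 333333)
print(spec.fingerprint())
```

Changing only the number of workers yields a different fingerprint even though the total number of events and the initial seed are unchanged, mirroring the point that these parameters matter too.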