Analysis Framework for SuperKEKB
R.Itoh (KEK), N.Katayama (KEK) and S.H.Lee (Korea U)
“Analysis Framework”
- Accepts a set of functional “modules” which can be plugged into the framework, and controls the execution sequence.
- Integrated interface for data persistency.
- Parallel processing capability (on SMP/multi-core and PC clusters).
- Single unified framework from DAQ to user analysis.
Belle: the same framework “BASF” is used in
- Readout modules (COPPERs) and readout PCs
- Event builder and RFARM (HLT)
- DST production / reproduction
- User analysis
Definition of Analysis Framework (a la Belle)
[Diagram: the framework hosts plug-in modules (tracking, ECL clustering, vertexing, PID) forming an analysis program between input data and output data]
Belle's framework : B.A.S.F.
* Software pipeline (software bus) architecture
  - Accepts a set of “plug-in” analysis modules using dynamic link.
  - The modules are executed in the specified order to process event data; the execution is looped over all the events.
* Data management using a home-grown memory management tool (Panther): data portions of an event are handled with user-defined tables; the HBOOK4 package is used for histograms/N-tuples.
[Diagram: B.A.S.F. pipeline — mod. 1 → mod. 2 → ... → mod. n; events 1 .. n flow through Panther from input to output; HBOOK4 + CLHEP writes the histogram file]
[Diagram: B.A.S.F. internals — input data and output data; a module pool (module1, module2, ...) whose modules are attached to a path via dynamic link; EventServer, OutputServer and Histo.Server; Panther; the B.A.S.F. kernel with its I/O package; user interface; event process]
Parallel event processing on SMP
B.A.S.F.
* Green boxes are linked using dynamic link, as are the “modules”.
* Data handling is done through the “Panther” package.
* Separate package for histogram/N-tuple management: HBOOK4.
[Diagram labels: initialization; dynamic link; shared memory]
Developed in 1996:
- sock2rb: receives a packet from a socket and places it on the ring buffer
- rb2sock: picks a packet from the ring buffer and sends it to a socket
- RingBuffer: ring buffer on shared memory
- rb2file: saves data in a file
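A minimal in-process sketch of the put/get logic behind these tools. The real RingBuffer lives on shared memory so that separate processes (sock2rb, rb2sock, rb2file) can attach to the same buffer; that attachment and the locking are omitted here, and all names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Fixed-size FIFO ring buffer of packets. A writer such as sock2rb calls
// put(); a reader such as rb2sock or rb2file calls get(). In the real
// implementation the storage sits in shared memory; here it is a plain
// in-process vector to show only the ring logic.
class RingBuffer {
public:
    explicit RingBuffer(size_t nslots)
        : slots_(nslots), head_(0), tail_(0), count_(0) {}

    // Insert one packet; returns false when the buffer is full
    // (a real writer would then wait and retry).
    bool put(const std::vector<char>& packet) {
        if (count_ == slots_.size()) return false;
        slots_[head_] = packet;
        head_ = (head_ + 1) % slots_.size();
        ++count_;
        return true;
    }

    // Remove the oldest packet; returns false when the buffer is empty.
    bool get(std::vector<char>& packet) {
        if (count_ == 0) return false;
        packet = slots_[tail_];
        tail_ = (tail_ + 1) % slots_.size();
        --count_;
        return true;
    }

private:
    std::vector<std::vector<char> > slots_;
    size_t head_, tail_, count_;
};
```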
dBASF-II/RFARM
[Diagram: dBASF-II/RFARM data flow — the EB farm sends events via rb2sock to sock2rb on the input distributor; a ring buffer feeds the event server (mod); the processing nodes run BASF; the output server (mod) feeds rb2sock → sock2rb on the output collector, whose ring buffer feeds rb2file writing to tape (the number of outputs is increased by one)]
Current problems with Belle's Framework
- No object persistency.
  * Cannot read/write “objects” from/to files using Panther; e.g. a “Particle” class cannot be saved/retrieved.
- The interface for interactive user analysis is obsolete. * “HBOOK4” + CLHEP (for PAW/dis45) * Recent HEP community: “ROOT” is the de-facto standard.
- Complicated parallel processing framework. * Different implementations in SMP and network cluster.
- Lack of distributed input/output file management. * No integrated catalog (name service) management. * No integrated GRID interface (ad-hoc interface exists, though).
* We do need a new analysis framework for SuperKEKB/Belle based on ROOT with a GRID interface!
  -> However, the obstacle to starting the development is the lack of manpower...
* A BASF-like framework is useful for processing data sets which contain massive chunks of data of similar kinds.
* Recent experiments in other fields also need to process massive data. (Satellite image processing, for example)
How about collaborating with other experiment(s) to develop a new shareable framework?
Status as of 2007
Joint development with Hyper Suprime-Cam
- The Hyper Suprime-Cam project aims at a search for Dark Matter using the “Subaru” telescope located at Mauna Kea in Hawaii.
  <- A new telescope camera, “Hyper Suprime-Cam”, is now being developed. First light is expected in 2011.
  => Statistical analysis of the “weak-lensing effect” in full-sky survey images taken by the Hyper Suprime-Cam CCD camera.
  => Data size: 2 GB/shot, >1 TB/night
[Photos: the observatory at Mauna Kea, the Subaru Telescope, and Hyper Suprime-Cam]
Planned data processing at HSC (before introduction of the idea of a framework)
- Raw data: CCD images stored in FITS files (2048x4096/CCD x ~100)
- The CCD data are processed by a chain of many “filter” applications driven by a shell script.
- One “filter” reads (a) FITS file(s) and writes the processing results into a different FITS file.
[Diagram: flt1 → flt2 → flt3 → ... → fltn, with FITS files stored in file servers between the filters]
* Possible pipeline hazard if the processing is done on many nodes.
* Heavy I/O bandwidth.
Use of HEP style analysis framework - Memory based data passing between filters: reduce I/O BW - “Image-by-image” basis parallel processing
Prototype of analysis framework : “roobasf”
- Software bus (pipeline) is kept.
  * Compatibility with modules written for B.A.S.F.
- Object persistency * ROOT I/O as the persistency core. * Panther I/O is kept as a legacy interface.
- More versatile parallel/pipeline processing scheme * Transparent implementation which utilizes both multicore CPUs and network clusters: Manage ~100 nodes by single framework * Dynamic optimization of resource allocation * Module pipelining over network - Integrated database and GRID interface for file management
- Dynamic customization of framework * replaceable I/O package, user interface .......
Requirements
Development strategy of roobasf
Step 1. Develop a prototype without parallel processing capability - Recycle existing BASF code as the core (module manipulation) - Implement ROOT I/O based object persistency
Step 2. Test the prototype for the upgraded Suprime-Cam at Subaru telescope => in progress ← as of today
Step 3. Implement parallel processing and finalize design - event-by-event (image-by-image) basis trivial parallel processing - parallel processing of pipeline modules (for “mosaicing”) - other issues : dynamic reconfig, database/GRID interface
SuperKEKB / Hyper Suprime-Cam
“Summit system” for the upgraded Suprime-Cam
- Suprime-Cam is a CCD camera at the prime focus of the Subaru telescope (scale: ~1/10 of Hyper Suprime-Cam).
- The CCDs have been upgraded to new ones for better performance.
- A semi-real-time data processing system based on “roobasf” has been implemented at the Subaru observatory. -> good field test
- The system was first tested during a real observation on Aug. 26. The development crew worked hard at the summit of Mauna Kea (4205 m)!
[Diagram: Summit System — the Suprime-Cam DAQ writes FITS files (one per CCD); roobasf runs modules on a pipeline: readfits → overscansub → GetPar/FlatField → AgpMask → SExtractor → Stat/hist. mgr. → N-tuple, with ROOT objects passed in memory]
* “roobasf” instances are executed for 10 CCDs in parallel on 5 nodes.
* The overall execution is controlled using “RCM” (R&D chain management):
  - accepts a set of scripts
  - conditional execution of the sequence of scripts according to the given “workflow”
  - management of input/output using an RDB
* FITS file handling
  - Originally the FITS image was intended to be kept in a ROOT object to pass it through memory.
  - Currently only the FITS headers are passed as objects. <- due to the ad-hoc implementation of modules...
[Diagram labels: analysis/visualization using ROOT; x 10; web; output FITS; semi-real-time processing]
Going further
a) Implementation of parallel processing
* “B.A.S.F.”-style parallel processing, i.e. parallel processing using multiple processes on an SMP node with different event input, is being implemented as the starting point.
* The event data are distributed to the processes from one or multiple event servers through either shared memory or a TSocket.
* The data are handled using the TMessage class in ROOT.
Currently developing...
* A data distribution class which can use both shared memory and sockets transparently.
* A new class to manage the forking of event processes on SMP nodes and the invocation of the framework at remote nodes via the network.
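The distribution policy itself is simple to sketch. Here a toy event server assigns successive events to worker processes in round-robin order; in the real system the transport would be shared memory or a TSocket carrying a streamed TMessage, while here both are replaced by plain in-process queues (all names are illustrative, not the classes under development):

```cpp
#include <cstddef>
#include <vector>

// One worker process's input queue (stand-in for its shared-memory
// ring buffer or socket connection).
struct Worker {
    std::vector<int> queue;  // event ids assigned to this worker
};

// Toy event server: hands each event to the next worker in cyclic order,
// so events are processed event-by-event in parallel.
class EventServer {
public:
    explicit EventServer(size_t nworkers) : workers_(nworkers), next_(0) {}

    void dispatch(int eventId) {
        workers_[next_].queue.push_back(eventId);
        next_ = (next_ + 1) % workers_.size();
    }

    const Worker& worker(size_t i) const { return workers_[i]; }

private:
    std::vector<Worker> workers_;
    size_t next_;
};
```

A real distributor would instead hand the next event to whichever worker is free, which is what makes the parallelism transparent to the module writer.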
Data Flow Scheme (S.H.Lee)
[Diagram: ROOT input (RootIO) — TFile->Get() → TTree->Get() → TBranch::GetEntry() → BelleEvent → TMessage::WriteObject() → char* (TMessage::Buffer()) → EvtMessage (EvtMessage::EvtMessage()) → Event Server → “Source” shared memory / ring buffer → Event Processor → “Output” shared memory / ring buffer → EvtMessage → char* → TMessage::ReadObjectAny() (InMessage::InMessage()) → BelleEvent → Output Server → ROOT output (RootIO) — TTree->Fill() → TFile->Write()]
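The framing step in this scheme can be sketched as follows: the serialized event (in reality the byte buffer produced by TMessage::WriteObject) is prefixed with a small header giving its length, so a reader on the far side of a ring buffer or socket knows how many bytes belong to one event. The “EvtMessage” below is a simplified stand-in for Belle's class of the same name, and the header layout is an assumption for illustration:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Minimal message framing: [4-byte length][payload bytes].
// The payload would be the streamed object from TMessage::Buffer();
// here it is just a string of bytes.
class EvtMessage {
public:
    // Wrap a serialized payload into one framed message.
    static std::vector<char> frame(const std::string& payload) {
        std::vector<char> msg(sizeof(int) + payload.size());
        int len = static_cast<int>(payload.size());
        std::memcpy(&msg[0], &len, sizeof(int));
        if (!payload.empty())
            std::memcpy(&msg[sizeof(int)], payload.data(), payload.size());
        return msg;
    }

    // Recover the payload from a framed message.
    static std::string unframe(const std::vector<char>& msg) {
        int len = 0;
        std::memcpy(&len, &msg[0], sizeof(int));
        return std::string(&msg[sizeof(int)], len);
    }
};
```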
Object-oriented data flow and processing in SuperKEKB DAQ
- The same software framework from COPPER to HLT (and offline).
- ROOT TMessage based data flow throughout the DAQ -> a class to contain the streamed object.
- ROOT I/O based object persistency at storage.
  * Use of “xrootd” for remote storage if enough bandwidth can be guaranteed.
- Real-time collection of histograms from any node for monitoring.
[Diagram: COPPERs (readout modules) run roobasf (format, reduct.) → TMessage → R/O PCs run roobasf (format, monitor) → Event Builder → HLT nodes run roobasf (track, clust., pid., sel.) with evt. srv. and out. srv.; the raw data class is carried in TMessage; output goes to xrootd storage]
b) Module-by-module parallel processing
- Module-by-module basis parallel processing is required in some cases.
  ex. “Mosaicing” at HSC
  - (Hyper) Suprime-Cam consists of 10 (100) CCDs.
  - The whole image is built by gathering all CCD images into one place. <-> Just like “event building” in HEP.
  - Boundary processing is necessary to obtain the image.
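The “event building”-like gather step can be sketched as below: CCD images processed in parallel are collected and laid out side by side into one full frame. The boundary processing between neighbouring CCDs is deliberately omitted, and the image layout and names are illustrative assumptions:

```cpp
#include <cstddef>
#include <vector>

// A toy image: [row][col] pixel values.
typedef std::vector<std::vector<int> > Image;

// Gather equally-sized CCD images into one mosaic by concatenating them
// horizontally, row by row (the analogue of event building: sub-pieces
// from many sources are assembled into one whole).
Image mosaic(const std::vector<Image>& ccds) {
    Image full;
    if (ccds.empty()) return full;
    size_t rows = ccds[0].size();
    full.resize(rows);
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < ccds.size(); ++c)
            full[r].insert(full[r].end(),
                           ccds[c][r].begin(), ccds[c][r].end());
    return full;
}
```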
[Diagram: CCD sets A, B and C are processed on separate groups of nodes; partial mosaicing produces A-1, B-1 and C-1; full mosaicing combines them on node n+1]
* Data exchange between frameworks outside the data stream.
* Synchronization between modules on different frameworks.
* Possibility of a pipeline hazard.
Possible parallel processing of mosaicing (a la Belle DAQ)
c) Other issues
1) Dynamic customization of the framework
 - The framework software is divided into small components: I/O package, module driver, command parser, user I/F... as dynamically linkable packages. (c.f. modules are already dynamically linkable.)
 - Customize the framework from these building blocks.
   Ex: an alternative object persistency to ROOT - the object serializer supplied by the BOOST project.
2) Database/GRID interface for file management - Use of “xrootd”
3) Dynamic resource allocation - Automatic addition/deletion of processing nodes to optimize the overall performance. -> Solution to the possible pipeline hazard
Plan (for SuperKEKB)
* Prototype called “roobasf” with - BASF's module driver - ROOT IO based object persistency - Event-by-event parallel processing capability on multi-core CPUs (SMP) - Histogram/N-tuple management with ROOT - Legacy interface : Panther I/O and HBOOK4
* Make the prototype work by the time of CHEP09 (~Mar. 20) and release it to Belle for the field test. -> feedback from actual use in analyses.
Feature Survey for Running Experiments (L.S.Kennedy@CHEP06)

Experiment | EDM & persistency        | Configure           | Processing model  | Component architecture         | Provenance
STAR       | Transient -> many tables |                     |                   | Yes                            |
Belle      | Transient -> custom      | With script parser  | Software bus      | Yes and dynamic load + mixture | Stored in log files
CLEO       | Transient -> custom      | With tcl            | Data on demand    | Yes and dynamic load           | Limited
D0         | Transient -> DOOM        | With text rcp files | Publish/subscribe | Yes and dynamic link           | Saved to data file
CDF        | Transient -> root TTrees | With tcl            | Software bus      | Yes but static link            | Parameters captured in DB
BaBar      | Transient -> OODB        | With tcl            | Software bus      | Yes but static link            | Parameters captured in DB
Frameworks for LHC Era (L.S.Kennedy@CHEP06)

Experiment | EDM & persistency               | Configure            | Processing model | Component architecture        | Provenance
Alice      | root                            | cint                 | Software bus     | Yes, uses root::PluginManager | Config-based, stored in output
CMS        | Transient -> pool -> root TTrees| With text files for now | Software bus  | Yes, uses seal::PluginManager | Stored in output data + per event
Atlas      | Transient -> pool               | python               | Software bus     | Yes and dynamic link          | Config-based, stored in DM DB
LHCb       | Transient -> pool               | Text file            | Data on demand   | Yes and dynamic link          | Config-based, stored in DM DB
- “module” (BASF) = “algorithm” (GAUDI), with more versatile management
- Algorithms and data are clearly separated
- ROOT I/O based object persistency (POOL)
- ROOT based histogram/N-tuple management
GAUDI - Originally developed for LHCb - Used by ATLAS also as “Athena”
POOL
There is a package called “POOL”. - It was developed for LHC experiments and will be widely used for the data management by the experiments. - It is a combination of * ROOT I/O for the event data management, and * MySQL for the management of data catalog. -> also usable: XML DB, EDG (of GRID) - GRID capable! - Strong support by LHC community
http://pool.cern.ch
Why do we stick to our own framework? i.e. Why not GAUDI?
- Compatibility with existing Belle software is required. * We need to analyze Belle's 1/ab data together with SuperKEKB data. * Legacy interface : Panther and HBOOK4? * Module level compatibility at some level
- GAUDI is too complicated * Too heavy because of high functionality and versatility * Designed mainly by computer scientists and not necessarily easy to use for modest HEP physicists (might be my prejudice).
- GAUDI lacks parallel processing capability. An external framework (like GRID batch) is necessary for it, which complicates the system, especially in real-time use.
  * Sophisticated parallelism should be built into the framework itself; this advantage is already proven by BASF@Belle, i.e. users can make use of multiple CPUs without knowing it.
Further discussion items for the use at SuperKEKB
1. Object persistency core
- ROOT IO: as a default
- BOOST product: experimental
* Replaceable I/O package by dynamic configuration of framework
2. Integrated GRID/database interface
* Management of distributed files over WAN/GRID -> xrootd could be a nice candidate, or POOL as in GAUDI.
* Parallel processing on GRID (GRID-MPI, for example)?
  -> Not necessary. The framework takes care of processing inside clusters with O(100) nodes.
  -> Processing with multiple clusters is managed by GRID batch.
3. Software bus - How to pass objects between modules * Pass pointer to object: simple but no control * Define input and output objects of each module: too strict * Access through proxy: moderate control but needs development - Algorithm to drive module execution : stream driven, object driven
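The “access through proxy” option can be sketched as below: modules never hold raw pointers to each other's objects, but go through a proxy that the framework hands out, so it retains moderate control over who reads and writes what. The class names (DataStore, StoreProxy) and the single-output-slot policy are assumptions for illustration, not a design in the source:

```cpp
#include <map>
#include <string>

// A simple keyed store for event data shared between modules.
class DataStore {
public:
    void put(const std::string& key, double value) { store_[key] = value; }
    bool has(const std::string& key) const { return store_.count(key) != 0; }
    double get(const std::string& key) const {
        std::map<std::string, double>::const_iterator it = store_.find(key);
        return it == store_.end() ? 0.0 : it->second;
    }
private:
    std::map<std::string, double> store_;
};

// Proxy handed to each module by the framework: read access to anything,
// but write access only to the module's own declared output slot. This is
// the "moderate control" middle ground between passing raw pointers and
// rigidly declaring every input/output object.
class StoreProxy {
public:
    StoreProxy(DataStore& s, const std::string& outputKey)
        : store_(s), outputKey_(outputKey) {}
    double read(const std::string& key) const { return store_.get(key); }
    void write(double value) { store_.put(outputKey_, value); }
private:
    DataStore& store_;
    std::string outputKey_;
};
```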
4. Language: C++ is the default, but Python interface is required.
Relation between Framework and “Computing Model”
- Coupling of framework with “Computing Model” is basically in conjunction with the data access method.
- Remote “object” access: * POOL * SRM, SRB on GRID * xrootd * Combination As far as framework is capable of them, it can be independent of “computing model”.
- Parallel processing (in case of roobasf): * Framework takes care of processing within a single job utilizing local PC clusters consisting of up to ~100 nodes. * “Computing model” concerns the allocation of resources and job-by-job basis management.
Personal opinion on the framework @ SuperKEKB
1. Frameworks are required for
 a) DAQ data processing (real-time processing)
  - Data reduction in front-end readout modules (COPPERs)
  - HLT processing with full reconstruction
 b) Offline data reproduction: full reconstruction with updated constants
 c) User analyses
2. Belle uses the same single framework, BASF, for all of them. Having a single framework at all levels is a great advantage for the ease of software development.
 - “roobasf” is also being developed with the same policy in mind.
3. However, it may be possible to have different frameworks especially for user analyses, once the interface for the object persistency and module structure are clearly defined.
xrootd
- Generalized model for transparent access to distributed data files with a name service.
- Can be used as a substitute for NFS, but is scalable and suited for wide-area access (just like P2P file sharing??).
- Provides a POSIX I/O interface which can be used from any application transparently.
- Originally developed to access scattered ROOT files and optimized for that, but the latest version is generalized to any type of data file.
- Capable of accessing any data portion in scattered files transparently.
- GRID capable, combined with SRM and FUSE.
http://xrootd.slac.stanford.edu/
The Magic
[Diagram: xrootd redirection — the client sends open("/foo") to the manager node (x.slac.stanford.edu, running xrootd + cmsd); the manager asks the data server nodes “Have /foo?”; node a.slac.stanford.edu (xrootd + cmsd) answers “I have /foo!”; the manager tells the client “Go to a”; the client re-issues open("/foo") to node a and reads /foo. The magic is redirection!]
Composite Name Space Implemented
[Diagram: a redirector (xrootd@myhost:1094) forwards name-space operations (create/trunc, mkdir, mv, rm, rmdir) to a name-space xrootd (xrootd@myhost:2094) fed by cnsd on the manager; the data servers hold the data. A client's opendir() refers to the directory structure maintained at myhost:2094. cnsd is not needed on the redirector because the redirector has access.]
Configuration shown:
  ofs.forward 3way myhost:2094 mkdir mv rm rmdir trunc
  ofs.notify closew create |/opt/xrootd/bin/cnsd
  xrootd.redirect myhost:2094 dirlist
Putting It All Together
Basic xrootd cluster (data server nodes and manager node, each running xrootd + cmsd)
+ name space xrootd (with cnsd)
+ SRM node (BestMan, xrootdFS, gridFTP)
= LHC Grid access