Analysis Framework for SuperKEKB
R.Itoh (KEK), N.Katayama (KEK) and S.H.Lee (Korea U)
“Analysis Framework”
- Accepts a set of functional “modules” which can be plugged into the framework, and controls the execution sequence.
- Integrated interface for data persistency.
- Parallel processing capability (on SMP/multi-core and PC clusters).
- Single unified framework from DAQ to user analysis.
Belle: the same framework “BASF” is used in
- Readout modules (COPPERs) and readout PCs
- Event builder and RFARM (HLT)
- DST production / reproduction
- User analysis
Definition of Analysis Framework (a la Belle)
[Diagram: the framework hosts plug-in modules (tracking, ECL clustering, vertexing, PID) forming an analysis program between input data and output data]
Belle's framework : B.A.S.F.
* Software pipeline (software bus) architecture
  - Accepts a set of “plug-in” analysis modules using dynamic link.
  - The modules are executed in the specified order to process event data; the execution is looped over all the events.
* Data management using a home-grown memory management tool (Panther): data portions of an event are handled with user-defined tables; the HBOOK4 package is used for histograms/N-tuples.
[Diagram: B.A.S.F. pipeline — mod. 1 → mod. 2 → ... → mod. n; events 1 .. n flow through Panther from input to output; HBOOK4 + CLHEP writes the histogram file]
[Diagram: B.A.S.F. internals — input data and output data; a module pool (module1, module2, ...) whose modules are attached to a path via dynamic link; EventServer, OutputServer and Histo.Server; Panther; the B.A.S.F. kernel with its I/O package; user interface; event process]
Parallel event processing on SMP
B.A.S.F.
* Green boxes are linked using dynamic link, as are the “modules”.
* Data handling is done through the “Panther” package.
* Separate package for histogram/N-tuple management: HBOOK4.
[Diagram labels: initialization; dynamic link; shared memory]
Developed in 1996:
- sock2rb: receives a packet from a socket and places it on the ring buffer
- rb2sock: picks a packet from the ring buffer and sends it to a socket
- RingBuffer: ring buffer on shared memory
- rb2file: saves data in a file
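A minimal in-process sketch of the put/get logic behind these tools. The real RingBuffer lives on shared memory so that separate processes (sock2rb, rb2sock, rb2file) can attach to the same buffer; that attachment and the locking are omitted here, and all names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Fixed-size FIFO ring buffer of packets. A writer such as sock2rb calls
// put(); a reader such as rb2sock or rb2file calls get(). In the real
// implementation the storage sits in shared memory; here it is a plain
// in-process vector to show only the ring logic.
class RingBuffer {
public:
    explicit RingBuffer(size_t nslots)
        : slots_(nslots), head_(0), tail_(0), count_(0) {}

    // Insert one packet; returns false when the buffer is full
    // (a real writer would then wait and retry).
    bool put(const std::vector<char>& packet) {
        if (count_ == slots_.size()) return false;
        slots_[head_] = packet;
        head_ = (head_ + 1) % slots_.size();
        ++count_;
        return true;
    }

    // Remove the oldest packet; returns false when the buffer is empty.
    bool get(std::vector<char>& packet) {
        if (count_ == 0) return false;
        packet = slots_[tail_];
        tail_ = (tail_ + 1) % slots_.size();
        --count_;
        return true;
    }

private:
    std::vector<std::vector<char> > slots_;
    size_t head_, tail_, count_;
};
```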
dBASF-II/RFARM
[Diagram: dBASF-II/RFARM data flow — the EB farm sends events via rb2sock to sock2rb on the input distributor; a ring buffer feeds the event server (mod); the processing nodes run BASF; the output server (mod) feeds rb2sock → sock2rb on the output collector, whose ring buffer feeds rb2file writing to tape (the number of outputs is increased by one)]
Current problems with Belle's Framework
- No object persistency.
  * Cannot read/write “objects” from/to files using Panther; e.g. a “Particle” class cannot be saved/retrieved.
- The interface for interactive user analysis is obsolete. * “HBOOK4” + CLHEP (for PAW/dis45) * Recent HEP community: “ROOT” is the de-facto standard.
- Complicated parallel processing framework. * Different implementations in SMP and network cluster.
- Lack of distributed input/output file management. * No integrated catalog (name service) management. * No integrated GRID interface (ad-hoc interface exists, though).
* We do need a new analysis framework for SuperKEKB/Belle based on ROOT with a GRID interface!
  -> However, the obstacle to starting the development is the lack of manpower...
* A BASF-like framework is useful for processing data sets which contain massive chunks of data of similar kinds.
* Recent experiments in other fields also need to process massive data. (Satellite image processing, for example)
How about collaborating with other experiment(s) to develop a new shareable framework?
Status as of 2007
Joint development with Hyper Suprime-Cam
- The Hyper Suprime-Cam project aims at a search for Dark Matter using the “Subaru” telescope located at Mauna Kea in Hawaii.
  <- A new telescope camera, “Hyper Suprime-Cam”, is now being developed. First light is expected in 2011.
  => Statistical analysis of the “weak-lensing effect” in full-sky survey images taken by the Hyper Suprime-Cam CCD camera.
  => Data size: 2 GB/shot, >1 TB/night
[Photos: the observatory at Mauna Kea, the Subaru Telescope, and Hyper Suprime-Cam]
Planned data processing at HSC (before introduction of the idea of a framework)
- Raw data: CCD images stored in FITS files (2048x4096/CCD x ~100)
- The CCD data are processed by a chain of many “filter” applications driven by a shell script.
- One “filter” reads (a) FITS file(s) and writes the processing results into a different FITS file.
[Diagram: flt1 → flt2 → flt3 → ... → fltn, with FITS files stored in file servers between the filters]
* Possible pipeline hazard if the processing is done on many nodes.
* Heavy I/O bandwidth.
Use of HEP style analysis framework - Memory based data passing between filters: reduce I/O BW - “Image-by-image” basis parallel processing
Prototype of analysis framework : “roobasf”
- Software bus (pipeline) is kept.
  * Compatibility with modules written for B.A.S.F.
- Object persistency * ROOT I/O as the persistency core. * Panther I/O is kept as a legacy interface.
- More versatile parallel/pipeline processing scheme * Transparent implementation which utilizes both multicore CPUs and network clusters: Manage ~100 nodes by single framework * Dynamic optimization of resource allocation * Module pipelining over network - Integrated database and GRID interface for file management
- Dynamic customization of framework * replaceable I/O package, user interface .......
Requirements
Development strategy of roobasf
Step 1. Develop a prototype without parallel processing capability - Recycle existing BASF code as the core (module manipulation) - Implement ROOT I/O based object persistency
Step 2. Test the prototype for the upgraded Suprime-Cam at Subaru telescope => in progress ← as of today
Step 3. Implement parallel processing and finalize design - event-by-event (image-by-image) basis trivial parallel processing - parallel processing of pipeline modules (for “mosaicing”) - other issues : dynamic reconfig, database/GRID interface
SuperKEKB / Hyper Suprime-Cam
“Summit system” for the upgraded Suprime-Cam
- Suprime-Cam is a CCD camera at the prime focus of the Subaru telescope (scale: ~1/10 of Hyper Suprime-Cam).
- The CCDs have been upgraded to new ones for better performance.
- A semi-real-time data processing system based on “roobasf” has been implemented at the Subaru observatory. -> good field test
- The system was first tested during a real observation on Aug. 26. The development crew worked hard at the summit of Mauna Kea (4205 m)!
[Diagram: Summit System — the Suprime-Cam DAQ writes FITS files (one per CCD); roobasf runs modules on a pipeline: readfits → overscansub → GetPar/FlatField → AgpMask → SExtractor → Stat/hist. mgr. → N-tuple, with ROOT objects passed in memory]
* “roobasf” instances are executed for 10 CCDs in parallel on 5 nodes.
* The overall execution is controlled using “RCM” (R&D chain management):
  - accepts a set of scripts
  - conditional execution of the sequence of scripts according to the given “workflow”
  - management of input/output using an RDB
* FITS file handling
  - Originally the FITS image was intended to be kept in a ROOT object to pass it through memory.
  - Currently only the FITS headers are passed as objects. <- due to the ad-hoc implementation of modules...
[Diagram labels: analysis/visualization using ROOT; x 10; web; output FITS; semi-real-time processing]
Going further
a) Implementation of parallel processing
* “B.A.S.F.”-style parallel processing, i.e. parallel processing using multiple processes on an SMP node with different event input, is being implemented as the starting point.
* The event data are distributed to the processes from one or multiple event servers through either shared memory or a TSocket.
* The data are handled using the TMessage class in ROOT.
Currently developing...
* A data distribution class which can use both shared memory and sockets transparently.
* A new class to manage the forking of event processes on SMP nodes and the invocation of the framework at remote nodes via the network.
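The distribution policy itself is simple to sketch. Here a toy event server assigns successive events to worker processes in round-robin order; in the real system the transport would be shared memory or a TSocket carrying a streamed TMessage, while here both are replaced by plain in-process queues (all names are illustrative, not the classes under development):

```cpp
#include <cstddef>
#include <vector>

// One worker process's input queue (stand-in for its shared-memory
// ring buffer or socket connection).
struct Worker {
    std::vector<int> queue;  // event ids assigned to this worker
};

// Toy event server: hands each event to the next worker in cyclic order,
// so events are processed event-by-event in parallel.
class EventServer {
public:
    explicit EventServer(size_t nworkers) : workers_(nworkers), next_(0) {}

    void dispatch(int eventId) {
        workers_[next_].queue.push_back(eventId);
        next_ = (next_ + 1) % workers_.size();
    }

    const Worker& worker(size_t i) const { return workers_[i]; }

private:
    std::vector<Worker> workers_;
    size_t next_;
};
```

A real distributor would instead hand the next event to whichever worker is free, which is what makes the parallelism transparent to the module writer.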
Data Flow Scheme (S.H.Lee)
[Diagram: ROOT input (RootIO) — TFile->Get() → TTree->Get() → TBranch::GetEntry() → BelleEvent → TMessage::WriteObject() → char* (TMessage::Buffer()) → EvtMessage (EvtMessage::EvtMessage()) → Event Server → “Source” shared memory / ring buffer → Event Processor → “Output” shared memory / ring buffer → EvtMessage → char* → TMessage::ReadObjectAny() (InMessage::InMessage()) → BelleEvent → Output Server → ROOT output (RootIO) — TTree->Fill() → TFile->Write()]
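The framing step in this scheme can be sketched as follows: the serialized event (in reality the byte buffer produced by TMessage::WriteObject) is prefixed with a small header giving its length, so a reader on the far side of a ring buffer or socket knows how many bytes belong to one event. The “EvtMessage” below is a simplified stand-in for Belle's class of the same name, and the header layout is an assumption for illustration:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Minimal message framing: [4-byte length][payload bytes].
// The payload would be the streamed object from TMessage::Buffer();
// here it is just a string of bytes.
class EvtMessage {
public:
    // Wrap a serialized payload into one framed message.
    static std::vector<char> frame(const std::string& payload) {
        std::vector<char> msg(sizeof(int) + payload.size());
        int len = static_cast<int>(payload.size());
        std::memcpy(&msg[0], &len, sizeof(int));
        if (!payload.empty())
            std::memcpy(&msg[sizeof(int)], payload.data(), payload.size());
        return msg;
    }

    // Recover the payload from a framed message.
    static std::string unframe(const std::vector<char>& msg) {
        int len = 0;
        std::memcpy(&len, &msg[0], sizeof(int));
        return std::string(&msg[sizeof(int)], len);
    }
};
```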
Object-oriented data flow and processing in SuperKEKB DAQ
- The same software framework from COPPER to HLT (and offline).
- ROOT TMessage based data flow throughout the DAQ -> a class to contain the streamed object.
- ROOT I/O based object persistency at storage.
  * Use of “xrootd” for remote storage if enough bandwidth can be guaranteed.
- Real-time collection of histograms from any node for monitoring.
[Diagram: COPPERs (readout modules) run roobasf (format, reduct.) → TMessage → R/O PCs run roobasf (format, monitor) → Event Builder → HLT nodes run roobasf (track, clust., pid., sel.) with evt. srv. and out. srv.; the raw data class is carried in TMessage; output goes to xrootd storage]
b) Module-by-module parallel processing
- Module-by-module basis parallel processing is required in some cases.
  ex. “Mosaicing” at HSC
  - (Hyper) Suprime-Cam consists of 10 (100) CCDs.
  - The whole image is built by gathering all CCD images into one place. <-> Just like “event building” in HEP.
  - Boundary processing is necessary to obtain the image.
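The “event building”-like gather step can be sketched as below: CCD images processed in parallel are collected and laid out side by side into one full frame. The boundary processing between neighbouring CCDs is deliberately omitted, and the image layout and names are illustrative assumptions:

```cpp
#include <cstddef>
#include <vector>

// A toy image: [row][col] pixel values.
typedef std::vector<std::vector<int> > Image;

// Gather equally-sized CCD images into one mosaic by concatenating them
// horizontally, row by row (the analogue of event building: sub-pieces
// from many sources are assembled into one whole).
Image mosaic(const std::vector<Image>& ccds) {
    Image full;
    if (ccds.empty()) return full;
    size_t rows = ccds[0].size();
    full.resize(rows);
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < ccds.size(); ++c)
            full[r].insert(full[r].end(),
                           ccds[c][r].begin(), ccds[c][r].end());
    return full;
}
```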
[Diagram: CCD sets A, B and C are processed on separate groups of nodes; partial mosaicing produces A-1, B-1 and C-1; full mosaicing combines them on node n+1]
* Data exchange between frameworks outside the data stream.
* Synchronization between modules on different frameworks.
* Possibility of a pipeline hazard.
Possible parallel processing of mosaicing (a la Belle DAQ)
c) Other issues
1) Dynamic customization of the framework
 - The framework software is divided into small components: I/O package, module driver, command parser, user I/F... as dynamically linkable packages. (c.f. modules are already dynamically linkable.)
 - Customize the framework from these building blocks.
   Ex: an alternative object persistency to ROOT - the object serializer supplied by the BOOST project.
2) Database/GRID interface for file management - Use of “xrootd”
3) Dynamic resource allocation - Automatic addition/deletion of processing nodes to optimize the overall performance. -> Solution to the possible pipeline hazard
Plan (for SuperKEKB)
* Prototype called “roobasf” with - BASF's module driver - ROOT IO based object persistency - Event-by-event parallel processing capability on multi-core CPUs (SMP) - Histogram/N-tuple management with ROOT - Legacy interface : Panther I/O and HBOOK4
* Make the prototype work by the time of CHEP09 (~Mar. 20) and release it to Belle for the field test. -> feedback from actual use in analyses.
Feature Survey for Running Experiments (L.S.Kennedy@CHEP06)

Experiment | EDM & persistency        | Configure           | Processing model  | Component architecture         | Provenance
STAR       | Transient -> many tables |                     |                   | Yes                            |
Belle      | Transient -> custom      | With script parser  | Software bus      | Yes and dynamic load + mixture | Stored in log files
CLEO       | Transient -> custom      | With tcl            | Data on demand    | Yes and dynamic load           | Limited
D0         | Transient -> DOOM        | With text rcp files | Publish/subscribe | Yes and dynamic link           | Saved to data file
CDF        | Transient -> root TTrees | With tcl            | Software bus      | Yes but static link            | Parameters captured in DB
BaBar      | Transient -> OODB        | With tcl            | Software bus      | Yes but static link            | Parameters captured in DB
Frameworks for LHC Era (L.S.Kennedy@CHEP06)

Experiment | EDM & persistency               | Configure            | Processing model | Component architecture        | Provenance
Alice      | root                            | cint                 | Software bus     | Yes, uses root::PluginManager | Config-based, stored in output
CMS        | Transient -> pool -> root TTrees| With text files for now | Software bus  | Yes, uses seal::PluginManager | Stored in output data + per event
Atlas      | Transient -> pool               | python               | Software bus     | Yes and dynamic link          | Config-based, stored in DM DB
LHCb       | Transient -> pool               | Text file            | Data on demand   | Yes and dynamic link          | Config-based, stored in DM DB
- “module” (BASF) = “algorithm” (GAUDI), with more versatile management
- Algorithms and data are clearly separated
- ROOT I/O based object persistency (POOL)
- ROOT based histogram/N-tuple management
GAUDI - Originally developed for LHCb - Used by ATLAS also as “Athena”
POOL
There is a package called “POOL”. - It was developed for LHC experiments and will be widely used for the data management by the experiments. - It is a combination of * ROOT I/O for the event data management, and * MySQL for the management of data catalog. -> also usable: XML DB, EDG (of GRID) - GRID capable! - Strong support by LHC community
http://pool.cern.ch
Why do we stick to our own framework? i.e. Why not GAUDI?
- Compatibility with existing Belle software is required. * We need to analyze Belle's 1/ab data together with SuperKEKB data. * Legacy interface : Panther and HBOOK4? * Module level compatibility at some level
- GAUDI is too complicated * Too heavy because of high functionality and versatility * Designed mainly by computer scientists and not necessarily easy to use for modest HEP physicists (might be my prejudice).
- GAUDI lacks parallel processing capability. An external framework (like GRID batch) is necessary for it, which complicates the system, especially in real-time use.
  * Sophisticated parallelism should be built into the framework itself; this advantage is already proven by BASF@Belle, i.e. users can make use of multiple CPUs without knowing it.
Further discussion items for the use at SuperKEKB
1. Object persistency core
- ROOT IO: as a default
- BOOST product: experimental
* Replaceable I/O package by dynamic configuration of framework
2. Integrated GRID/database interface
* Management of distributed files over WAN/GRID -> xrootd could be a nice candidate, or POOL as in GAUDI.
* Parallel processing on GRID (GRID-MPI, for example)?
  -> Not necessary. The framework takes care of processing inside clusters with O(100) nodes.
  -> Processing with multiple clusters is managed by GRID batch.
3. Software bus - How to pass objects between modules * Pass pointer to object: simple but no control * Define input and output objects of each module: too strict * Access through proxy: moderate control but needs development - Algorithm to drive module execution : stream driven, object driven
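The “access through proxy” option can be sketched as below: modules never hold raw pointers to each other's objects, but go through a proxy that the framework hands out, so it retains moderate control over who reads and writes what. The class names (DataStore, StoreProxy) and the single-output-slot policy are assumptions for illustration, not a design in the source:

```cpp
#include <map>
#include <string>

// A simple keyed store for event data shared between modules.
class DataStore {
public:
    void put(const std::string& key, double value) { store_[key] = value; }
    bool has(const std::string& key) const { return store_.count(key) != 0; }
    double get(const std::string& key) const {
        std::map<std::string, double>::const_iterator it = store_.find(key);
        return it == store_.end() ? 0.0 : it->second;
    }
private:
    std::map<std::string, double> store_;
};

// Proxy handed to each module by the framework: read access to anything,
// but write access only to the module's own declared output slot. This is
// the "moderate control" middle ground between passing raw pointers and
// rigidly declaring every input/output object.
class StoreProxy {
public:
    StoreProxy(DataStore& s, const std::string& outputKey)
        : store_(s), outputKey_(outputKey) {}
    double read(const std::string& key) const { return store_.get(key); }
    void write(double value) { store_.put(outputKey_, value); }
private:
    DataStore& store_;
    std::string outputKey_;
};
```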
4. Language: C++ is the default, but Python interface is required.
Relation between Framework and “Computing Model”
- Coupling of framework with “Computing Model” is basically in conjunction with the data access method.
- Remote “object” access: * POOL * SRM, SRB on GRID * xrootd * Combination As far as framework is capable of them, it can be independent of “computing model”.
- Parallel processing (in case of roobasf): * Framework takes care of processing within a single job utilizing local PC clusters consisting of up to ~100 nodes. * “Computing model” concerns the allocation of resources and job-by-job basis management.
Personal opinion on the framework @ SuperKEKB
1. Frameworks are required for
 a) DAQ data processing (real-time processing)
  - Data reduction in front-end readout modules (COPPERs)
  - HLT processing with full reconstruction
 b) Offline data reproduction: full reconstruction with updated constants
 c) User analyses
2. Belle uses the same single framework, BASF, for all of them. Having a single framework at all levels is a great advantage for the ease of software development.
 - “roobasf” is also being developed with the same policy in mind.
3. However, it may be possible to have different frameworks especially for user analyses, once the interface for the object persistency and module structure are clearly defined.
xrootd
- Generalized model for transparent access to distributed data files with a name service.
- Can be used as a substitute for NFS, but is scalable and suited for wide-area access (just like P2P file sharing??).
- Provides a POSIX I/O interface which can be used from any application transparently.
- Originally developed to access scattered ROOT files and optimized for that, but the latest version is generalized to any type of data file.
- Capable of accessing any data portion in scattered files transparently.
- GRID capable, combined with SRM and FUSE.
http://xrootd.slac.stanford.edu/
The Magic
[Diagram: xrootd redirection — the client sends open("/foo") to the manager node (x.slac.stanford.edu, running xrootd + cmsd); the manager asks the data server nodes “Have /foo?”; node a.slac.stanford.edu (xrootd + cmsd) answers “I have /foo!”; the manager tells the client “Go to a”; the client re-issues open("/foo") to node a and reads /foo. The magic is redirection!]
Composite Name Space Implemented
[Diagram: a redirector (xrootd@myhost:1094) forwards name-space operations (create/trunc, mkdir, mv, rm, rmdir) to a name-space xrootd (xrootd@myhost:2094) fed by cnsd on the manager; the data servers hold the data. A client's opendir() refers to the directory structure maintained at myhost:2094. cnsd is not needed on the redirector because the redirector has access.]
Configuration shown:
  ofs.forward 3way myhost:2094 mkdir mv rm rmdir trunc
  ofs.notify closew create |/opt/xrootd/bin/cnsd
  xrootd.redirect myhost:2094 dirlist
Putting It All Together
Basic xrootd cluster (data server nodes and manager node, each running xrootd + cmsd)
+ name space xrootd (with cnsd)
+ SRM node (BestMan, xrootdFS, gridFTP)
= LHC Grid access