
Page 1: PHENIX Offline Computing

David Morrison
Brookhaven National Laboratory

Dave Morrison, CHEP, February 7, 2000

• What we’re doing
• Why we’re doing it
• What we’ve learned by doing it

Page 2: a word from our sponsors ...

• large collaboration (>400 physicists)
• large, complex detector
  – ~300,000 channels
  – 11 different detector subsystems
• large volume of data, large number of events
  – 20 MB/sec for 9 months each year
  – 10⁹ Au+Au events each year
• broad physics program
  – partly because RHIC itself is very flexible
  – Au+Au at 100+100 GeV/A, spin-polarized p+p, and everything in between
  – muons, electrons, hadrons, photons

Page 3: from the PHENIX photo album

[photo: DPM, in hardhat]

Page 4: the eightfold way of PHENIX offline computing

• know your physics program
  – for PHENIX, event processing rather than event selection
• know your constraints
  – money, manpower ... and tape mounts
• avoid “not invented here” syndrome: beg, borrow, collaborate
  – doesn’t automatically imply use of commercial products
• focus on modularity, interfaces, abstract base classes
• viciously curtail variety of architecture/OS
  – Linux, Solaris
• data management and data access are really hard problems
  – don’t rely on fine-grained random access to 100’s of TB of data
• run-time aggregation, shallow inheritance trees (see the sketch after this list)
• everyone has their favorite reference works ...
  – Design Patterns (Gamma et al.)
  – The Mythical Man-Month (Brooks)
• avoid implementation by committee
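The “run-time aggregation, shallow inheritance trees” principle is easier to see in code than in prose. Purely as illustration, here is a minimal C++ sketch: one thin abstract interface, concrete classes a single level below it, and behavior assembled at run time by composition rather than by subclassing. All class names here are hypothetical, not taken from the PHENIX code base.

```cpp
#include <memory>
#include <vector>

// One thin abstract interface; concrete fitters sit directly below it
// instead of at the bottom of a deep inheritance chain.
class TrackFitter {
public:
  virtual ~TrackFitter() = default;
  virtual void fit() = 0;
};

class DriftChamberFitter : public TrackFitter {
public:
  void fit() override { /* drift-chamber-specific fitting */ }
};

class PadChamberFitter : public TrackFitter {
public:
  void fit() override { /* pad-chamber-specific fitting */ }
};

// Run-time aggregation: the job owns a configurable list of fitters;
// changing its behavior means adding a part, not subclassing the job.
class ReconstructionJob {
public:
  void addFitter(std::unique_ptr<TrackFitter> f) {
    fitters_.push_back(std::move(f));
  }
  void run() {
    for (auto& f : fitters_) f->fit();
  }
private:
  std::vector<std::unique_ptr<TrackFitter>> fitters_;
};

int main() {
  ReconstructionJob job;
  job.addFitter(std::make_unique<DriftChamberFitter>());
  job.addFitter(std::make_unique<PadChamberFitter>());
  job.run();
}
```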

Page 5: building blocks

• small group of “core” offline developers
  – M. Messer, K. Pope, M. Velkovsky, M. Purschke, D. Morrison, (M. Pollack)
• large number of computer-savvy subsystem physicists
  – recruitment via “help wanted” list of projects that need people
• PHENIX object-oriented library, PHOOL (see talk by M. Messer)
  – object-oriented analysis framework; analysis modules all share common interface
  – type-safe, flexible data manager
    • extensive use of RTTI, avoids (void *) casts by users (see the sketch after this list)
    • “STL” operations on collection of modules or data nodes
  – ROOT I/O used for persistency
• varied OO views on analysis framework design
  – ranging from passive data to “event, reconstruct thyself”
  – PHOOL follows a hybrid approach
• migrated to PHOOL from STAF in early 1999
  – no user code modified (~120,000 LOC)
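M. Messer’s talk covers PHOOL itself; as a rough illustration of the RTTI bullet above, a type-safe data manager can use dynamic_cast so that a request for the wrong type fails cleanly instead of requiring users to gamble on a (void *) cast. The sketch below is hypothetical: PHObject, DataManager, and HitList are illustrative names, not the PHOOL API.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Common base class for anything stored on a data node.
class PHObject {
public:
  virtual ~PHObject() = default;
};

class HitList : public PHObject { /* subsystem hits ... */ };

// The user asks for a node by name *and* expected type; dynamic_cast
// (RTTI) returns nullptr on a type mismatch instead of letting a bad
// (void *) cast silently corrupt memory.
class DataManager {
public:
  void add(const std::string& name, std::unique_ptr<PHObject> obj) {
    nodes_[name] = std::move(obj);
  }
  template <class T>
  T* get(const std::string& name) {
    auto it = nodes_.find(name);
    if (it == nodes_.end()) return nullptr;
    return dynamic_cast<T*>(it->second.get());  // RTTI type check
  }
private:
  std::map<std::string, std::unique_ptr<PHObject>> nodes_;
};

int main() {
  DataManager dm;
  dm.add("DchHits", std::make_unique<HitList>());
  HitList* hits = dm.get<HitList>("DchHits");  // correct type: non-null
  std::cout << (hits ? "found" : "missing") << '\n';
}
```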

Page 6: more blocks

• lots of physics-oriented objects in PHENIX code
  – geometry, address/index objects, track models, reconstruction
• file catalog
  – metadata management, tracks related files, tied in with run info DB
• “data carousel” for retrieving files from HPSS (see the sketch after this list)
  – retrieval seen as group-level activity (subsystems, physics working groups)
  – carousel optimizes file retrieval, mediates resource usage between groups
  – scripts on top of IBM-written batch system
• event display(s)
  – very much subsystem-centered efforts; all are ROOT-based
  – clearly valuable for algorithm development and debugging
  – value for PHENIX physics analysis much less clear
• GNU build system, Mozilla-derived recompilation (poster M. Velkovsky)
  – autoconf, automake, libtool, Bonsai, Tinderbox, etc.
  – capable, robust, widely used by large audience on variety of platforms
  – feedback loop for code development
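To make “carousel optimizes file retrieval” concrete: with tape mounts a scarce resource (see Page 4), the natural optimization is to batch pending requests by tape volume so each tape is mounted once. The sketch below is entirely hypothetical; the real carousel was scripts over an IBM-written batch system and mySQL, and FileRequest, batchByTape, and the sample paths are illustrative names only.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A pending request to stage one file out of HPSS (hypothetical fields).
struct FileRequest {
  std::string path;   // file to retrieve
  std::string tape;   // tape volume holding the file
  std::string group;  // requesting subsystem / physics working group
};

// Group pending requests by tape volume so each tape is mounted once
// and every requested file on it is staged together, instead of
// remounting the same tape for each file as requests arrive.
std::map<std::string, std::vector<FileRequest>>
batchByTape(const std::vector<FileRequest>& pending) {
  std::map<std::string, std::vector<FileRequest>> batches;
  for (const auto& req : pending) batches[req.tape].push_back(req);
  return batches;
}

int main() {
  std::vector<FileRequest> pending = {
      {"/raw/run100.prdf", "VOL001", "muon"},
      {"/raw/run101.prdf", "VOL002", "dch"},
      {"/raw/run102.prdf", "VOL001", "emcal"},
  };
  for (const auto& [tape, reqs] : batchByTape(pending)) {
    std::cout << "mount " << tape << ": " << reqs.size() << " file(s)\n";
  }
}
```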

Page 7: databases in PHENIX

• Objectivity used for “archival” database needs
  – Objy used in fairly “mainstream” manner
• all Objy DBs are resident online (not storing event data)
  – autonomous partitions, data replicated between counting house, RCF
  – RCF (D. Stampf) ported Objy to Linux
• PdbCal class library aimed at calibration DB application (see the sketch after this list)
  – insulates typical user from Objectivity
  – objects stored with validity period, versioning
  – usable interactively from within ROOT
• mySQL used for other database applications
  – Bonsai, Tinderbox system uses mySQL
  – heavily used in “data carousel”
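As a rough illustration of “objects stored with validity period, versioning”: a calibration lookup returns the highest-version object whose validity interval covers the requested time, so re-calibrations supersede older entries without deleting them. The sketch below uses hypothetical names (CalBank, fetch), not the real PdbCal interface.

```cpp
#include <iostream>
#include <optional>
#include <vector>

// A stored calibration object (hypothetical layout, not the real PdbCal).
struct CalBank {
  long beginValid;  // start of validity period
  long endValid;    // end of validity period
  int  version;     // versioning: later insertions supersede earlier ones
  double gain;      // the calibration payload itself
};

// Return the highest-version bank whose validity interval covers `when`.
std::optional<CalBank> fetch(const std::vector<CalBank>& db, long when) {
  std::optional<CalBank> best;
  for (const auto& b : db) {
    if (b.beginValid <= when && when < b.endValid &&
        (!best || b.version > best->version)) {
      best = b;
    }
  }
  return best;
}

int main() {
  std::vector<CalBank> db = {
      {1000, 2000, 1, 0.97},
      {1000, 2000, 2, 0.98},  // re-calibration of the same interval
      {2000, 3000, 1, 1.02},
  };
  if (auto b = fetch(db, 1500))
    std::cout << "gain " << b->gain << " (v" << b->version << ")\n";
}
```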

Page 8: simplified data flow

[diagram: simplified data flow among the counting house, HPSS, NFS disk, and the analysis farm, with calibrations & conditions held in the Objectivity federated DB]

Page 9: OO ubiquitous, mainstream in PHENIX

• subclasses of abstract “Eventiterator” class used to read raw data
  – from online pool, file, or fake test events; user code unchanged (see the sketch after this list)
• online control architecture based on CORBA “publish-subscribe”
• Java used in counting house for GUIs, CORBA
• subsystem reconstruction code uses STL, design patterns
  – not unusual to hear “singleton”, “iterator” at computing meetings
• OO emerging out of subsystems faster than from core offline crew
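The Eventiterator bullet above is the classic abstract-base-class pattern: analysis code is written once against the base class, and the concrete event source is chosen at run time. A minimal sketch of the idea (simplified; beyond the Eventiterator name, nothing here is the actual PHENIX interface):

```cpp
#include <iostream>
#include <memory>

class Event { /* raw event data ... */ };

// Abstract source of raw events; user code sees only this interface.
class Eventiterator {
public:
  virtual ~Eventiterator() = default;
  // Returns the next event, or nullptr when the source is exhausted.
  virtual std::unique_ptr<Event> getNextEvent() = 0;
};

// One concrete subclass per source: online pool, file, fake test events.
class FileEventiterator : public Eventiterator {
public:
  std::unique_ptr<Event> getNextEvent() override {
    return nullptr;  // stub: would read the next event from a raw-data file
  }
};

class TestEventiterator : public Eventiterator {
public:
  explicit TestEventiterator(int n) : remaining_(n) {}
  std::unique_ptr<Event> getNextEvent() override {
    if (remaining_-- <= 0) return nullptr;
    return std::make_unique<Event>();  // fabricate a fake test event
  }
private:
  int remaining_;
};

// Written once against the base class; unchanged whichever source is used.
int analyze(Eventiterator& it) {
  int n = 0;
  while (auto evt = it.getNextEvent()) ++n;  // process *evt here
  return n;
}

int main() {
  TestEventiterator fake(3);
  std::cout << analyze(fake) << " events processed\n";
}
```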

Page 10: OO experiences

• no Fortran in new post-simulation code
  – sidestepped many awkward F77/C++ issues, allowed OO to permeate
• loosely coupled, short-hierarchy design working well
  – information localization on top of information encapsulation
  – allows decoupled, independent development
• no formal design tools, but lots of cloudy chalkboard diagrams
  – usually just a few interacting classes
• social engineering as important as software engineering
  – OO not science fiction, not difficult ... and it’s here to stay
  – lots of hands-on examples; people are usually pleasantly surprised

Page 11: more OO experiences

• OO was oversold (not by us!) as a computing panacea
  – it makes a big computing problem tractable, not trivial
  – occasional need for internal “public relations”
• cognizance of “distance” between concepts advocated by developers and those held by users
  – e.g., CORBA IDL is a great thing; tough to sell to the collaboration at large
• takes time and effort to “get it”, to move beyond “F77++”
  – general-audience OO and C++ tutorials have helped
  – also work closely with someone from each subsystem; this helps the OO “meme” take hold

Page 12: summary

• PHENIX computing is essentially ready for physics data
  – use of PHOOL proven very successful during “mock data challenge”
• Objectivity/DB is the primary database technology used throughout PHENIX
• reasonably conventional file-oriented data processing model
• loosely coupled, shallow-hierarchy OO design
  – common approach across online and offline computing
• several approaches to recruiting, stretching scarce manpower
  – deliberate, explicit choice by collaboration to move to OO
  – recruit manpower from detector subsystems
  – loosely coupled OO design aids loosely coupled development
• OO has slowed implementation, but has been indispensable for design
• PHENIX will analyze physics data because of OO, not in spite of it