
Page 1:

PHENIX Offline Computing

David Morrison
Brookhaven National Laboratory
CHEP, February 7, 2000

• What we’re doing
• Why we’re doing it
• What we’ve learned by doing it

Page 2:

a word from our sponsors ...

• large collaboration (>400 physicists)
• large, complex detector
  – ~300,000 channels
  – 11 different detector subsystems
• large volume of data, large number of events
  – 20 MB/sec for 9 months each year
  – 10^9 Au+Au events each year
• broad physics program
  – partly because RHIC itself is very flexible
  – Au+Au at 100+100 GeV/A, spin polarized p+p, and everything in-between
  – muons, electrons, hadrons, photons

Page 3:

from the PHENIX photo album

[photo: DPM, in hardhat]

Page 4:

the eightfold way of PHENIX offline computing

• know your physics program
  – for PHENIX, event processing rather than event selection
• know your constraints
  – money, manpower ... and tape mounts
• avoid “not invented here” syndrome: beg, borrow, collaborate
  – doesn’t automatically imply use of commercial products
• focus on modularity, interfaces, abstract base classes
• viciously curtail variety of architecture/OS
  – Linux, Solaris
• data management and data access are really hard problems
  – don’t rely on fine-grained random access to 100’s of TB of data

• everyone has their favorite reference works ...
  – Design Patterns (Gamma et al): run-time aggregation, shallow inheritance trees (see the sketch after this list)
  – The Mythical Man-Month (Brooks): avoid implementation by committee
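
Two of the design points above (modularity with abstract base classes, and run-time aggregation with shallow inheritance trees) are easier to see in code. The following is a minimal C++ sketch of the idea, not actual PHENIX code; all class names are invented for illustration.

#include <vector>

class Event { /* stands in for the real event data */ };

// the single abstract base class every analysis module implements
class AnalysisModule {
public:
  virtual ~AnalysisModule() {}
  virtual int process(Event& evt) = 0;   // non-zero signals an error
};

// concrete modules inherit directly from the base: no deep hierarchy
class DriftChamberTracker : public AnalysisModule {
public:
  int process(Event&) { /* find tracks */ return 0; }
};

class EmcalClusterizer : public AnalysisModule {
public:
  int process(Event&) { /* form clusters */ return 0; }
};

// behavior is aggregated at run time rather than baked into inheritance
class ReconstructionChain {
public:
  void add(AnalysisModule* m) { modules_.push_back(m); }
  void processEvent(Event& evt) {
    for (std::vector<AnalysisModule*>::size_type i = 0; i < modules_.size(); ++i)
      modules_[i]->process(evt);
  }
private:
  std::vector<AnalysisModule*> modules_;
};

Because every module honors the one abstract interface, subsystems can develop and swap modules independently; the chain itself is plain composition.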

Page 5:

building blocks

• small group of “core” offline developers
  – M. Messer, K. Pope, M. Velkovsky, M. Purschke, D. Morrison, (M. Pollack)
• large number of computer-savvy subsystem physicists
  – recruitment via “help wanted” list of projects that need people

• PHENIX object-oriented library, PHOOL (see talk by M. Messer)
  – object-oriented analysis framework
    • analysis modules all share common interface
  – type-safe, flexible data manager
    • extensive use of RTTI, avoids (void *) casts by users (see the sketch after this list)
    • ROOT I/O used for persistency
  – “STL” operations on collection of modules or data nodes
• varied OO views on analysis framework design
  – ranging from passive data to “event, reconstruct thyself”
  – PHOOL follows a hybrid approach
• migrated to PHOOL from STAF in early 1999
  – no user code modified (~120,000 LOC)
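
The “type-safe, flexible data manager” and RTTI points deserve a concrete illustration. The C++ sketch below is not the PHOOL interface (see M. Messer’s talk for that); the class and method names are invented, and it only shows how dynamic_cast lets user code retrieve typed objects without (void *) casts.

#include <map>
#include <string>

class DataObject {                    // common base for node contents
public:
  virtual ~DataObject() {}
};

class DchHitList : public DataObject { /* drift chamber hits ... */ };

class NodeManager {
public:
  void attach(const std::string& name, DataObject* obj) { nodes_[name] = obj; }

  // dynamic_cast (RTTI) does the type check; a missing node or a wrong
  // type yields a null pointer instead of a misinterpreted (void *)
  template <class T>
  T* get(const std::string& name) {
    std::map<std::string, DataObject*>::iterator it = nodes_.find(name);
    return (it == nodes_.end()) ? 0 : dynamic_cast<T*>(it->second);
  }

private:
  std::map<std::string, DataObject*> nodes_;
};

// usage inside an analysis module:
//   DchHitList* hits = mgr.get<DchHitList>("DchHitList");
//   if (!hits) { /* missing or wrong type, handled explicitly */ }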

Page 6:

more blocks

• lots of physics-oriented objects in PHENIX code
  – geometry, address/index objects, track models, reconstruction
• file catalog
  – metadata management, tracks related files, tied in with run info DB
• “data carousel” for retrieving files from HPSS (see the sketch after this list)
  – retrieval seen as group-level activity (subsystems, physics working groups)
  – carousel optimizes file retrieval, mediates resource usage between groups
  – scripts on top of IBM-written batch system
• event display(s)
  – very much subsystem-centered efforts; all are ROOT-based
  – clearly valuable for algorithm development and debugging
  – value for PHENIX physics analysis much less clear
• GNU build system, Mozilla-derived recompilation (poster M. Velkovsky)
  – autoconf, automake, libtool, Bonsai, Tinderbox, etc.
  – capable, robust, widely used by large audience on variety of platforms
  – feedback loop for code development
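
As an illustration of the scheduling idea behind the data carousel, here is a toy C++ sketch: pool requests per group, serve groups round-robin so resource usage is shared, and order each batch by tape volume to reduce mounts. This is purely illustrative; the real carousel is scripts and a mySQL table on top of the IBM-written batch system, and every name below is invented.

#include <algorithm>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

struct FileRequest {
  std::string group;   // e.g. a physics working group
  std::string tape;    // tape volume holding the file
  std::string file;    // file to be staged out of HPSS
};

// order requests by tape so a mounted volume is drained before moving on
static bool byTape(const FileRequest& a, const FileRequest& b) {
  return a.tape < b.tape;
}

class Carousel {
public:
  void submit(const FileRequest& r) { queues_[r.group].push_back(r); }

  // take up to n requests, one per group per pass (fair share between
  // groups), then sort the batch by tape to minimize mounts
  std::vector<FileRequest> nextBatch(std::size_t n) {
    std::vector<FileRequest> batch;
    bool progress = true;
    while (batch.size() < n && progress) {
      progress = false;
      for (QueueMap::iterator g = queues_.begin();
           g != queues_.end() && batch.size() < n; ++g) {
        if (!g->second.empty()) {
          batch.push_back(g->second.front());
          g->second.pop_front();
          progress = true;
        }
      }
    }
    std::stable_sort(batch.begin(), batch.end(), byTape);
    return batch;
  }

private:
  typedef std::map<std::string, std::deque<FileRequest> > QueueMap;
  QueueMap queues_;
};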

Page 7:

databases in PHENIX

• Objectivity used for “archival” database needs
  – Objy used in fairly “mainstream” manner
• all Objy DBs are resident online (not storing event data)
  – autonomous partitions, data replicated between counting house, RCF
  – RCF (D. Stampf) ported Objy to Linux
• PdbCal class library aimed at calibration DB application (see the sketch after this list)
  – insulates typical user from Objectivity
  – objects stored with validity period, versioning
  – usable interactively from within ROOT
• mySQL used for other database applications
  – Bonsai, Tinderbox system uses mySQL
  – heavily used in “data carousel”
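
To make the “validity period, versioning” idea concrete, here is a small C++ sketch of what a calibration object can look like from the user’s side. It is not the real PdbCal class library; every name is invented, and the Objectivity plumbing underneath is deliberately left out.

#include <ctime>
#include <map>

// a stored calibration object carries its own validity window and version
class CalBank {
public:
  CalBank(std::time_t begin, std::time_t end, int version)
    : begin_(begin), end_(end), version_(version) {}
  virtual ~CalBank() {}

  bool isValid(std::time_t t) const { return begin_ <= t && t < end_; }
  int  version() const { return version_; }

private:
  std::time_t begin_, end_;   // validity period
  int version_;               // several versions may cover the same period
};

// a concrete calibration: per-channel gains for one detector
class GainBank : public CalBank {
public:
  GainBank(std::time_t b, std::time_t e, int v) : CalBank(b, e, v) {}
  void  setGain(int ch, float g) { gains_[ch] = g; }
  float gain(int ch) const {
    std::map<int, float>::const_iterator it = gains_.find(ch);
    return it == gains_.end() ? 1.0f : it->second;   // default gain
  }
private:
  std::map<int, float> gains_;
};

// the application layer hides the database behind calls roughly like
//   GainBank* bank = calDB.fetchNewestValid("emc.gain", eventTime);
// so the typical user never touches Objectivity directly.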

Page 8:

simplified data flow

[diagram: simplified data flow among the counting house, HPSS, NFS disk, the analysis farm, and the Objectivity federated DB holding calibrations & conditions]

Page 9:

OO ubiquitous, mainstream in PHENIX

• subclasses of abstract “Eventiterator” class used to read raw data (see the sketch after this list)
  – from online pool, file, or fake test events; user code unchanged
• online control architecture based on CORBA “publish-subscribe”
• Java used in counting house for GUIs, CORBA
• subsystem reconstruction code uses STL, design patterns
  – not unusual to hear “singleton”, “iterator” at computing meetings
• OO emerging out of subsystems faster than from core offline crew
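
A sketch of the Eventiterator pattern from the first bullet above. The abstract class name comes from the talk itself; the method names and concrete subclasses are guesses for illustration, not the actual PHENIX declarations.

class Event { /* raw event: headers, packet data, ... */ };

// one abstract interface for pulling raw events, whatever the source
class Eventiterator {
public:
  virtual ~Eventiterator() {}
  virtual Event* getNextEvent() = 0;   // null when the source is exhausted
};

// concrete sources: a data file, the online pool, or fabricated events
class FileEventiterator : public Eventiterator {
public:
  Event* getNextEvent() { /* would read the next event from a file; stubbed */ return 0; }
};

class TestEventiterator : public Eventiterator {
public:
  Event* getNextEvent() { /* would fabricate a fake test event; stubbed */ return 0; }
};

// user code sees only the base class, so switching the source changes
// nothing downstream
void processAll(Eventiterator& it) {
  while (Event* evt = it.getNextEvent()) {
    // ... hand evt to the reconstruction modules ...
    delete evt;
  }
}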

Page 10:

OO experiences

• no Fortran in new post-simulation code
  – sidestepped many awkward F77/C++ issues, allowed OO to permeate
• loosely coupled, short hierarchy design working well
  – information localization on top of information encapsulation
  – allows decoupled, independent development
• no formal design tools, but lots of cloudy chalkboard diagrams
  – usually just a few interacting classes
• social engineering as important as software engineering
  – OO not science-fiction, not difficult ... and it’s here to stay
  – lots of hands-on examples, people are usually pleasantly surprised

Page 11:

more OO experiences

• OO was oversold (not by us!) as a computing panacea
  – does make big computing problem tractable, not trivial
  – occasional need for internal “public-relations”
• cognizance of “distance” between concepts advocated by developers and those held by users
  – e.g., CORBA IDL a great thing; tough to sell to collaboration at-large
• takes time and effort to “get it”, to move beyond “F77++”
  – general audience OO and C++ tutorials have helped
  – also work closely with someone from each subsystem; helps the OO “meme” take hold

Page 12:

summary

• PHENIX computing is essentially ready for physics data
  – use of PHOOL proven very successful during “mock data challenge”
• Objectivity/DB is primary database technology used throughout PHENIX
• reasonably conventional file-oriented data processing model
• loosely coupled, shallow hierarchy OO design
  – common approach across online and offline computing
• several approaches to recruiting, stretching scarce manpower
  – deliberate, explicit choice by collaboration to move to OO
  – recruit manpower from detector subsystems
  – loosely coupled OO design aids loosely coupled development
• OO has slowed implementation, but has been indispensable for design

• PHENIX will analyze physics data because of OO, not in spite of it