BaBar MC production

[Diagram: the BaBar MC production software; the farm @ VU (Amsterdam University); a lot of computers at the EDG testbed (NIKHEF); jobs going out and results coming back.]

The simple question: How can we run BaBar software on EDG grid sites?

Introduction of Parrot

[Diagram: Parrot and Chirp added between the BaBar MC production software on the farm @ VU (Amsterdam University) and the jobs running on the EDG testbed (NIKHEF, a lot of computers); jobs go out and results come back.]

We need transparent access to the Objectivity database (it requires local file access).

Parrot functionality

[Diagram: The Parrot Virtual File System]
• BaBar MC production sees Parrot through a POSIX interface (system calls are trapped with ptrace).
• Parrot provides drivers for HTTP, FTP, RFIO, NeST and Chirp, plus a local cache.
• Whole-file I/O (get/put):
  – HTTP server, FTP server: traditional I/O services
• Partial-file I/O (open, close, read, write, lseek):
  – RFIO server: integration with Castor
  – NeST server: allocation and management
  – Chirp server: full UNIX semantics
  – Condor proxy, secure remote RPC to the Condor shadow: integration with Condor
• Other labels on the diagram: "Not yet", "x509", "Optimize"

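The split between whole-file and partial-file I/O can be sketched in a few lines. This is a toy model, not Parrot's code: it borrows the idea of naming remote files as `/protocol/host/path`, but `SERVERS`, `resolve` and `VirtualFile` are hypothetical, and an in-memory dict stands in for the real servers. It shows how the application uses the same `lseek()`/`read()` in both cases while the amount of data moved differs.

```python
from typing import Dict, Tuple

# Hypothetical in-memory "servers": (protocol, host) -> {remote path: contents}.
SERVERS: Dict[Tuple[str, str], Dict[str, bytes]] = {}

WHOLE_FILE = {"http", "ftp"}               # get/put only
PARTIAL_FILE = {"chirp", "rfio", "nest"}   # open/read/write/lseek


def resolve(path: str) -> Tuple[str, str, str]:
    """Split '/proto/host/rest' into (proto, host, '/rest')."""
    parts = path.lstrip("/").split("/", 2)
    if len(parts) < 2 or parts[0] not in WHOLE_FILE | PARTIAL_FILE:
        raise ValueError(f"no driver for {path!r}")
    rest = "/" + parts[2] if len(parts) == 3 else "/"
    return parts[0], parts[1], rest


class VirtualFile:
    """POSIX-like handle: the application sees the same lseek()/read()
    whatever protocol sits underneath."""

    def __init__(self, path: str):
        self.proto, self.host, self.remote = resolve(path)
        self.offset = 0
        self.bytes_transferred = 0  # bytes moved over the "network"
        self._local_copy = None
        if self.proto in WHOLE_FILE:
            # Whole-file I/O: one 'get' copies the entire file locally.
            self._local_copy = self._fetch(0, None)

    def _fetch(self, start: int, length):
        data = SERVERS[(self.proto, self.host)][self.remote]
        end = len(data) if length is None else start + length
        chunk = data[start:end]
        self.bytes_transferred += len(chunk)
        return chunk

    def lseek(self, offset: int) -> int:
        self.offset = offset
        return self.offset

    def read(self, n: int) -> bytes:
        if self._local_copy is not None:
            chunk = self._local_copy[self.offset:self.offset + n]
        else:
            # Partial-file I/O: request only the byte range that is needed.
            chunk = self._fetch(self.offset, n)
        self.offset += len(chunk)
        return chunk
```

Reading 3 bytes at offset 4 through a partial-file driver moves 3 bytes. The same read through a whole-file driver first moves the entire file. For a large database that is only read in small pieces, this difference is why partial-file protocols such as Chirp are the natural fit.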
The introduction of GCB

[Diagram: the farm @ VU (Amsterdam University, some computers) with the BaBar MC production software, Parrot, Chirp and NFS; the EDG testbed (NIKHEF, a lot of computers) on a private network, reached through a GCB relay; Condor-G jobs go out and results come back.]

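GCB functionality

[Diagram: GCB server and Central Manager; nodes A and B; node P sits on a private network behind a NAT, keeps a persistent connection to the GCB server, and is reached through the relay.]
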
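A toy model of the relay idea follows. It assumes, as the diagram suggests, that the node behind the NAT opens an outbound connection to a public broker and keeps it open, and that public hosts then reach the node only through that connection. `GCBBroker`, `register`, `relay` and `receive` are hypothetical names, and `socket.socketpair()` stands in for the outbound TCP connection through the NAT.

```python
import socket


class GCBBroker:
    """Toy relay broker: private nodes cannot accept inbound connections,
    so each one keeps a persistent outbound connection to the broker, and
    the broker forwards traffic for that node over it."""

    def __init__(self):
        self._links = {}  # node id -> broker's end of the persistent connection

    def register(self, node_id: str) -> socket.socket:
        # socketpair() models a connection the private node opened *outwards*
        # through the NAT to the publicly reachable broker.
        broker_end, node_end = socket.socketpair()
        self._links[node_id] = broker_end
        return node_end

    def relay(self, node_id: str, payload: bytes) -> None:
        # A public host cannot route to the node's private address, so it
        # hands the payload to the broker instead.
        link = self._links.get(node_id)
        if link is None:
            raise KeyError(f"{node_id} has no persistent connection")
        # 4-byte length prefix so the node can find message boundaries.
        link.sendall(len(payload).to_bytes(4, "big") + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("persistent connection closed")
        buf += chunk
    return buf


def receive(node_end: socket.socket) -> bytes:
    """Read one length-prefixed message on the private node's side."""
    size = int.from_bytes(_recv_exact(node_end, 4), "big")
    return _recv_exact(node_end, size)
```

The design choice that matters is the direction of the connection. Because the private node initiates it, the NAT lets it through. Everything else is ordinary message forwarding.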
The introduction of GlideIn

PBS job manager: 72-hour jobs
Can't wait for queues

[Diagram: the farm @ VU (Amsterdam University, some computers) with the BaBar MC production software, a queue, NFS, Parrot and Chirp on a private network; Condor-G jobs start GlideIns as batch jobs through the PBS job manager of the EDG testbed (NIKHEF, a lot of computers), also on a private network; GCB relays connect the two; jobs go out and results come back.]

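The sketch below illustrates why GlideIn answers "can't wait for queues". It assumes the 72 hours on the slide is the walltime of each GlideIn batch job and uses made-up 10-hour user jobs. Each GlideIn waits in the PBS queue once and then runs user jobs back to back. The function names are hypothetical. Real GlideIns start Condor daemons that the central manager matches jobs to, which this first-in, first-out loop only approximates.

```python
from collections import deque
from typing import Deque, List, Tuple

Job = Tuple[str, float]  # (job name, expected run time in hours)


def run_glidein(queue: Deque[Job], walltime_h: float) -> List[str]:
    """Simulate one GlideIn: after a single wait in the PBS queue, it keeps
    taking user jobs from the Condor queue while the next one still fits
    into its remaining walltime."""
    finished: List[str] = []
    remaining = walltime_h
    while queue and queue[0][1] <= remaining:
        name, hours = queue.popleft()
        remaining -= hours
        finished.append(name)
    return finished


def plan_glideins(queue: Deque[Job], walltime_h: float) -> List[List[str]]:
    """Split the whole Condor queue over as many GlideIns as needed."""
    plan: List[List[str]] = []
    while queue:
        batch = run_glidein(queue, walltime_h)
        if not batch:
            raise ValueError("a job is longer than the GlideIn walltime")
        plan.append(batch)
    return plan
```

With twenty 10-hour jobs and 72-hour GlideIns, the plan uses three PBS submissions (7, 7 and 6 jobs) instead of twenty. That means three waits in the site queue rather than one wait per job.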
GlideIn functionality

[Diagram]
• Job submission machine: end-user requests go to the Condor-G scheduler with its persistent job queue; also present are the Condor-G GridManager, a GASS server, the Condor-G collector and a Condor shadow process for job X.
• Job execution site: Globus daemons + local site scheduler [See Figure 1]; Condor daemons; job X, linked with the Condor system call trapping & checkpoint library.
• Flows between the two: resource information, transfer of job X, redirected system call data.

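The "redirected system call data" arrow can be illustrated with a minimal sketch. The job's file calls are serialized and executed by a shadow on the submit machine, which owns the real files. `Shadow`, `RemoteSyscalls` and the JSON message format are hypothetical. In Condor the calls are trapped by a library linked into the job. Here they are explicit method calls, and the transport is a plain function instead of a network connection.

```python
import json


class Shadow:
    """Runs on the submit machine and owns the real files (here an in-memory
    dict). It executes each forwarded call and returns the result."""

    def __init__(self, files):
        self.files = files    # path -> bytes, stand-in for the local disk
        self.handles = {}     # fd -> [path, current offset]
        self.next_fd = 3

    def handle(self, request: str) -> str:
        call = json.loads(request)
        op, args = call["op"], call["args"]
        if op == "open":
            if args["path"] not in self.files:
                return json.dumps({"error": "ENOENT"})
            fd = self.next_fd
            self.next_fd += 1
            self.handles[fd] = [args["path"], 0]
            return json.dumps({"ret": fd})
        if op == "read":
            path, off = self.handles[args["fd"]]
            data = self.files[path][off:off + args["n"]]
            self.handles[args["fd"]][1] += len(data)
            # latin-1 maps every byte to one character, so the round trip is lossless.
            return json.dumps({"ret": data.decode("latin-1")})
        if op == "close":
            del self.handles[args["fd"]]
            return json.dumps({"ret": 0})
        return json.dumps({"error": f"unsupported call {op}"})


class RemoteSyscalls:
    """Used by the job at the execution site: instead of touching the local
    disk, every call is serialized and shipped to the shadow."""

    def __init__(self, transport):
        self.transport = transport  # callable(str) -> str, e.g. a network link

    def _call(self, op, **args):
        reply = json.loads(self.transport(json.dumps({"op": op, "args": args})))
        if "error" in reply:
            raise OSError(reply["error"])
        return reply["ret"]

    def open(self, path):
        return self._call("open", path=path)

    def read(self, fd, n):
        return self._call("read", fd=fd, n=n).encode("latin-1")

    def close(self, fd):
        return self._call("close", fd=fd)
```

Because the file offset lives in the shadow, the job can move between execution sites without losing its place in its input files.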
Overview of complete setup

PBS job manager: 72-hour jobs
Can't wait for queues

[Diagram: the complete chain. The BaBar MC production software at the farm @ VU (Amsterdam University, some computers) submits to a queue; Condor-G jobs start GlideIns as batch jobs via the PBS job manager on the EDG testbed (NIKHEF, a lot of computers); both sites are private networks joined by GCB relays; Parrot and Chirp give the jobs access to files on NFS; jobs go out and results come back.]

Leave only the components

[Diagram: the farm @ VU (Amsterdam University, some computers) with NFS, the BaBar MC production software, the queue, Parrot and Chirp; the EDG testbed (NIKHEF, a lot of computers) with the PBS job manager and GlideIn; both on private networks, joined by GCB.]

The interesting dependencies

[Diagram: the same components (farm @ VU, EDG testbed at NIKHEF, PBS job manager, GlideIn, GCB, Parrot, Chirp, NFS), annotated with the problem spots below.]
• NAT box
  – Dropping UDP packets
  – Timeout of 2 minutes
  – Inactive sockets
  – Inactive file I/O
• Different MDS scheme
• Objectivity database
  – LOCK server sockets
  – NFS problems
  – UID / hostname checks

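The sketch below shows why a UID/hostname check breaks once a job leaves its home site, and how reporting a fixed identity hides it. The slides do not describe the exact Objectivity logic. `identity_check` is a hypothetical stand-in that compares the caller's host name and uid with the values the database expects. `mimic_identity` patches two Python functions, whereas the authors made Parrot do this at the system-call level. The host names and uids are made up.

```python
import os
import socket
from contextlib import contextmanager
from unittest import mock

# Hypothetical identity the database was set up with at the home site.
EXPECTED = ("farm01.vu.example", 1000)


def identity_check(expected) -> bool:
    """Toy stand-in for a host/uid check: accept the caller only if this
    process reports the host name and uid the database expects."""
    return (socket.gethostname(), os.getuid()) == expected


@contextmanager
def mimic_identity(hostname: str, uid: int):
    """Make socket.gethostname() and os.getuid() report fixed values inside
    the block (create=True also allows patching where os.getuid is absent)."""
    with mock.patch("socket.gethostname", return_value=hostname):
        with mock.patch("os.getuid", return_value=uid, create=True):
            yield
```

In the usage below, the first block only simulates a grid worker node with its own name and pool-account uid. The second block presents the identity the database expects.

```python
with mimic_identity("wn07.nikhef.example", 5013):
    print(identity_check(EXPECTED))  # False: the grid node is rejected
with mimic_identity(*EXPECTED):
    print(identity_check(EXPECTED))  # True: the mimicked identity passes
```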
Consequences
• Different MDS scheme
  – Implemented the EDG scheme for GlideIn
• Objectivity
  – A lot of debugging
  – Made Parrot mimic hostname and uid
  – Tricked Objectivity into using the standard NFS libraries
• Aggressive NAT box
  – Changed GCB to use TCP instead of UDP
  – Used Parrot to keep sockets alive
  – Parrot recovers file I/O when the TCP connection is lost
• We are the first to run Objectivity cross-domain

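Recovering file I/O after a lost TCP connection comes down to keeping the file position on the client side. When the connection drops, the client reconnects and continues where it was. The sketch folds "re-open and seek" into a positional read. All classes are hypothetical. `FlakyServer` drops every connection after a fixed number of requests, loosely mimicking a NAT box that forgets idle connections.

```python
class Connection:
    """One TCP-like connection; it dies after the server's request limit."""

    def __init__(self, server):
        self.server = server
        self.requests = 0

    def pread(self, offset: int, n: int) -> bytes:
        self.requests += 1
        if self.requests > self.server.drop_every:
            raise ConnectionResetError("connection lost")
        return self.server.data[offset:offset + n]


class FlakyServer:
    """In-memory file server whose connections are dropped periodically."""

    def __init__(self, data: bytes, drop_every: int):
        self.data = data
        self.drop_every = drop_every
        self.connections = 0

    def connect(self) -> Connection:
        self.connections += 1
        return Connection(self)


class ResilientFile:
    """Client-side handle that remembers its offset, so a lost connection
    can be replaced without the application noticing."""

    def __init__(self, server: FlakyServer, max_retries: int = 3):
        self.server = server
        self.max_retries = max_retries
        self.offset = 0
        self.conn = server.connect()

    def read(self, n: int) -> bytes:
        for attempt in range(self.max_retries + 1):
            try:
                chunk = self.conn.pread(self.offset, n)
                self.offset += len(chunk)
                return chunk
            except ConnectionResetError:
                if attempt == self.max_retries:
                    raise
                # Reconnect; the saved offset plays the role of "seek".
                self.conn = self.server.connect()
        raise AssertionError("unreachable")
```

Reading a 10-byte file in 3-byte pieces from a server that drops connections after two requests still returns the whole file. It needs one reconnect along the way.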
Performance

[Plot: time (minutes, 500 to 3000) versus number of events (500 to 2000) for production on the local machine and production on the EDG testbed. On the testbed, the application initializes 10 times slower and production runs 3 times slower.]

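A back-of-the-envelope model of the plot follows. It assumes run time grows linearly with the number of events N. The symbols t_init (initialization time) and t_ev (time per event) on the local machine are introduced here and do not appear on the slide. The two slowdown factors are the ones quoted above.

```latex
\begin{aligned}
T_{\mathrm{local}}(N) &= t_{\mathrm{init}} + N\,t_{\mathrm{ev}},\\
T_{\mathrm{EDG}}(N) &\approx 10\,t_{\mathrm{init}} + 3N\,t_{\mathrm{ev}},\\
\frac{T_{\mathrm{EDG}}(N)}{T_{\mathrm{local}}(N)} &\approx
  \frac{10\,t_{\mathrm{init}} + 3N\,t_{\mathrm{ev}}}{t_{\mathrm{init}} + N\,t_{\mathrm{ev}}}
  \;\xrightarrow{\;N\to\infty\;}\; 3 .
\end{aligned}
```

The ratio is a weighted average of 10 and 3, with weights t_init and N t_ev. It therefore always lies between 3 and 10 and approaches 3 for long runs. Speeding up initialization matters most for short jobs.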
Possible improvements

[Diagram: the same components (farm @ VU with NFS, queue, Parrot and Chirp; EDG testbed at NIKHEF with PBS job manager and GlideIn; GCB).]
• Parrot: caching
  – On a per-directory basis
  – Requires debugging
• Create a more sophisticated tool to acquire resources
  – Resource planning, distribution, etc.
  – Maybe something fancy already exists?

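Caching on a per-directory basis could look roughly like the sketch below. The interpretation is an assumption: caching is switched on for chosen directories (for example read-only conditions data), while everything else, such as output, always goes to the remote server. `DirectoryCache`, its methods and the paths are hypothetical, and `fetch` stands in for a remote read through Parrot.

```python
import posixpath
from typing import Callable, Dict, Iterable


class DirectoryCache:
    """Hypothetical per-directory caching policy: files below cacheable
    directories are fetched once and then served locally; all other paths
    are read remotely every time."""

    def __init__(self, fetch: Callable[[str], bytes], cacheable_dirs: Iterable[str]):
        self.fetch = fetch
        self.prefixes = [d.rstrip("/") + "/" for d in cacheable_dirs]
        self.store: Dict[str, bytes] = {}
        self.remote_reads = 0

    def _cacheable(self, path: str) -> bool:
        return any(path.startswith(p) for p in self.prefixes)

    def read(self, path: str) -> bytes:
        path = posixpath.normpath(path)
        cacheable = self._cacheable(path)
        if cacheable and path in self.store:
            return self.store[path]  # served locally, no network traffic
        self.remote_reads += 1
        data = self.fetch(path)
        if cacheable:
            self.store[path] = data
        return data

    def invalidate(self, directory: str) -> None:
        """Drop every cached file below one directory."""
        prefix = directory.rstrip("/") + "/"
        for cached in [p for p in self.store if p.startswith(prefix)]:
            del self.store[cached]
```

Making the decision per directory keeps the policy simple to configure. Directories whose contents never change during a run can be cached aggressively, and everything else keeps normal remote semantics.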
Move Chirp servers to private nodes

[Diagram: the same components (farm @ VU with NFS, queue, Parrot and Chirp; EDG testbed at NIKHEF with PBS job manager and GlideIn; GCB).]
• Use the Condor/GCB machinery for the Chirp server
  – Solves security issues
  – Allows the Chirp server to be on private nodes
  – Requires a new chirp-condor implementation

Move GCB to head node

[Diagram: the same components (farm @ VU with NFS, queue, Parrot and Chirp; EDG testbed at NIKHEF with PBS job manager and GlideIn; GCB).]
• Move GCB to the same machine as the Central Manager
  – Solution required for port conflicts
  – Temporary solution: move the CM to a private node

Use EDG data storage

[Diagram: the same components (farm @ VU, EDG testbed at NIKHEF, PBS job manager, GlideIn, GCB, Parrot, Chirp), plus EDG data storage.]
• Write events to EDG data storage (gsiFTP)
  – Requires debugging

Use more sites

[Diagram: the same components (farm @ VU, EDG testbed at NIKHEF, PBS job manager, GlideIn, GCB, Parrot, Chirp, EDG data storage), plus another testbed with a lot of computers on its own private network.]
• Let GCB manage several private networks at the same time
  – Requires a solution for conflicting private addresses

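One way to handle conflicting private addresses is sketched below. It assumes each private network registered with GCB gets its own identifier. The broker's table is then keyed by (network, address) rather than by address alone, so two sites can both use 10.0.0.5. `MultiNetworkRegistry` and the network names are hypothetical. `ipaddress` is the Python standard library module.

```python
import ipaddress


class MultiNetworkRegistry:
    """Hypothetical broker table for nodes from several private networks
    that may reuse the same private addresses."""

    def __init__(self):
        self.nodes = {}  # (network id, private address) -> node name

    def register(self, network_id: str, private_ip: str, node_name: str) -> None:
        addr = ipaddress.ip_address(private_ip)
        if not addr.is_private:
            raise ValueError(f"{private_ip} is not a private address")
        key = (network_id, addr)
        if key in self.nodes:
            raise ValueError(f"{private_ip} already registered in {network_id}")
        self.nodes[key] = node_name

    def lookup(self, network_id: str, private_ip: str) -> str:
        return self.nodes[(network_id, ipaddress.ip_address(private_ip))]
```

The same address in two networks maps to two different nodes, while a duplicate within one network is rejected. Any message routed by the broker would then have to carry the network identifier along with the address.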
Conclusions
• It works
  – BaBar MC production runs successfully on the NIKHEF EDG testbed
  – All this experimental software actually works when used together
• It looks easy
  – Our grid setup is complicated, but...
  – Parrot hides problems related to local file access
  – GCB hides problems related to network configurations
  – GlideIn hides complications with resource gathering
  – The user can just submit his/her jobs to a local batch system
• There is some work to do
  – Performance could be better: initialization is 10 times slower, production 3 times slower
  – Caching and (semi-)local event storage should improve this
  – Usability could be improved: GlideIn should have a tool to acquire resources, and several improvements are proposed for GCB/Parrot
• The improvements are made at the level of the "grid" tools
  – The user benefits without rewriting code