Rod Walker IC 13th March 2002
SAM-Grid Middleware
http://d0db.fnal.gov/sam
SAM.
JIM.
RunJob.
Conclusions.
- Rod Walker, ICL.
SAM stands for “Sequential Access to Data via Metadata”. Access is sequential within each file, but the order in which files are delivered is not important, as is the case for HEP event data.
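Because only the order within a file matters, a consumer can process files in whatever order they reach the local cache and still get the same answer. The Python sketch below illustrates that property only; the dataset, file names and event counts are invented, and `deliver_files` is a hypothetical stand-in for SAM's file delivery, not its API.

```python
import random
from typing import Dict, Iterator, List

# Invented dataset for illustration: file name -> number of events in the file.
DATASET: Dict[str, int] = {
    "raw_0001.dat": 1200,
    "raw_0002.dat": 950,
    "raw_0003.dat": 1430,
}


def deliver_files(names: List[str], seed: int) -> Iterator[str]:
    """Hypothetical stand-in for file delivery: yields the files in an
    arbitrary order (e.g. whichever reaches the cache first). The seed only
    makes the demonstration repeatable."""
    order = list(names)
    random.Random(seed).shuffle(order)
    yield from order


def count_events(seed: int) -> int:
    """Read each delivered file front to back (simulated by a lookup) and
    accumulate a result that does not depend on the file order."""
    total = 0
    for name in deliver_files(list(DATASET), seed):
        total += DATASET[name]
    return total


if __name__ == "__main__":
    # Different delivery orders, same answer: 3580 events each time.
    print([count_events(seed) for seed in range(3)])
```

The design point is that the per-file work is sequential while the per-dataset work is order-free, which is what lets a data handling system deliver files as they become available rather than in a fixed sequence.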
History of SAM
Project started in 1997 by the FNAL Computing Division (not just physicists). Meant for FNAL experiments, and recently taken up by CDF. So far ~20 FTE years – a lot of effort.
State of the art in Data Management
No one else has tried to deliver TBs of user-selected data on demand.
Global file routing
• Many remote stations want files.
– SAM allowed a free-for-all to the GridFTP server.
– MSS access only from the FNAL site, cache on a private network, ...
• Needed control and routing.
• Solution: all sites can route files, e.g.
– get FNAL files from fnal-router:
– route=fnal.gov::nijmegen, and the nijmegen station has route=fnal.gov::fnal-router.
• Janet – Geant – ESnet – FNAL: 155 Mbit/s bottleneck.
• Janet – Geant – SURFnet – FNAL: Gbit/s (?).
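One way to read the routing rule is as a per-station table mapping a source domain to a next-hop station, followed hop by hop until a station with no route for that domain fetches the file itself. The Python sketch below assumes that reading of `route=domain::station`; it is not SAM's implementation. `imperial` is a hypothetical UK station, while `nijmegen` and `fnal-router` come from the slide.

```python
from typing import Dict, List

# Assumed semantics: each station maps a source domain to its next-hop station.
# "imperial" is hypothetical; "nijmegen" and "fnal-router" appear on the slide.
ROUTES: Dict[str, Dict[str, str]] = {
    "imperial": {"fnal.gov": "nijmegen"},
    "nijmegen": {"fnal.gov": "fnal-router"},
    "fnal-router": {},  # on site at FNAL: fetches fnal.gov files itself
}


def parse_route(entry: str) -> Dict[str, str]:
    """Turn 'route=fnal.gov::nijmegen' into {'fnal.gov': 'nijmegen'}."""
    key, _, value = entry.partition("=")
    if key != "route" or "::" not in value:
        raise ValueError(f"not a route entry: {entry!r}")
    domain, hop = value.split("::", 1)
    return {domain: hop}


def resolve_path(station: str, domain: str, max_hops: int = 10) -> List[str]:
    """Follow next-hop entries from `station` until a station has no route
    for `domain`; that last station is the one that fetches the file."""
    path = [station]
    while domain in ROUTES.get(path[-1], {}):
        if len(path) > max_hops:
            raise RuntimeError("routing loop suspected")
        path.append(ROUTES[path[-1]][domain])
    return path


if __name__ == "__main__":
    print(parse_route("route=fnal.gov::nijmegen"))
    print(resolve_path("imperial", "fnal.gov"))
```

Under this reading, a UK station routing via nijmegen takes the Janet – Geant – SURFnet path instead of the ESnet path with the 155 Mbit/s bottleneck, and only fnal-router needs direct MSS access.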
SAM Status
• Middleware development
– Global routing.
– Diverse deployments, e.g. private network, firewall, shared vs local disk cache.
– CDF deployment (GridPP).
– Bug fixes.
– GridFTP and authentication (GridPP).
• Outlook
– Decreasing development effort. FNAL Computing Division (CD) support for Run II.
JIM history
• Purpose: to build on SAM’s data handling to create a real grid.
– Job definition & management.
– Information & monitoring.
• Novel concepts
– Already have a DH (data handling) system.
– ups/upd packaging and deployment:
  – rpm functionality plus multi-platform support and tailoring;
  – little dependence on the native installation, e.g. python v2.1f;
  – hugely simplified deployment.
• Use Condor as the resource broker.
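The multi-platform tailoring can be pictured as flavor fallback: a host asks for the most specific build of a product and falls back to more generic builds. The Python sketch below is a simplified model under that assumption, not the actual ups/upd matching rules; the flavor strings loosely imitate an `OS+release` style, and the product file names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical builds of one product, keyed by flavor string. The flavor names
# loosely imitate an "OS+release" style; "NULL" stands for a platform-independent build.
AVAILABLE: Dict[str, str] = {
    "Linux+2.4": "python_Linux+2.4.tar",
    "Linux": "python_Linux.tar",
    "NULL": "python_noarch.tar",
}


def candidate_flavors(host_flavor: str) -> List[str]:
    """Most specific first, e.g. 'Linux+2.4.18' ->
    ['Linux+2.4.18', 'Linux+2.4', 'Linux+2', 'Linux', 'NULL']."""
    os_name, _, release = host_flavor.partition("+")
    parts = release.split(".") if release else []
    specific = [f"{os_name}+{'.'.join(parts[:i])}" for i in range(len(parts), 0, -1)]
    return specific + [os_name, "NULL"]


def select_build(host_flavor: str) -> Optional[str]:
    """Return the first available build in fallback order, or None."""
    for flavor in candidate_flavors(host_flavor):
        if flavor in AVAILABLE:
            return AVAILABLE[flavor]
    return None


if __name__ == "__main__":
    for host in ("Linux+2.4.18", "Linux+2.2", "IRIX+6.5"):
        print(host, "->", select_build(host))
```

Because every host can fall back to some build, a product such as a bundled Python interpreter can be deployed without relying on whatever version the native installation provides.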
JIM components
• User interface
– Job definition language based on ClassAds.
• Resource broker (RB) reduced to defining the match-making service (MMS) ranking function
– Static & dynamic constraints: OS, code version, free CPUs, …
– Plus an external function to query the DH system.
• Collaboration with Wisconsin
– Choose gatekeeper, use external function, separate submission server from negotiator.
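The brokering idea (hard constraints filter the sites, a ranking function orders the survivors, and the ranking may call out to the data handling system) can be sketched in plain Python. This is not the ClassAd language or Condor's API; the site data, cache contents, the `cached_fraction` query and the rank weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Site:
    name: str
    os: str
    code_versions: List[str]
    free_cpus: int


@dataclass
class Job:
    os: str
    code_version: str
    input_files: List[str]


# Hypothetical cache contents per site, standing in for the DH system.
CACHE: Dict[str, Set[str]] = {"imperial": {"f1", "f2"}, "nijmegen": set()}

DHQuery = Callable[[str, List[str]], float]


def cached_fraction(site: str, files: List[str]) -> float:
    """External function: fraction of the job's input files cached at a site."""
    if not files:
        return 0.0
    cached = CACHE.get(site, set())
    return sum(f in cached for f in files) / len(files)


def requirements(job: Job, site: Site) -> bool:
    """Static (OS, code version) and dynamic (free CPUs) constraints."""
    return site.os == job.os and job.code_version in site.code_versions and site.free_cpus > 0


def rank(job: Job, site: Site, dh_query: DHQuery) -> float:
    """Prefer sites that already hold the data; free CPUs break ties. Weights are invented."""
    return 10.0 * dh_query(site.name, job.input_files) + 0.01 * site.free_cpus


def match(job: Job, sites: List[Site], dh_query: DHQuery = cached_fraction) -> Optional[Site]:
    """Filter by requirements, then pick the highest-ranked site (None if no site qualifies)."""
    eligible = [s for s in sites if requirements(job, s)]
    return max(eligible, key=lambda s: rank(job, s, dh_query), default=None)


SITES = [
    Site("imperial", "Linux", ["v11"], 20),
    Site("nijmegen", "Linux", ["v10", "v11"], 100),
    Site("other_site", "SunOS", ["v11"], 200),
]

if __name__ == "__main__":
    best = match(Job("Linux", "v11", ["f1", "f2", "f3"]), SITES)
    print(best.name if best else "no match")
```

Here the site holding two of the three input files wins over a site with more free CPUs, which is the point of letting the ranking consult the DH system rather than only static resource attributes.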
JIM components
• Information & monitoring
– Currently: grid sensors > LDAP > MDS > PHP.
– Developing: grid sensors > XML > native DB > PHP, other.
– Reliability, flexibility, persistency.
– The same model works for grid system bookkeeping and user-level monitoring.
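The planned pipeline (grid sensors emit XML, which is loaded into a database and queried by the presentation layer) can be sketched with Python's standard library. The XML element and attribute names are hypothetical, and `sqlite3` is swapped in for the native database, which the slide does not specify further.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sensor report; the element and attribute names are invented.
SENSOR_XML = """
<sensor site="imperial" time="2002-03-13T10:00:00">
  <metric name="free_cpus">20</metric>
  <metric name="queued_jobs">5</metric>
</sensor>
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the metrics table; a file path instead of ':memory:' makes it persistent."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS metrics (site TEXT, time TEXT, name TEXT, value REAL)"
    )
    return db


def store(xml_text: str, db: sqlite3.Connection) -> int:
    """Parse one sensor report and insert one row per metric; returns the row count."""
    root = ET.fromstring(xml_text.strip())
    site, time = root.get("site"), root.get("time")
    rows = [(site, time, m.get("name"), float(m.text)) for m in root.findall("metric")]
    db.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


def latest(db: sqlite3.Connection, site: str, name: str) -> Optional[float]:
    """Most recent value of a metric, as a monitoring page might query it."""
    row = db.execute(
        "SELECT value FROM metrics WHERE site = ? AND name = ? ORDER BY time DESC LIMIT 1",
        (site, name),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    db = open_db()
    print(store(SENSOR_XML, db), "rows stored")
    print("free_cpus:", latest(db, "imperial", "free_cpus"))
```

The same stored records can answer both grid bookkeeping and user-level monitoring queries, since both are just different queries over sensor reports.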
Information Flow
[Figure: Information Flow. Components shown: User Interface; Parser (JDL, ClassAd); Condor-G; Condor Schedd; Condor Grid Manager; Condor Collector; Condor Negotiator; External Code; Information and Monitoring; Grid Sensors; Gatekeeper; GRAM; Batch System; Compute Resource; Execution Site.]
RunJob
• Vital tool for D0 Monte Carlo (MC) production on farms.
– Chains, steers and parallelizes D0 executables, creates metadata, and uses SAM to store the output to MSS.
• Now interfaced to SAM for input, so it can handle real data and any D0 executable.
– Will be used for skimming, reprocessing datasets, and user analysis.
– Fully automated monitoring, checking and storage.
– Work underway in the UK.
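The core pattern described for RunJob (chain a fixed sequence of executables, run independent parts in parallel, and attach metadata recording each output's parent and processing history) can be sketched in Python. The stage names, file naming and metadata fields are hypothetical; real executables, and the SAM call that stores the output, are replaced by placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical processing chain; each name stands in for one executable.
STAGES: List[str] = ["gen", "sim", "reco"]


def run_stage(stage: str, input_file: str) -> str:
    """Placeholder for running one executable: derive the output file name
    from the input by replacing its extension with the stage name."""
    return f"{input_file.rsplit('.', 1)[0]}.{stage}"


def run_chain(input_file: str) -> Dict[str, object]:
    """Steer one input through every stage in order and record metadata
    describing the final output's parent and processing history."""
    current = input_file
    for stage in STAGES:
        current = run_stage(stage, current)
    return {"file": current, "parent": input_file, "stages": list(STAGES)}


def run_parallel(inputs: List[str], workers: int = 4) -> List[Dict[str, object]]:
    """Run independent chains concurrently; results keep the input order.
    The records would then be handed to SAM for storage (not shown)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_chain, inputs))


if __name__ == "__main__":
    for record in run_parallel(["part0.cfg", "part1.cfg"]):
        print(record)
```

Threads are enough for the steering layer in this sketch because, in a real chain, each stage would be a separate process that the steering code launches and waits on.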
RunJob status
• Maintenance & development of RunJob, and of its interface to SAM-Grid, done entirely by the UK.
• CMS is using a branch of RunJob for production.
• Dave Evans and Greg Graham are collaborating on merging the branches.
– Goal: a single package with EDG and SAM-Grid interfaces.
• RunJob “server” or job manager.
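The goal of one package with both EDG and SAM-Grid interfaces suggests keeping the job logic independent of the grid back end. A minimal Python sketch of that design, assuming a two-method interface; the class names, method names and returned strings are hypothetical placeholders and do not correspond to RunJob's actual code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class GridInterface(ABC):
    """What the job logic needs from a grid back end (hypothetical methods)."""

    @abstractmethod
    def fetch_inputs(self, dataset: str) -> List[str]:
        ...

    @abstractmethod
    def store_output(self, filename: str) -> str:
        ...


class SamGridInterface(GridInterface):
    def fetch_inputs(self, dataset: str) -> List[str]:
        return [f"{dataset}/file0", f"{dataset}/file1"]  # placeholder for a SAM request

    def store_output(self, filename: str) -> str:
        return f"samgrid:{filename}"  # placeholder for storing via SAM


class EdgInterface(GridInterface):
    def fetch_inputs(self, dataset: str) -> List[str]:
        return [f"{dataset}/file0"]  # placeholder for an EDG data lookup

    def store_output(self, filename: str) -> str:
        return f"edg:{filename}"  # placeholder for an EDG storage call


BACKENDS: Dict[str, Type[GridInterface]] = {"samgrid": SamGridInterface, "edg": EdgInterface}


def run_job(backend_name: str, dataset: str) -> List[str]:
    """Back-end-neutral job logic: the same code runs against either grid."""
    backend = BACKENDS[backend_name]()
    return [backend.store_output(f + ".out") for f in backend.fetch_inputs(dataset)]


if __name__ == "__main__":
    print(run_job("samgrid", "ds1"))
    print(run_job("edg", "ds1"))
```

With this split, merging branches reduces to agreeing on the shared interface; each experiment's grid-specific code lives in its own back-end class.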
SAM-Grid Logistics
[Figure: SAM-Grid Logistics. Components shown: Sites; User Interfaces; Submission; Global Job Queue; Grid Client; Resource Selector; Info Collector; Info Gatherer; Match Making; Global DH Services; SAM Naming Server; SAM Log Server; Resource Optimizer; SAM DB Server; RC MetaData Catalog; Bookkeeping Service; SAM Stager(s); SAM Station (+other servers); Data Handling; Worker Nodes; Grid Gateway; Local Job Handler (CAF, RunJob, Vanilla, ...); JIM Advertise; Local Job Handling; Cluster; AAA; Dist. FS; Info Manager; XML DB server; Site Conf.; Glob/Loc JID map; Info Providers; MDS; MSS; Cache; Web Server; Grid Monitoring; User Tools.]
Conclusions
o Core SAM supported by FNAL CD.
  o Operational support via software shifts.
  o UK currently contributes 2 experts on shift.
o JIM post-development support:
  o bug fixing, deployment issues (as for SAM);
  o will need software support shifts.
o RunJob is, and will remain, UK supported.
  o Expanding functionality: analysis, reprocessing.
  o Increasing deployment: D0 sites, CMS.
o On target for the end-of-March deliverable, and a production Grid in April.
JIM V1: Package dependencies
[Figure: JIM V1 package dependency graph. Packages shown: jim_broker_client, xml_meta_configurator, sam_common, jim_info_providers, jim_broker, orbacus, sam_config, globus, jim_www, server_run, jim_advertise, galax, samgrid, jim_client, jim_jobmanagers, jim_sandbox.]
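A dependency graph like this one is what lets a packaging system install products in a valid order: every package after everything it requires. As a minimal sketch, Python's standard `graphlib` module (Python 3.9+) can derive such an order; the package names and edges below are hypothetical placeholders, not the JIM V1 dependencies.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# HYPOTHETICAL dependency map: package -> packages it requires.
# Names and edges are placeholders, not the JIM V1 dependencies.
DEPS: Dict[str, Set[str]] = {
    "demo_client": {"demo_common", "demo_config"},
    "demo_server": {"demo_common", "demo_config", "demo_middleware"},
    "demo_config": {"demo_common"},
    "demo_common": set(),
    "demo_middleware": set(),
}


def install_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Every package appears after all of its dependencies.
    graphlib raises CycleError if the dependencies are circular."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(install_order(DEPS))
```

Several orders are usually valid; the sorter only guarantees that dependencies come first, which is all an installer needs.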