The OxGrid Resource Broker
David Wallom
Overview
• OxGrid
• Resource Broking
• Why build our own?
• Job Submission and other tools
• Future developments
OxGrid, a University Campus Grid
• Single entry point for users to shared and dedicated resources
• Seamless access to NGS and OSC for registered users
Resource Broking
• The original idea of the grid relied on efficient resource broking to abstract the user away from the resources
• This has been significantly neglected by grid software developers
– Push and pull mechanisms each have significant advantages and disadvantages
– Resources that have multiple job sources increase the complexity manyfold
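The push/pull distinction can be contrasted in a toy Python sketch. Job names, resource names and slot counts are illustrative assumptions, not OxGrid internals. In push mode the broker places jobs using its own snapshot of free slots, and that snapshot goes stale when a resource also takes jobs from other sources. In pull mode an idle resource asks for work, so it only receives what it can run now.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

def push_dispatch(jobs: List[str], free_slots: Dict[str, int]) -> List[Tuple[str, str]]:
    """Push: the broker chooses a resource for each job from its own view of free slots."""
    slots = dict(free_slots)  # broker's snapshot; stale if other job sources also submit
    placements = []
    for job in jobs:
        if not slots:
            break
        target = max(slots, key=slots.get)  # resource with the most free slots (first on ties)
        if slots[target] == 0:
            break  # nowhere to push; remaining jobs wait at the broker
        slots[target] -= 1
        placements.append((job, target))
    return placements

def pull_request(queue: Deque[str], resource: str) -> Optional[Tuple[str, str]]:
    """Pull: an idle resource asks for the next queued job, or gets None if there is none."""
    if not queue:
        return None
    return (queue.popleft(), resource)
```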
Why build our own?
• OxGrid is intended to be a lightweight development
• Replacement of individual components should be simple
– Service-based interfaces are the goal
• Current solutions do not allow this: they have massive dependencies and non-trivial maintenance requirements
• Condor-G is a simple, off-the-shelf grid meta-scheduler, so why make it so much more complicated?
Condor Matchmaking
• Matchmaking is a methodology for Distributed Resource Management
• Conceptually simple:
– Service providers and requesters advertise
– Compatible advertisements are matched
– Matched entities cooperate to perform the service
• Developed for opportunistic environments
– Use resources as and when available
Thanks to Miron and the Condor Team
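The advertise/match/cooperate cycle can be sketched in Python. This is a toy model, not Condor's ClassAd language. Each advertisement is assumed to be a dict of attributes plus a `requirements` predicate over the other party's attributes, and a match needs both predicates to succeed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Attrs = Dict[str, Any]

@dataclass
class Ad:
    """Toy advertisement: descriptive attributes plus a constraint on the other party."""
    name: str
    attrs: Attrs
    requirements: Callable[[Attrs], bool] = lambda other: True

def matches(a: Ad, b: Ad) -> bool:
    # Matching is two-sided: each party's requirements must accept the other's attributes.
    return a.requirements(b.attrs) and b.requirements(a.attrs)

def match_all(requests: List[Ad], offers: List[Ad]) -> List[Tuple[str, str]]:
    """Pair each request with the first compatible offer (offers may be reused here)."""
    pairs = []
    for req in requests:
        for offer in offers:
            if matches(req, offer):
                pairs.append((req.name, offer.name))
                break
    return pairs
```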
Condor Matchmaking (Cont.)
• Customers and Servers advertise to a Matchmaking Service
• Advertisements describe advertising entities
– Characteristics
– Requirements and Constraints
– Preferences
• These descriptions are called classified advertisements (classads)
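The split between requirements and preferences can be illustrated in Python. In this sketch, assumed rather than taken from Condor's code, requirements on both sides filter the candidate machines, and a job-side rank function (playing the role of a classad `Rank`) chooses among the survivors.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Attrs = Dict[str, Any]
Predicate = Callable[[Attrs], bool]

def best_match(job: Attrs, job_requirements: Predicate, job_rank: Callable[[Attrs], float],
               machines: List[Tuple[Attrs, Predicate]]) -> Optional[Attrs]:
    """Pick the compatible machine with the highest job rank, or None if nothing fits.

    Requirements (on both sides) filter candidates; rank expresses preference among them.
    """
    compatible = [attrs for attrs, machine_requirements in machines
                  if job_requirements(attrs) and machine_requirements(job)]
    if not compatible:
        return None
    return max(compatible, key=job_rank)  # ties go to the earliest machine in the list
```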
Static and Dynamic Information
• Static information
– e.g. processor architecture, physical memory, operating system, scheduling system, no. of nodes
• Dynamic information
– e.g. system availability, scheduler load, queue length, used disk or memory
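One way to keep the two kinds of information apart is sketched below in Python. The record layout and the 900-second freshness window are assumptions for illustration. Static values are always reported, while dynamic values are dropped once they are older than the window.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class ResourceInfo:
    """Toy resource record keeping static and dynamic information apart."""
    static: Dict[str, Any]                                  # e.g. Arch, OpSys, no. of nodes
    dynamic: Dict[str, Any] = field(default_factory=dict)   # e.g. queue length, scheduler load
    updated_at: float = 0.0                                 # time of the last dynamic update
    lifetime: float = 900.0                                 # assumed freshness window (seconds)

    def update_dynamic(self, values: Dict[str, Any], now: Optional[float] = None) -> None:
        """Record freshly harvested dynamic values; static values are left untouched."""
        self.dynamic.update(values)
        self.updated_at = time.time() if now is None else now

    def snapshot(self, now: Optional[float] = None) -> Dict[str, Any]:
        """Static values always; dynamic values only while they are still fresh."""
        now = time.time() if now is None else now
        view = dict(self.static)
        if now - self.updated_at <= self.lifetime:
            view.update(self.dynamic)
        return view
```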
OxGrid Virtual Organisation Manager Database
• Final repository for authorisation information
• Stores additional static information for each resource, such as its capabilities and the maximum number of jobs that may be submitted to that node
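The per-node limits and capabilities can be turned into classad attributes like those in the generated classad in these slides. The Python sketch below assumes a hypothetical database row with `max_jobs` and `capabilities` keys; the real schema is not given here. Universe number 9 is Condor's grid universe.

```python
from typing import Any, Dict

def vom_attributes(record: Dict[str, Any]) -> Dict[str, str]:
    """Turn a hypothetical VOM database row into classad attribute strings.

    Assumed row layout: {"max_jobs": int, "capabilities": {name: bool}}.
    """
    attrs = {
        # Cap concurrent matches at the per-node limit and accept only grid-universe jobs.
        "Requirements": f"(CurMatches < {record['max_jobs']}) && (TARGET.JobUniverse == 9)",
    }
    for name, present in sorted(record.get("capabilities", {}).items()):
        attrs[name] = "True" if present else "False"
    return attrs
```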
Data Harvesting cycle
• Information sources can be added or removed at will
• Sources can be either a single repository that aggregates information (e.g. ngsinfo) or individual machines
• A simple internal representation of information makes it easy to add new types of information source
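A harvesting cycle with pluggable sources might look like the Python sketch below. The flat attribute dict, the `Name` key used for merging, and the skip-on-failure policy are all assumptions. Each registered source is a callable returning records, and one cycle merges the records for each resource.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]           # simple internal representation: flat attribute dict
Source = Callable[[], List[Record]]  # an information source returns records keyed by "Name"

class Harvester:
    """Toy harvesting cycle: sources can be registered and removed at will."""

    def __init__(self) -> None:
        self.sources: Dict[str, Source] = {}

    def add_source(self, label: str, source: Source) -> None:
        self.sources[label] = source

    def remove_source(self, label: str) -> None:
        self.sources.pop(label, None)

    def harvest(self) -> Dict[str, Record]:
        """Run one cycle, merging records from every source by resource name."""
        merged: Dict[str, Record] = {}
        for source in self.sources.values():
            try:
                records = source()
            except Exception:
                continue  # one failing source should not stall the whole cycle
            for rec in records:
                merged.setdefault(str(rec["Name"]), {}).update(rec)
        return merged
```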
Generated classad
MyType = "Machine"
TargetType = "Job"
Name = "bedrock.oucs.ox.ac.uk-condor"
gatekeeper_url = "bedrock.oucs.ox.ac.uk/jobmanager-condor"
Requirements = (CurMatches < 20) && (TARGET.JobUniverse == 9)
WantAdRevaluate = True
UpdateSequenceNumber = 1097580300
CurMatches = 0
OpSys = "LINUX"
Arch = "INTEL"
Memory = 501
MPI = False
INTEL_COMPILER = True
GCC3 = True
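A classad like the one above could be rendered from an attribute dict in Python. The sketch below assumes that plain strings are quoted, booleans and numbers are written bare, and values wrapped in a hypothetical `Expr` marker (such as `Requirements`) are emitted unquoted as expressions.

```python
from typing import Any, Dict

class Expr(str):
    """Marks a value as a raw ClassAd expression that must not be quoted."""

def render_value(value: Any) -> str:
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):          # check bool before int: bool is a subclass of int
        return "True" if value else "False"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def render_classad(attrs: Dict[str, Any]) -> str:
    """Emit one `Name = value` line per attribute, in insertion order."""
    return "\n".join(f"{name} = {render_value(v)}" for name, v in attrs.items())
```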
Tuning Condor to act as a metascheduler
• The default configuration of Condor is as a cycle scavenger
• Alter this by ensuring that the Negotiator attempts to match all available tasks on each pass
• Since ours is purely a Condor-G system, we change the system's default universe to grid
Changes to Condor configuration
DEFAULT_UNIVERSE = GLOBUS
CLASSAD_LIFETIME = 900
NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
NEGOTIATOR_INTERVAL = 30
JOB_START_DELAY = 10
GRIDMANAGER_JOB_PROBE_INTERVAL = 60
Job Submission
• Most users are comfortable with command-line applications
– Condor submission scripts would be another language for our users to learn…
– Instead, expose the submission step as a scriptable application with arguments
• Created job-submission
job-submission
-h <HOSTNAME>/<JOBMANAGER>
-e <EXECUTABLE>
-t Boolean: transfer the executable?
-a Executable arguments
-i Input files to be transferred
-o Output files to be transferred
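The option list above maps naturally onto Python's `argparse`. This is a sketch, not the actual job-submission source. It assumes `-t` is a flag, `-i`/`-o` take one or more file names, and `-h` may be omitted. Because `-h` carries the host here, the built-in help is disabled and re-added as `--help`.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # add_help=False frees -h, which this tool uses for the target gatekeeper.
    p = argparse.ArgumentParser(prog="job-submission", add_help=False)
    p.add_argument("-h", dest="gatekeeper", metavar="HOSTNAME/JOBMANAGER",
                   help="target gatekeeper (assumed optional)")
    p.add_argument("-e", dest="executable", required=True, metavar="EXECUTABLE")
    p.add_argument("-t", dest="transfer_exe", action="store_true",
                   help="transfer the executable to the remote resource")
    p.add_argument("-a", dest="arguments", default="", help="executable arguments")
    p.add_argument("-i", dest="inputs", nargs="*", default=[], help="input files to transfer")
    p.add_argument("-o", dest="outputs", nargs="*", default=[], help="output files to transfer")
    p.add_argument("--help", action="help", help="show this message and exit")
    return p
```

For example, `job-submission -e update_file -t -a "TEST_3_2.in TEST_3_2.out" -i TEST_3_2.in -o TEST_3_2.out` would parse into the values used by the job classad that follows.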
Job classad
executable = update_file
Transfer_Executable = True
globusscheduler = $$(gatekeeper_url)
Requirements = (TARGET.gatekeeper_url == "t2ce02.physics.ox.ac.uk/jobmanager-lcgpbs" || TARGET.gatekeeper_url == "condor.oucs.ox.ac.uk/jobmanager-condor" || TARGET.gatekeeper_url == "grid-compute.oesc.ox.ac.uk/jobmanager-pbsox" || TARGET.gatekeeper_url == "bedrock.oucs.ox.ac.uk/jobmanager-sge") && TARGET.gatekeeper_url =!= UNDEFINED && TARGET.OpSys == "LINUX"
match_list_length = 1
arguments = TEST_3_2.in TEST_3_2.out
transfer_input_files = TEST_3_2.in
transfer_output_files = TEST_3_2.out
WhenToTransferOutput = ON_EXIT
universe = grid
grid_type = gt2
notification = ERROR
output = temp-1168783341-2.out
error = temp-1168783341-2.err
log = temp-1168783341-2.log
queue
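A submit description shaped like the one above could be generated in Python as sketched below. The function names are hypothetical. The Requirements expression ORs together the allowed gatekeepers, requires `gatekeeper_url` to be defined, and pins the operating system. `$$(gatekeeper_url)` is left literal so that Condor fills it in from the matched machine ad.

```python
from typing import List

def gatekeeper_requirements(gatekeepers: List[str], opsys: str = "LINUX") -> str:
    """OR together the allowed gatekeepers, require the attribute to exist, and pin the OS."""
    allowed = " || ".join(f'TARGET.gatekeeper_url == "{gk}"' for gk in gatekeepers)
    return (f"({allowed}) && TARGET.gatekeeper_url =!= UNDEFINED"
            f' && TARGET.OpSys == "{opsys}"')

def submit_description(executable: str, gatekeepers: List[str], arguments: str,
                       inputs: List[str], outputs: List[str], stem: str,
                       transfer_exe: bool = True) -> str:
    """Emit a grid-universe submit description shaped like the example job classad."""
    lines = [
        f"executable = {executable}",
        f"Transfer_Executable = {transfer_exe}",
        "globusscheduler = $$(gatekeeper_url)",  # substituted from the matched machine ad
        f"Requirements = {gatekeeper_requirements(gatekeepers)}",
        "match_list_length = 1",
        f"arguments = {arguments}",
        f"transfer_input_files = {','.join(inputs)}",
        f"transfer_output_files = {','.join(outputs)}",
        "WhenToTransferOutput = ON_EXIT",
        "universe = grid",
        "grid_type = gt2",
        "notification = ERROR",
        f"output = {stem}.out",
        f"error = {stem}.err",
        f"log = {stem}.log",
        "queue",
    ]
    return "\n".join(lines)
```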
Additional User Tools
• oxgrid_certificate_import
– Reduces the installation of a user's digital certificate to a single command
• oxgrid_q
– Displays the user's current queue at the resource broker, with options to show the full task queue
• oxgrid_status
– Displays the resources available to the user, with options to show all resources currently registering with the resource broker
• oxgrid_cleanup
– Removes either a single submitted process or a range of child processes along with their master
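The slides do not say how oxgrid_q is implemented. Assuming it is a thin wrapper around `condor_q` on the broker, a Python sketch of the command it might build follows (the function name is hypothetical). `condor_q` accepts an owner name to restrict the listing. Older Condor releases list every job by default, whereas recent HTCondor releases (8.6 and later) show only the caller's jobs unless `-allusers` is given.

```python
import getpass
from typing import List, Optional

def oxgrid_q_command(full_queue: bool = False, user: Optional[str] = None,
                     legacy_condor: bool = True) -> List[str]:
    """Build the condor_q command a thin oxgrid_q wrapper might run (hypothetical)."""
    if full_queue:
        # Legacy condor_q lists all jobs by default; newer HTCondor needs -allusers.
        return ["condor_q"] if legacy_condor else ["condor_q", "-allusers"]
    # Restrict the listing to one owner, defaulting to the invoking user.
    return ["condor_q", user or getpass.getuser()]
```

The resulting list can be passed straight to `subprocess.run(..., check=True)`, which avoids shell quoting issues with user names.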
oxgrid_status
Users
• Statistics
• Materials science
• Inorganic chemistry
• Theoretical chemistry
• Biochemistry
• Computational biology
• Astrophysics
• Condensed matter physics
• Zoology
• Researchers and students
Future Developments
• As part of GridBS project development:
– Adding direct submission into MS CCS using GridSAM BLAH
– Adding new types of data sources:
• EGEE
• Grimoires
• Continue to improve packaging to ensure ease of installation and redistribution
Conclusion
• We have designed a resource broker that is orders of magnitude smaller than existing solutions, with minimal external dependencies
• Simple tools have allowed users of OxGrid easy access to resources in many different institutions
• Over 65k individual tasks have been submitted to connected resources since January