Oxford Interdisciplinary e-Research Centre (IeRC)
OxGrid, A Campus Grid for the University of Oxford
Dr. David Wallom, Campus Grid Manager
Outline
• What is a grid?
• Why make a campus grid?
• How are we making it?
– Central Systems
– Software
– Resources
– Users
• How can the ICT/ECE help this activity?
What makes a Grid a Grid?
• Single sign-on to multiple resources located in different administrative domains.
• A Virtual Organisation of users that spans physical organisational boundaries.
The Problem
• Many new research problems need massive computation and data access.
• Research work is increasingly limited by the capacity of accessible resources.
The Solution
• If the computational or data need is too large for a single existing resource, construct a system able to use a number of appropriate resources concurrently.
– Designed so that:
• users sign on once to access multiple resources and switch between them seamlessly
• the layout can be altered dynamically without interrupting users
• once a job has been started or data placed on a remote resource, its status is monitored to make sure it stays running/available
Why make a campus grid?
• Many computers throughout the University are under-utilised:
– PCs, already purchased – depreciating daily
• Idle time and unused disk space are being wasted.
• e.g. OULS has up to 1200 desktop computers.
– Clusters are expensive to purchase, house and run (extra FTEs).
• Rarely 100% utilised
• Users forced to queue to find suitable resources for their research.
Why make a campus grid?
• Develop and deploy Grid technology to use under-utilised resources:
– Higher utilisation
• Connect them together so that more often than not a free resource is available, minimising queue time.
– Amplify system administrator effort.
– Substantially increase the research computing power available
• Ensure that applications reach a suitable resource ASAP, certainly quicker than in a single cluster
OxGrid, a University Campus Grid
• Single entry point for Oxford users to shared and dedicated resources
• Seamless access to National Grid Service and OSC for registered users
• Single sign-on using PKI technology integrated with current methods
[Diagram labels: NGS, OSC, OxGrid Central Management, Resource Broker, MDS/VOM, Storage, College Resources, Departmental Resources, Oxford Users]
Authorisation And Authentication
• Initially use the standard UK e-Science Certification Authority
– X.509 digital certificates issued on a per-user basis.
– OUCS is a Registration Authority for this CA
• For users that only wish to access internal (university) resources, a Kerberos CA has been installed, controlled by the Oxford central Kerberos system (Herald username)
• Use an online credential repository to minimise user-certificate interaction
Central System Components
• Information Service
– Contains all system status information on which the resource broker makes decisions, retrieving information from all clients in the system
• Resource Broker
– User access and distribution of submitted tasks to appropriate resources
• Systems monitoring
– Monitoring system for the helpdesk, the first point of contact in case of problems
• Virtual Organisation Management and Resource Usage Service
– Control a virtual community whose members can use various resources
– Create accounting information so that full system as well as single resource use can be recorded and hence possibly charged for
• Storage
– Create a dynamic multi-homed virtual file system
– User metadata mark-up for improved data mining
Grid Middleware
• Virtual Data Toolkit
– Chosen for stability & support structure
– Platform independent installation method
– Widely used in other European production grid systems
– Contains
• Globus Toolkit™ version 2.4 with several enhancements
• GSI enhanced OpenSSH
• myProxy Client & Server
Information Server
• Globus Grid Resource Information Index
• Central LDAP database for system information
• System information, CPU, memory etc.
• Scheduler queue status, number of running & queued tasks
• Further additions to published data easily managed
• Pull model for retrieving data from clients
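The pull model listed above can be illustrated with a minimal Python sketch. It is not the Globus MDS implementation: the `ResourceReport` fields, the `pull_all` function and the provider callables are hypothetical names chosen for this example, and a real index would query each client's LDAP information provider instead of calling a Python function.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ResourceReport:
    """Status a client publishes (fields are illustrative assumptions)."""
    cpus: int
    memory_mb: int
    running_jobs: int
    queued_jobs: int


def pull_all(providers: Dict[str, Callable[[], ResourceReport]]) -> Dict[str, ResourceReport]:
    """The central index asks every registered client for its current status."""
    index = {}
    for name, fetch in providers.items():
        try:
            index[name] = fetch()  # the server initiates the request (pull)
        except OSError:
            # An unreachable client is left out of this refresh.
            continue
    return index


def _offline() -> ResourceReport:
    """Stand-in for a client that cannot be contacted."""
    raise OSError("host unreachable")


providers = {
    "cluster-a": lambda: ResourceReport(cpus=64, memory_mb=131072, running_jobs=40, queued_jobs=5),
    "pc-pool-b": _offline,
}
index = pull_all(providers)
print(sorted(index))
```

Because the server drives the refresh, a client that is switched off simply disappears from the next snapshot and needs no logic of its own to announce that.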
Resource Broker
• Uses the Condor-G™ meta-scheduler
– Can be considered a large batch processing system
– Condor-G allows treatment of a remote resource (cluster, PC pool) as a local resource
– Command-line tools available to perform job management (submit, query, cancel, etc.) with detailed logging
– Simple job submission language which is translated into remote scheduler specific language
• Custom script for determination of resource status & priority.
• Integrated the Condor Resource description mechanism and Globus Monitoring and Discovery Service.
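To give a feel for the "simple job submission language", the Python sketch below renders a Condor-G submit description as text. It uses the `universe = grid` and `grid_resource = gt2 ...` form from current HTCondor documentation; the syntax in use on OxGrid at the time may have differed. The function name, the host `gatekeeper.example.org` and the file names are assumptions for illustration only.

```python
def condor_g_submit(executable: str, arguments: str, gatekeeper: str, jobmanager: str) -> str:
    """Render a submit description targeting a Globus GT2 gatekeeper.

    All values passed in are hypothetical; nothing here is OxGrid's real
    configuration.
    """
    lines = [
        "universe = grid",
        # Condor-G hands the job to the remote scheduler behind this jobmanager.
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


submit_text = condor_g_submit("simulate", "--steps 1000", "gatekeeper.example.org", "pbs")
print(submit_text)
```

Text like this would normally be saved to a file and passed to `condor_submit`. The slides do not show OxGrid's own 'job-submission' script, so this is not a reconstruction of it.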
OxGrid specific information added
• Priority of resource dependent on current load measured against possible load
• List of installed software on each node
• Resource usage permissions (registered users of NGS, OSC)
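One plausible reading of "current load measured against possible load" is a free-capacity ratio. The Python sketch below assumes that reading; the actual formula in the OxGrid custom script is not given in the slides, and the resource names and numbers are invented.

```python
def resource_priority(current_load: float, possible_load: float) -> float:
    """Return a score in [0, 1]; higher means more free capacity.

    Assumed formula: 1 - current/possible, clamped to the unit interval.
    """
    if possible_load <= 0:
        return 0.0  # a resource with no capacity is never preferred
    used = min(max(current_load / possible_load, 0.0), 1.0)
    return 1.0 - used


# (current load, possible load) per resource; values are made up.
resources = {"cluster-a": (48, 64), "pc-pool-b": (10, 200), "cluster-c": (16, 16)}
ranked = sorted(resources, key=lambda name: resource_priority(*resources[name]), reverse=True)
print(ranked)
```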
Job to Resource Matching
• For each resource that is accessible to the Resource Broker a machine advertisement is created.
– Contains information such as CPU type, available memory and any additional information such as load etc.
• For each job that is submitted to the Resource Broker a job advertisement is created.
– This has the job requirements, such as CPU type, memory necessary etc.
• A specific daemon within the system does matchmaking between the job requirements and the resource properties.
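The matchmaking step can be sketched in a few lines of Python. Condor expresses advertisements as ClassAds with their own expression language; the plain dictionaries, the field names and the "lowest load wins" tie-break below are simplifying assumptions, not Condor's actual algorithm.

```python
from typing import Dict, List, Optional


def matches(job: Dict, machine: Dict) -> bool:
    """True when the machine advertisement satisfies the job's requirements."""
    return (
        machine["cpu_type"] == job["cpu_type"]
        and machine["memory_mb"] >= job["memory_mb"]
        and set(job.get("software", [])) <= set(machine.get("software", []))
    )


def matchmake(job: Dict, machines: List[Dict]) -> Optional[str]:
    """Pick the least-loaded machine that satisfies the job, or None."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["load"])["name"]


machines = [
    {"name": "cluster-a", "cpu_type": "x86_64", "memory_mb": 4096, "load": 0.8, "software": ["gromacs"]},
    {"name": "pc-pool-b", "cpu_type": "x86_64", "memory_mb": 2048, "load": 0.1, "software": []},
    {"name": "cluster-c", "cpu_type": "x86_64", "memory_mb": 8192, "load": 0.3, "software": ["gromacs"]},
]
job = {"cpu_type": "x86_64", "memory_mb": 3000, "software": ["gromacs"]}
chosen = matchmake(job, machines)
print(chosen)
```

Here `pc-pool-b` is the least loaded machine but is rejected because it lacks the memory and software the job asks for, so the job goes to `cluster-c`.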
Resource Broker Operation
Virtual Organisation Management
• Globus uses a mapping between Distinguished Name (DN) as defined in a Digital Certificate to local usernames on each resource.
• Important that for each resource that a user is expecting to use, his DN is mapped locally.
• Have to also make sure the correct resources are registered.
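On Globus resources the DN-to-username mapping is conventionally kept in a grid-mapfile, where each line holds a quoted DN followed by a local account name. The Python sketch below parses that layout. The DNs and pool account names are invented examples, and the parser ignores extensions such as comma-separated account lists.

```python
import shlex
from typing import Dict


def parse_grid_mapfile(text: str) -> Dict[str, str]:
    """Map each certificate DN to its local username."""
    mapping = {}
    for line in text.splitlines():
        # shlex handles the quoted DN (which may contain spaces) and '#' comments.
        fields = shlex.split(line, comments=True)
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        dn, local_user = fields
        mapping[dn] = local_user
    return mapping


SAMPLE = '''
# quoted DN, then the local account (hypothetical entries)
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=jane example" oxgrid001
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=john example" oxgrid002
'''
mapping = parse_grid_mapfile(SAMPLE)
print(mapping)
```

A user whose DN is missing from the file on a given resource cannot run there, which is why the mapping has to be present on every resource the user expects to use.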
Virtual Organisation Management and Accounting
• OxVOM
– Custom in-house designed Web based user interface
– Persistent information stored in relational database
– User DN list retrieved by remote resources using standard tools
• Resource Usage Service
– Installed software altered to include commands to determine job start and stop time as well as interface with host scheduling system
– Using Global Grid Forum User Record Usage Service standard
– Information returned from client to RUS server when job completed and stored in persistent database
OxGrid VOM
Resource Usage Service
• Enables presentation of system use to users as well as system owners
• Can form the basis of a charging model
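As an illustration of how job start and stop times could feed a charging model, the Python sketch below totals CPU-hours per user and applies a flat rate. The record fields, the DNs and the rate are assumptions; they do not follow the Global Grid Forum schema or any actual OxGrid tariff.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class UsageRecord:
    """One completed job (field names are illustrative)."""
    user_dn: str
    resource: str
    start: datetime
    stop: datetime
    cpus: int = 1


def cpu_hours_by_user(records: List[UsageRecord]) -> Dict[str, float]:
    """Sum wall-clock hours multiplied by CPU count for each user."""
    totals = defaultdict(float)
    for rec in records:
        hours = (rec.stop - rec.start).total_seconds() / 3600.0
        totals[rec.user_dn] += hours * rec.cpus
    return dict(totals)


def charges(totals: Dict[str, float], rate_per_cpu_hour: float) -> Dict[str, float]:
    """Flat-rate bill per user, rounded to two decimal places."""
    return {dn: round(hours * rate_per_cpu_hour, 2) for dn, hours in totals.items()}


records = [
    UsageRecord("/CN=jane example", "cluster-a", datetime(2006, 1, 9, 10, 0), datetime(2006, 1, 9, 12, 0), cpus=4),
    UsageRecord("/CN=jane example", "pc-pool-b", datetime(2006, 1, 9, 13, 0), datetime(2006, 1, 9, 13, 30)),
    UsageRecord("/CN=john example", "cluster-a", datetime(2006, 1, 9, 9, 0), datetime(2006, 1, 9, 10, 0), cpus=2),
]
totals = cpu_hours_by_user(records)
bill = charges(totals, rate_per_cpu_hour=0.10)
print(totals, bill)
```

The same totals, grouped by `resource` instead of `user_dn`, would give system owners the per-resource view mentioned above.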
Systems Monitoring
• ‘Ganglia’ monitoring tool for system status and graphical representation
• Simple interface showing immediate hardware problems as well as system load
• Well understood by helpdesk and support staff
• Open source with simple configuration
Ganglia System Monitoring
Core Resources
• Individual Departmental Clusters (PBS, LSF, SGE)
– Grid software interfaces
– Management of users
– Owner controlled access through local management software
• Condor clusters of PCs
– Single master running up to ~500 nodes
– Condor masters run either by owners or IeRC
External Resources
• Only accessible to users that have registered with them
– National Grid Service
• Peered access with individual systems
– OSC
• Gatekeeper system
• User management done through standard account issuing procedures and manual DN mapping
• Controlled grid submission to Oxford Supercomputing Centre
Services necessary to connect to OxGrid
• For a system to connect to OxGrid
– Must support a minimum software set (without which it is impossible to submit jobs from the Resource Broker)
• Globus 2.4 job management and RUS compatible jobmanager
• MDS compatible information server
– Desirable though not mandated
• OxVOM compatible grid-mapfile installation scripts
• With a scheduling system installed the system administrator is in control
Connecting Clusters into OxGrid, 1
• Direct connection
– Install middleware etc. onto system head nodes
• Automated installation script
• Well known procedure
– Known port numbers for services and port range for data transfer
– Addition of ~30 user pool accounts
• Example of this type of setup is Oxford NGS node
– Contact Steven Young (OeSC)
Connecting Clusters into OxGrid, 2
• Indirect
– Separate gatekeeper system with submission components of local scheduler
• Transfer Queues on each gatekeeper
• Decouples Globus from local resources
– Hides internals from the Grid users
– Many clusters can be handled by one system jobmanager
• Example of this type of installation is the old OSC Gatekeeper.
– Contact Jon Lockley (OSC)
Connecting PCs, 1
• Student labs, libraries and college terminal rooms
• Very different usage patterns for this type of resource
– Systems inaccessible out of hours, greatest performance from dual boot using Windows/Scientific Linux
• Can have environmental and power considerations
– 24 hour access, coLinux virtual machine installation running in parallel with native OS
• Both of these types of systems use Condor and a Linux Condor master server.
Connecting PCs, 2
• Install Windows Condor client
– Runs a system service
• Configured either to hold when a local user is present, or
• to run at all times with low priority
– Studies by several groups have shown that for modern systems a student user sees no system performance difference between the two
– Downside
• Significant extra effort is needed because of code recompiling and porting.
• Some code will not run because external libraries are unavailable
– ‘Services for Unix’ being investigated to run Linux jobs natively on Windows systems.
Environmentally aware Condor systems
• Increasingly, system owners shut down machines that are not being used.
– Save electricity
• Develop a scheme to still use these systems within OxGrid
– Take advantage of Wake-On-LAN technology.
– Automate load balancing to start and stop worker nodes as necessary.
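A Wake-On-LAN "magic packet" is six 0xFF bytes followed by the target MAC address repeated sixteen times, usually sent as a UDP broadcast. The Python sketch below builds such a packet and adds a deliberately naive rule for deciding how many sleeping nodes to wake. The `nodes_to_wake` policy, the port and the MAC addresses are assumptions; the slides do not describe the scheme OxGrid planned in any detail.

```python
import socket
from typing import List


def magic_packet(mac: str) -> bytes:
    """Build the Wake-On-LAN payload for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the packet on the local network (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


def nodes_to_wake(queued_jobs: int, idle_awake_nodes: int, sleeping_macs: List[str]) -> List[str]:
    """Naive policy: wake one sleeping node per job that has no idle node."""
    shortfall = max(queued_jobs - idle_awake_nodes, 0)
    return sleeping_macs[:shortfall]


packet = magic_packet("00:11:22:33:44:55")
to_wake = nodes_to_wake(5, 3, ["00:11:22:33:44:55", "00:11:22:33:44:56", "00:11:22:33:44:57"])
print(len(packet), to_wake)
```

Calling `wake(mac)` for each address in `to_wake` would power the nodes on. Shutting idle nodes down again is left out of this sketch.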
Connecting Others
• Sun
– Create Sun Grid Engine clusters and then perform direct connection method
• Mac
– Apple have their own grid software, Xgrid
• Not fully tested
– Supported by Condor
Data Management
• Engagement of data as well as computationally intensive research groups
• Provide a remote store for those groups that cannot resource their own
• Distribute the client software as widely as possible, including departments that are not currently engaged in e-Research
Data Management
• Software for creation of system
– Storage Resource Broker to create large virtual datastore
• Through central metadata catalogue users interface with single virtual file system though physical volumes may be on several network resources
• In-built metadata capability
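The idea of a central catalogue that maps one logical file system onto several physical servers can be shown with a toy in-memory Python model. This is not the SRB API: the class, the method names, the server names and the paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """Where a logical file physically lives, plus user metadata."""
    server: str
    physical_path: str
    metadata: Dict[str, str] = field(default_factory=dict)


class MetadataCatalogue:
    """Toy stand-in for a central metadata catalogue."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def register(self, logical_path: str, server: str, physical_path: str, **metadata: str) -> None:
        self._entries[logical_path] = Entry(server, physical_path, dict(metadata))

    def resolve(self, logical_path: str) -> Entry:
        """Users name the logical path; the catalogue supplies the location."""
        return self._entries[logical_path]

    def find(self, **criteria: str) -> List[str]:
        """Return logical paths whose metadata matches every criterion."""
        return sorted(
            path for path, entry in self._entries.items()
            if all(entry.metadata.get(key) == value for key, value in criteria.items())
        )


catalogue = MetadataCatalogue()
catalogue.register("/oxgrid/chem/run1.dat", "disk-server-1", "/vol0/a81f", project="docking")
catalogue.register("/oxgrid/chem/run2.dat", "disk-server-2", "/vol3/77c2", project="docking")
catalogue.register("/oxgrid/hum/corpus.xml", "disk-server-1", "/vol1/0b9e", project="corpus")
print(catalogue.resolve("/oxgrid/chem/run2.dat").server, catalogue.find(project="docking"))
```

The two `docking` files sit on different servers, yet a user sees them side by side under `/oxgrid/chem` and can find both through their metadata.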
SRB Architecture
[Diagram labels: USER, MCAT Server, MCAT, Disk Server 1, Disk Server 2]
SRB as a Data Grid
[Diagram labels: MCAT DB, several SRB servers]
• Data Grid has arbitrary number of servers
• Complexity is hidden from users
SRB Client Implementations
• inQ – Windows GUI browser
• Jargon – Java SRB client classes
– Pure Java implementation
• mySRB – Web based GUI
– run using web browser
• Java Admin Tool
– GUI for User and Resource management
• Matrix – Web service for SRB work flow
How users interact with OxGrid
• Log in to system head node (Resource Broker)
• Create digital credential
• Use ‘job-submission’ script to create and submit jobs onto Condor-G system.
Supporting OxGrid
• First point of contact is OUCS Helpdesk through support email.
– Preset list of questions to ask and log files to see if available.
– Not expected to do any actual debugging.
– Pass problems on to Grid experts who
• pass hardware problems on a system by system basis to their own maintenance staff.
• answer grid software problems themselves.
• Significant cluster support expertise within OeSC/IeRC.
• As one of the UK e-Science Centres we also have access to the Grid Support Centre.
Users
• Installed several example applications
– Plasma physics
– Polymer physics
– Biochemistry protein docking
– Graphics rendering
• We have our first Oxford user code example
– Dr Peter Grout, Chemistry
• Contacting currently registered users of both OSC and NGS.
– Beneficial to these systems to remove ‘serial’ users that don’t need to be there to provide more capability to those that must be there.
• Data provision is an integral component of the grid
– Contacting Humanities and other large data users
Collaboration
• Configuring computational components to share resources between Harvard & Monash Universities as proof of principle of global campus grids.
• Configuring Storage System to allow safe, secure multi-site storage of data with Monash.
How the ICT Strategy & ECE can help
• Produce single uniform configuration of ~2000 systems.
• Willingness at the design outset to include the capacity to use systems for computation, and hence to include it as a key criterion in final system choice.
• Consider using a supported architecture that is popular with computationally active researchers.
• Use underlying system management software that is flexible enough to allow for usage changes of resources, e.g. Altiris.
• Persuade people that efficient usage and sharing of resources is in everyone's best interests.
The Future
• Improve RB system usage algorithm
• Install Service based grid software on test system to provide transition information
• Package central server modules for public distribution
The Future, 2
• Develop Windows/Linux Condor pools so that all shared systems can be included
• Continue contacting users to expand the user base
• Design and construct user training courses.
Conclusions
• Users are already able to log onto the Resource Broker and schedule work onto the NGS, OSC and OUCS Condor Systems
• We are working as quickly as possible to engage more users
• We need these users to then go out and evangelise to bring in both more users and more resources.
Contact
• Email: [email protected]
• Telephone: 01865 283378