OxGrid, A Campus Grid for the University of Oxford. Dr. David Wallom, Campus Grid Manager, Oxford Interdisciplinary e-Research Centre (IeRC)

Post on 30-Jan-2016

Oxford Interdisciplinary e-Research Centre (IeRC)

OxGrid, A Campus Grid for the University of Oxford

Dr. David Wallom, Campus Grid Manager

Outline

• What is a grid?
• Why make a campus grid?
• How are we making it?
  – Central systems
  – Software
  – Resources
  – Users

• How can the ICT/ECE help this activity?

What makes a Grid a Grid?

• Single sign-on to multiple resources located in different administrative domains.

• A Virtual Organisation of users that spans physical organisational boundaries.

The Problem

• Many new research problems need massive computation and data access.

• Research work is increasingly limited by the capacity of accessible resources.

The Solution

• If the computational or data need is too large for any single existing resource, construct a system that can use a number of appropriate resources concurrently.
  – Designed so that:
    • users sign on once to access multiple resources and switch between them seamlessly;
    • the layout can be altered dynamically without interrupting users;
    • once a job has started or data has been placed on a remote resource, its status is monitored to make sure it stays running/available.

Why make a campus grid?

• Many computers throughout the University are under-utilised:
  – PCs: already purchased and depreciating daily
    • Idle time and unused disk space are being wasted.
    • e.g. OULS has up to 1200 desktop computers.
  – Clusters: expensive to purchase, house and run (extra FTEs)
    • Rarely 100% utilised
    • Users are forced to queue to find suitable resources for their research.

Why make a campus grid?

• Develop and deploy Grid technology to use under-utilised resources:
  – Higher utilisation
    • Connect them together so that, more often than not, a free resource is available, minimising queue time.
  – Amplify system administrator effort.
  – Substantially increase the research computing power available
    • Ensure that applications reach a suitable resource as soon as possible, certainly quicker than in a single cluster.

OxGrid, a University Campus Grid

• Single entry point for Oxford users to shared and dedicated resources

• Seamless access to the National Grid Service (NGS) and the Oxford Supercomputing Centre (OSC) for registered users

• Single sign-on using PKI technology integrated with current methods

[Diagram: OxGrid Central Management (Resource Broker, MDS/VOM, Storage) connecting Oxford users to college resources, departmental resources, the NGS and OSC]

Authorisation And Authentication

• Initially we use the standard UK e-Science Certification Authority (CA)
  – X.509 digital certificates issued on a per-user basis
  – OUCS is a Registration Authority for this CA

• For users who only wish to access internal (university) resources, a Kerberos CA has been installed, controlled by the Oxford central Kerberos system (Herald username)

• Use an online credential repository to minimise users' direct interaction with their certificates

Central System Components

• Information Service
  – Holds all the system status information on which the Resource Broker bases its decisions, retrieved from all clients in the system

• Resource Broker
  – User access, and distribution of submitted tasks to appropriate resources

• Systems monitoring
  – Monitoring system for the helpdesk, the first point of contact in case of problems

• Virtual Organisation Management and Resource Usage Service
  – Control a virtual community whose members can use various resources
  – Create accounting information so that use of the full system, as well as of single resources, can be recorded and hence possibly charged for

• Storage
  – Create a dynamic multi-homed virtual file system
  – User metadata mark-up for improved data mining

Grid Middleware

• Virtual Data Toolkit
  – Chosen for its stability and support structure
  – Platform-independent installation method
  – Widely used in other European production grid systems
  – Contains:
    • Globus Toolkit™ version 2.4 with several enhancements
    • GSI-enhanced OpenSSH
    • MyProxy client and server

Information Server

• Globus Grid Resource Information Index
  – Central LDAP database for system information
    • System information: CPU, memory, etc.
    • Scheduler queue status: number of running and queued tasks
  – Further additions to published data easily managed
  – Pull model for retrieving data from clients
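
The pull model can be illustrated with a minimal Python sketch, assuming each client exposes a callable that returns its current status and that the index re-pulls a record once it is older than a freshness window. `InformationIndex`, `ResourceRecord` and `max_age_s` are hypothetical names; the real service is the LDAP-based Globus index, not this code.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ResourceRecord:
    """Status snapshot for one resource (field names are illustrative, not the MDS schema)."""
    name: str
    cpus: int
    memory_mb: int
    running: int
    queued: int
    fetched_at: float = 0.0


class InformationIndex:
    """Hypothetical central index that pulls status from registered clients when asked."""

    def __init__(self, max_age_s: float = 60.0) -> None:
        self.max_age_s = max_age_s  # how long a pulled record is treated as fresh
        self._providers: Dict[str, Callable[[], ResourceRecord]] = {}
        self._cache: Dict[str, ResourceRecord] = {}

    def register(self, name: str, provider: Callable[[], ResourceRecord]) -> None:
        self._providers[name] = provider

    def query(self, name: str, now: Optional[float] = None) -> ResourceRecord:
        now = time.time() if now is None else now
        cached = self._cache.get(name)
        if cached is None or now - cached.fetched_at > self.max_age_s:
            cached = self._providers[name]()  # pull: the index contacts the client
            cached.fetched_at = now
            self._cache[name] = cached
        return cached
```

A caching window like this stops the central server contacting every client on every broker query, at the cost of data that may be up to `max_age_s` seconds stale.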

Resource Broker

• Uses the Condor-G™ meta-scheduler
  – Can be considered a large batch processing system
  – Condor-G allows a remote resource (cluster, PC pool) to be treated as a local resource
  – Command-line tools are available to perform job management (submit, query, cancel, etc.) with detailed logging
  – Simple job submission language, which is translated into the remote scheduler's own language

• Custom script for determination of resource status & priority.

• Integrates the Condor resource description mechanism (ClassAds) with the Globus Monitoring and Discovery Service (MDS).
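
To show what the "simple job submission language" looks like, here is a small Python helper that renders a Condor-G submit description for a Globus GT2 gatekeeper. It uses the `universe = grid` / `grid_resource = gt2 host/jobmanager-type` spelling of current HTCondor; Condor-G releases contemporary with these slides used `universe = globus` with `globusscheduler` instead. The function name and the gatekeeper host are hypothetical, and OxGrid's own ‘job-submission’ script is not reproduced here.

```python
from typing import Sequence


def condor_g_submit(executable: str, gatekeeper: str, jobmanager: str,
                    args: Sequence[str] = (), count: int = 1) -> str:
    """Render a Condor-G submit description targeting a GT2 gatekeeper (sketch)."""
    lines = [
        "universe = grid",
        # Condor-G forwards the job to this gatekeeper, whose jobmanager
        # translates it into the local scheduler's language (PBS, LSF, SGE, Condor).
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
    ]
    if args:
        # Simple whitespace-joined arguments; tokens containing spaces would need quoting.
        lines.append("arguments = " + " ".join(args))
    lines += [
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",  # Condor-G's detailed per-job event log
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(condor_g_submit("sim", "grid.example.ac.uk", "pbs", ["-n", "10"], count=5))
```

The resulting text would be saved to a file and passed to `condor_submit` on the Resource Broker.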

OxGrid-specific information added

• Priority of a resource dependent on its current load measured against its possible load

• List of installed software on each node
• Resource usage permissions (registered users of NGS, OSC)
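
One plausible reading of "current load measured against possible load" is the free fraction of a resource's job slots. The sketch below, assuming a simple slot count per resource, filters on installed software and usage permissions and then ranks the survivors by that fraction. The `Resource` fields, the group names and the formula itself are illustrative assumptions, not OxGrid's actual custom script.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Resource:
    name: str
    busy_slots: int                      # current load
    total_slots: int                     # possible load
    software: FrozenSet[str] = frozenset()
    allowed_groups: FrozenSet[str] = frozenset({"oxgrid"})  # hypothetical group names


def priority(r: Resource) -> float:
    """Free fraction of capacity: 1.0 means idle, 0.0 means full (or unknown capacity)."""
    if r.total_slots <= 0:
        return 0.0
    return max(0.0, 1.0 - r.busy_slots / r.total_slots)


def eligible(resources: Iterable[Resource], software: str,
             user_groups: Iterable[str]) -> List[Resource]:
    """Resources that have the software and admit the user, highest priority first."""
    groups = set(user_groups)
    ok = [r for r in resources if software in r.software and r.allowed_groups & groups]
    return sorted(ok, key=priority, reverse=True)
```

Here an NGS-only resource is skipped for a user who is not a registered NGS user, mirroring the usage-permission bullet.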

Job to Resource Matching

• For each resource accessible to the Resource Broker, a machine advertisement is created.
  – Contains information such as CPU type and available memory, plus any additional information such as load.

• For each job submitted to the Resource Broker, a job advertisement is created.
  – Contains the job requirements, such as CPU type and memory needed.

• A dedicated daemon performs matchmaking between the job requirements and the resource properties.
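
The matchmaking step can be sketched in Python with plain dictionaries standing in for advertisements. Condor's real ClassAds use their own expression language with attributes such as `Requirements` and `Rank`; this sketch only mimics the idea, with predicates and a ranking function written as Python callables. As in Condor, matching is two-sided: a machine may also state requirements on the job. Attribute names like `memory_mb` and `load` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Ad = Dict[str, Any]  # an advertisement: attribute name -> value


def matchmake(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the highest-ranked machine ad that satisfies, and is satisfied by, the job."""
    job_requirements: Callable[[Ad], bool] = job["requirements"]
    rank: Callable[[Ad], float] = job.get("rank", lambda m: 0.0)
    candidates = [
        m for m in machines
        # both sides must accept: the job's requirements and the machine owner's policy
        if job_requirements(m) and m.get("requirements", lambda j: True)(job)
    ]
    return max(candidates, key=rank, default=None)
```

For example, a job with `"requirements": lambda m: m["arch"] == "X86_64" and m["memory_mb"] >= 4096` and `"rank": lambda m: -m["load"]` is sent to the least-loaded machine that meets both conditions and whose own policy admits the job.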

Resource Broker Operation

Virtual Organisation Management

• Globus maps the Distinguished Name (DN) defined in a digital certificate to a local username on each resource.

• For each resource a user expects to use, their DN must be mapped to a local account.

• We also have to make sure the correct resources are registered.
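
The DN-to-username mapping lives in a Globus grid-mapfile, where each line holds a quoted DN followed by one or more comma-separated local accounts. Below is a minimal Python reader for that layout, assuming we treat the first listed account as the default; the function names and the example DNs in the test are hypothetical, and a production resource would rely on the Globus tooling rather than this code.

```python
import shlex
from typing import Dict, List


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Parse grid-mapfile text into {DN: [local accounts]}."""
    mapping: Dict[str, List[str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # keeps a quoted DN containing spaces as one token
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected a quoted DN followed by account names")
        dn, accounts = parts
        mapping[dn] = accounts.split(",")
    return mapping


def local_account(mapping: Dict[str, List[str]], dn: str) -> str:
    """Return the local account for a DN, refusing DNs that are not mapped."""
    try:
        return mapping[dn][0]  # assumption: the first listed account is the default
    except KeyError:
        raise PermissionError(f"DN not authorised on this resource: {dn}") from None
```

An unmapped DN is refused outright, which is why every resource a user expects to use needs its own mapping entry.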

Virtual Organisation Management and Accounting

• OxVOM
  – Custom, in-house designed web-based user interface
  – Persistent information stored in a relational database
  – User DN list retrieved by remote resources using standard tools

• Resource Usage Service
  – Installed software modified to record job start and stop times and to interface with the host scheduling system
  – Uses the Global Grid Forum Usage Record standard
  – Information is returned from the client to the RUS server when a job completes, and stored in a persistent database
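
The accounting flow (record start and stop times, send a record when the job completes, store it persistently) can be sketched in Python. The dictionary keys borrow element names from the OGF Usage Record format, but this is a simplified illustration, not a conforming implementation of that schema or of the RUS interface. SQLite stands in for the "persistent database", and the table layout is a hypothetical choice.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsageRecord:
    job_id: str
    global_user: str   # certificate DN
    local_user: str    # account the job ran under on the resource
    machine: str
    start: datetime
    end: datetime

    @property
    def wall_seconds(self) -> int:
        return int((self.end - self.start).total_seconds())

    def to_dict(self) -> dict:
        """Flatten using element names borrowed from the OGF Usage Record format."""
        return {
            "JobIdentity": self.job_id,
            "GlobalUserName": self.global_user,
            "LocalUserId": self.local_user,
            "MachineName": self.machine,
            "StartTime": self.start.isoformat(),
            "EndTime": self.end.isoformat(),
            "WallDuration": f"PT{self.wall_seconds}S",  # ISO 8601 duration
        }


def store(conn: sqlite3.Connection, rec: UsageRecord) -> None:
    """Persist one completed-job record (the server side of the flow)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage_records (job_id TEXT PRIMARY KEY,"
        " global_user TEXT, machine TEXT, start_time TEXT, end_time TEXT,"
        " wall_seconds INTEGER)"
    )
    conn.execute(
        "INSERT INTO usage_records VALUES (?, ?, ?, ?, ?, ?)",
        (rec.job_id, rec.global_user, rec.machine,
         rec.start.isoformat(), rec.end.isoformat(), rec.wall_seconds),
    )
    conn.commit()
```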

OxGrid VOM

Resource Usage Service

• Enables presentation of system use to users as well as system owners

• Can form the basis of a charging model
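
As an illustration of how stored usage could feed a charging model, the sketch below totals wall-clock hours per user and applies a per-resource hourly rate. The tuple layout, the resource names and the rates are hypothetical; the slides do not define any OxGrid tariff.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (user, resource, wall_seconds) rows, e.g. as read back from the usage database
Usage = Tuple[str, str, int]


def summarise(usage: Iterable[Usage],
              rate_per_hour: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hours per user, charge per user); unknown resources are charged at zero."""
    hours: Dict[str, float] = defaultdict(float)
    charges: Dict[str, float] = defaultdict(float)
    for user, resource, seconds in usage:
        h = seconds / 3600
        hours[user] += h
        charges[user] += h * rate_per_hour.get(resource, 0.0)
    return dict(hours), dict(charges)
```

The same totals, without the rates applied, are what would be presented to users and system owners as a usage report.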

Systems Monitoring

• ‘Ganglia’ monitoring tool for system status and graphical representation

• Simple interface showing immediate hardware problems as well as system load

• Well understood by helpdesk and support staff

• Open source with simple configuration

Ganglia System Monitoring

Core Resources

• Individual departmental clusters (PBS, LSF, SGE)
  – Grid software interfaces
  – Management of users
  – Owner-controlled access through local management software

• Condor clusters of PCs
  – Single master running up to ~500 nodes
  – Condor masters run either by owners or by IeRC

External Resources

• Only accessible to users who have registered with them
  – National Grid Service
    • Peered access with individual systems
  – OSC
    • Gatekeeper system
    • User management through standard account-issuing procedures and manual DN mapping
    • Controlled grid submission to the Oxford Supercomputing Centre

Services necessary to connect to OxGrid

• For a system to connect to OxGrid:
  – It must support a minimum software set, without which the Resource Broker cannot submit jobs to it:
    • Globus 2.4 job management and an RUS-compatible jobmanager
    • An MDS-compatible information server
  – Desirable though not mandated:
    • OxVOM-compatible grid-mapfile installation scripts

• With a scheduling system installed, the system administrator remains in control

Connecting Clusters into OxGrid, 1

• Direct connection
  – Install middleware etc. onto the system head nodes
    • Automated installation script
    • Well-known procedure
  – Known port numbers for services and a port range for data transfer
  – Addition of ~30 user pool accounts

• An example of this type of setup is the Oxford NGS node
  – Contact Steven Young (OeSC)
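
Pool accounts let a resource serve many grid users from a small, fixed set of local logins. One simple policy, sketched below in Python under that assumption, leases a free account to each new DN, gives a returning DN the same account, and returns the account to the pool on release. The class, the `oxgrid01`-style account names and the pool size are hypothetical; the slides do not say how OxGrid assigns its ~30 accounts.

```python
from typing import Dict, List


class PoolAccounts:
    """Lease a fixed set of local pool accounts to grid users by DN (illustrative)."""

    def __init__(self, prefix: str = "oxgrid", size: int = 30) -> None:
        self.free: List[str] = [f"{prefix}{i:02d}" for i in range(1, size + 1)]
        self.leases: Dict[str, str] = {}

    def account_for(self, dn: str) -> str:
        if dn in self.leases:          # a returning user keeps the same account
            return self.leases[dn]
        if not self.free:
            raise RuntimeError("pool exhausted: no free local accounts")
        account = self.free.pop(0)
        self.leases[dn] = account
        return account

    def release(self, dn: str) -> None:
        account = self.leases.pop(dn, None)
        if account is not None:
            self.free.append(account)  # the account becomes available to another DN
```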

Connecting Clusters into OxGrid, 2

• Indirect connection
  – Separate gatekeeper system with the submission components of the local scheduler
    • Transfer queues on each gatekeeper
    • Decouples Globus from local resources
  – Hides internals from Grid users
  – Many clusters can be handled by one system jobmanager

• An example of this type of installation is the old OSC Gatekeeper
  – Contact Jon Lockley (OSC)

Connecting PCs, 1

• Student labs, libraries and college terminal rooms

• Very different usage patterns for this type of resource
  – Systems inaccessible out of hours: greatest performance from dual-booting Windows/Scientific Linux
    • Can have environmental and power considerations
  – 24-hour access: coLinux virtual machine installation running in parallel with the native OS

• Both types of system use Condor and a Linux Condor master server.

Connecting PCs, 2

• Install the Windows Condor client
  – Runs as a system service
    • Configured either to hold jobs while a local user is active, or
    • to run at all times with low priority
  – Studies by several groups have shown that, on modern systems, a student user sees no performance difference between the two
  – Downside
    • Significant extra effort is needed for recompiling and porting code
    • Some code will not run because the external libraries it needs are unavailable
  – ‘Services for Unix’ is being investigated to run Linux jobs natively on Windows systems.
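
The two desktop policies can be compared in a short Python sketch. In Condor itself they are expressed as startd policy expressions such as `START` and `SUSPEND`, typically written in terms of machine attributes like `KeyboardIdle`; the Python below only models the decision. The 15-minute idle threshold and the 0.3 console-load cut-off are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HOLD_WHEN_USER_ACTIVE = "hold"
    ALWAYS_LOW_PRIORITY = "low-priority"


@dataclass
class DesktopState:
    keyboard_idle_s: int   # seconds since the last keyboard/mouse input
    console_load: float    # CPU load from the interactive user's own processes


def job_action(mode: Mode, state: DesktopState, idle_threshold_s: int = 15 * 60) -> str:
    """Decide what a grid job on a lab PC should do: 'run', 'suspend' or 'run-niced'."""
    if mode is Mode.ALWAYS_LOW_PRIORITY:
        return "run-niced"  # keep running, but at a low OS scheduling priority
    user_active = state.keyboard_idle_s < idle_threshold_s or state.console_load > 0.3
    return "suspend" if user_active else "run"
```

The trade-off is throughput against responsiveness: the hold policy wastes cycles whenever a student is logged in, while the low-priority policy relies on the operating system scheduler to keep the desktop responsive.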

Environmentally aware Condor systems

• Increasingly, system owners shut down machines that are not being used
  – To save electricity

• Develop a scheme to still use these systems within OxGrid
  – Take advantage of Wake-on-LAN technology
  – Automate load balancing to start and stop worker nodes as necessary
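
A Wake-on-LAN "magic packet" is six `0xFF` bytes followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The Python sketch below builds and sends one, and adds a simple load-balancing rule: wake just enough sleeping nodes to absorb the job queue. The rule, the function names and the default broadcast address and port are assumptions for illustration, not the OxGrid scheme itself.

```python
import math
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the 6-byte MAC repeated 16 times."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"bad MAC address: {mac!r}")
    return b"\xff" * 6 + bytes.fromhex(digits) * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local subnet (UDP port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))


def nodes_to_wake(queued_jobs: int, idle_awake: int, asleep: int,
                  slots_per_node: int = 1) -> int:
    """Wake only enough sleeping nodes to cover jobs the awake idle nodes cannot take."""
    shortfall = queued_jobs - idle_awake * slots_per_node
    if shortfall <= 0:
        return 0
    return min(asleep, math.ceil(shortfall / slots_per_node))
```

The matching "stop" half of the scheme would shut idle nodes down again once the queue has drained.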

Connecting Others

• Sun
  – Create Sun Grid Engine clusters and then use the direct connection method

• Mac
  – Apple have their own grid software, Xgrid
    • Not fully tested
  – Supported by Condor

Data Management

• Engage data-intensive as well as computationally intensive research groups

• Provide a remote store for those groups that cannot resource their own

• Distribute the client software as widely as possible, including departments that are not currently engaged in e-Research

Data Management

• Software for creating the system
  – Storage Resource Broker (SRB) to create a large virtual datastore
    • Through a central metadata catalogue, users interact with a single virtual file system, though the physical volumes may be on several network resources
    • Built-in metadata capability
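
The idea of a central catalogue presenting one virtual file system over several physical stores can be sketched in Python. The catalogue maps each logical path to one or more physical replicas plus user metadata, resolves a logical path to a reachable replica, and answers metadata queries. This is a toy model of the concept, not the SRB/MCAT API; the class names, server names and paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Replica:
    server: str          # physical storage resource holding this copy
    physical_path: str


@dataclass
class CatalogueEntry:
    replicas: List[Replica] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


class MetadataCatalogue:
    """Map logical paths in one virtual namespace to physical replicas plus metadata."""

    def __init__(self) -> None:
        self.entries: Dict[str, CatalogueEntry] = {}

    def register(self, logical: str, server: str, physical: str, **meta: str) -> None:
        entry = self.entries.setdefault(logical, CatalogueEntry())
        entry.replicas.append(Replica(server, physical))
        entry.metadata.update(meta)

    def locate(self, logical: str, available: Set[str]) -> Replica:
        """Return the first registered replica whose server is currently reachable."""
        for rep in self.entries[logical].replicas:
            if rep.server in available:
                return rep
        raise FileNotFoundError(f"no reachable replica for {logical}")

    def search(self, **criteria: str) -> List[str]:
        """Logical paths whose metadata matches every given key/value pair."""
        return sorted(p for p, e in self.entries.items()
                      if all(e.metadata.get(k) == v for k, v in criteria.items()))
```

The user only ever sees logical paths such as `/oxgrid/home/jdoe/run1.dat`; which disk server actually holds the bytes is resolved by the catalogue.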

SRB Architecture

[Diagram: SRB architecture showing the user, the MCAT server with its MCAT catalogue, and Disk Servers 1 and 2]

SRB as a Data Grid

[Diagram: several SRB servers federated around a single MCAT database]

• A data grid has an arbitrary number of servers
• Complexity is hidden from users

SRB Client Implementations

• inQ – Windows GUI browser

• Jargon – Java SRB client classes
  – Pure Java implementation

• mySRB – web-based GUI
  – Run using a web browser

• Java Admin Tool
  – GUI for user and resource management

• Matrix – web service for SRB workflow

How users interact with OxGrid

• Log in to the system head node (Resource Broker)

• Create a digital credential

• Use the ‘job-submission’ script to create and submit jobs to the Condor-G system

Supporting OxGrid

• The first point of contact is the OUCS Helpdesk, through the support email address.
  – Preset list of questions to ask, and log files to check where available
  – Not expected to do any actual debugging
  – Pass problems on to Grid experts, who:
    • pass hardware problems, system by system, to each system's own maintenance staff
    • answer grid software problems themselves

• Significant cluster support expertise within OeSC/IeRC

• As one of the UK e-Science Centres, we also have access to the Grid Support Centre.

Users

• Installed several example applications
  – Plasma physics
  – Polymer physics
  – Biochemistry protein docking
  – Graphics rendering

• We have our first Oxford user code example
  – Dr Peter Grout, Chemistry

• Contacting currently registered users of both OSC and NGS
  – It benefits these systems to move off ‘serial’ users who don't need to be there, providing more capability to those who must be there.

• Data provision is an integral component of the grid
  – Contacting Humanities and other large data users

Collaboration

• Configuring computational components to share resources with Harvard and Monash Universities, as a proof of principle of global campus grids.

• Configuring the storage system to allow safe, secure multi-site storage of data with Monash.

How the ICT Strategy & ECE can help

• Produce a single uniform configuration for ~2000 systems.
• Be willing, from the design outset, to include the capacity to use systems for computation, and hence make it a key criterion in the final system choice.

• Consider using a supported architecture that is popular with computationally active researchers.

• Use underlying system management software that is flexible enough to allow for changes in how resources are used, e.g. Altiris.

• Persuade people that efficient use and sharing of resources is in everyone's best interests.

The Future

• Improve the Resource Broker (RB) system usage algorithm
• Install service-based grid software on a test system to provide transition information
• Package central server modules for public distribution

The Future, 2

• Develop Windows/Linux Condor pools so that all shared systems can be included

• Continue contacting users to expand the user base

• Design and construct user training courses.

Conclusions

• Users are already able to log on to the Resource Broker and schedule work onto the NGS, OSC and OUCS Condor systems

• We are working as quickly as possible to engage more users

• We need these users to then go out and evangelise, bringing in both more users and more resources.

Contact

• Email: [email protected]
• Telephone: 01865 283378