Grid Computing & Sun Grid Engine: “Basic Concepts”
Gilbert Thomas, [email protected]
Agenda
● Introduction
● Grid Computing
● Sun Grid Engine (SGE)
The Productivity Challenge
Problem:
Scientists and engineers are not being used efficiently.
Solution:
A Grid makes it easy for engineers to submit jobs. They run more tests, and the product design cycle improves.
Benefits:
Increased productivity, which leads to shorter time to market, higher quality, and lower costs.
Grid Computing: A New Computing Utility Model
Problem-solving through resource pooling in virtual systems:
• Virtualization of resources into a dynamic, single compute resource
• Transparent scalability of CPU cycles and storage
• Access that is dependable, consistent, pervasive, and inexpensive
Stages of Sun Grid Computing
Cluster Grid: Departmental Computing
• Simplest Grid deployment
• Maximum utilization of departmental resources
• Resources allocated based on priorities
Campus Grid: Enterprise Computing
• Resources shared within the enterprise
• Policies ensure computing on demand
• Gives multiple groups seamless access to enterprise resources
Global Grid: Internet Computing
• Resources shared over the Internet
• Global view of distributed datasets
• Growth path for enterprise Campus Grids
Grid Computing Model: Cluster Grids
Usage
• Simplest Grid deployment
• Single team: project, department
• Single site firewall
Benefit
• Optimal alignment of resources, tasks, and budgets
Industry Examples
• Automotive: more simulations for safer cars
• Entertainment: faster image-frame rendering
• Life Sciences: pattern matching against huge datasets
• EDA: increased design iterations create more powerful devices
Grid Computing Model: Campus Grids
Usage
• Multiple teams in an organization share one or more Cluster Grids
• Single site to enterprise-wide
Benefit
• Maximum ROI and utility
Industry Examples
• Manufacturing: collaborative engineering projects
• Oil and Gas: mining distributed databases
• Finance: more Monte Carlo simulations for uncovering new business
Grid Computing Model: Global Grids
Usage
• Linked Cluster and Campus Grid models across many organizations
• Typically used for research
Benefit
• Creates large virtual system
• Facilitates collaboration between organizations
Industry Examples
• Medicine: provides expert teams access to medical instruments and distributed computing resources
• Academia: facilitates collaboration between geographically dispersed groups
• Research: enables compute-intensive projects beyond the firewall
Grid Computing Adoption Trends
Cluster Grids
• Single team
• Single organization
Campus Grids
• Multiple teams
• Single organization
Global Grids
• Multiple teams
• Multiple organizations
Key Software Technologies for the Grid
Cluster Grid: Sun Grid Engine
Campus Grid: Sun Grid Engine, Enterprise Edition
Global Grid: Globus, Avaki
How It Works: Grid Hardware and Software Components
• Resource management services above OS layer to integrate systems
• Hardware/OS systems are unchanged
• Minimal management software/tool costs
• Connecting people, departments, organizations, communities
Cluster Grid Solution: Sun™ Grid Engine
• Maximize resources for single projects, teams, departments
• Prioritize jobs
• Manage jobs from start to finish
• Free download for Solaris and Linux Operating Environments
Sun Grid Engine Free Downloads: First Year
Fast becoming the most-used Distributed Resource Manager (DRM) tool
• 3016 unique sites
• 118,000 CPUs worldwide run Sun Grid Engine
• 1 new CPU every 5 minutes
• Over 90 countries
• 60% never used Grid software before
• 92% rated Sun Grid Engine as Good, Very Good, or Excellent
Existing Problem In Clusters
[Diagram: laptops, workstations, and computers connect to Server1 … ServerN; labels mark a bottleneck and servers that are idle or overloaded.]
Solution: Sun Grid Engine
• Load Balancing
– Ensure no single compute resource is overloaded
– SGE automatically finds the resource with the least load for every new job
– If no free resource is found, the job is queued until a free resource is available
– Implication: jobs run and finish faster!
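The load-balancing behaviour described on this slide can be illustrated with a small toy model. This is only a sketch of the idea (pick the least-loaded free resource, otherwise queue the job), not SGE's actual scheduler; the `Host` and `ToyScheduler` names, the slot counts, and the "fraction of slots in use" load metric are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical compute resource; not an SGE data structure."""
    name: str
    slots: int                                   # jobs the host may run at once
    running: list = field(default_factory=list)  # jobs currently on this host

    @property
    def load(self) -> float:
        # Assumed toy metric: fraction of slots in use.
        return len(self.running) / self.slots

    @property
    def has_free_slot(self) -> bool:
        return len(self.running) < self.slots


class ToyScheduler:
    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.pending = deque()                   # jobs waiting for a free resource

    def submit(self, job):
        """Place the job on the least-loaded free host, or queue it."""
        free = [h for h in self.hosts if h.has_free_slot]
        if not free:
            self.pending.append(job)
            return None
        target = min(free, key=lambda h: h.load)
        target.running.append(job)
        return target.name

    def finish(self, host_name, job):
        """Release a slot and hand it to the oldest pending job, if any."""
        host = next(h for h in self.hosts if h.name == host_name)
        host.running.remove(job)
        if self.pending:
            return self.submit(self.pending.popleft())
        return None


if __name__ == "__main__":
    sched = ToyScheduler([Host("server1", slots=1), Host("server2", slots=2)])
    for job in ["job-a", "job-b", "job-c", "job-d"]:
        print(job, "->", sched.submit(job) or "queued")
```

With more jobs than free slots, the last submission stays in `pending` until `finish` frees a slot, which mirrors the "queued until a free resource is available" bullet.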
Job Types
• Job types, a mixture of:
– Batch
– Interactive (qsh, qrsh, qlogin)
– Parallel (MPI, PVM, ...)
– Checkpointing
– Array jobs (unlimited size, massive scalability)
• Dynamically changeable while pending (prior to execution)
Monitoring
● Qmon
● Mail notification
● Qstat
Qmon: SGE’s GUI
Configuring Queues
Checking Queue Status
Submitting Jobs
Checking Job Status
Qstat
• Display all info about queues
> qstat -f
• State column:
- r = running
- s = suspended
- q = queued
- w = waiting
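As a small illustration, the state letters listed above can be turned into readable text when post-processing `qstat` output. This is a minimal sketch: the `describe_state` helper is hypothetical, it covers only the four letters named on this slide, and it assumes a state field may combine several letters (for example `qw`).

```python
# State letters as listed on the slide; any other letter is reported as unknown.
STATE_LETTERS = {
    "r": "running",
    "s": "suspended",
    "q": "queued",
    "w": "waiting",
}


def describe_state(code: str) -> list:
    """Translate a state field such as 'qw' into a list of descriptions."""
    return [STATE_LETTERS.get(ch, f"unknown ({ch})") for ch in code.strip()]
```

For example, `describe_state("qw")` yields the descriptions for "queued" and "waiting", in that order.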
Qmod
• Control the status of the queues in your cluster
- qmod -d Disable a queue
- qmod -e Enable a queue
- qmod -s Suspend a queue
- qmod -us Resume a suspended queue
- qmod -c Clear the error states of a queue
- qstat -alarm Show the alarm state of a queue
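For scripted administration, the `qmod` options above could be wrapped in a small helper. The sketch below only builds the argument list and does not run anything; the `qmod_command` function, the action names, and the queue name `batch.q` are hypothetical, and the flags are taken from this slide rather than from any particular SGE release.

```python
# Flags as listed on the slide; the action names are made up for this example.
QMOD_FLAGS = {
    "disable": "-d",
    "enable": "-e",
    "suspend": "-s",
    "resume": "-us",
    "clear_error": "-c",
}


def qmod_command(action: str, queue: str) -> list:
    """Return the argv list for a qmod call, e.g. ['qmod', '-d', 'batch.q']."""
    try:
        flag = QMOD_FLAGS[action]
    except KeyError:
        raise ValueError(f"unsupported action: {action!r}") from None
    return ["qmod", flag, queue]


if __name__ == "__main__":
    # Hypothetical queue name; prints the command instead of executing it.
    print(" ".join(qmod_command("disable", "batch.q")))
```

On a machine where Sun Grid Engine is installed, the returned list could be passed to `subprocess.run(cmd, check=True)`.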
Complexes
• Set host-specific attributes:
- Number of slots
- Maximum amount of memory that can be used
- Maximum number of disk blocks that can be used
- Maximum load for that host
• Set requestable values for a queue:
- Software licences
- Available memory
- Available disk space
- Specific data sets
Parallel Environments
● Parallel Virtual Machine (PVM)
● Message Passing Interface (MPI)
● A parallel environment allows execution of shared memory and distributed memory applications.
Parallel Environments
• Advantages of tight integration with SGE:
- Correct accounting
- Full job control, i.e., suspending tasks
- Resource limits
- Cleaning up/killing all tasks
References
• Sun Grid Engine Home: http://www.sun.com/gridware/
• Sun Grid Engine Open Source: http://www.gridengine.sunsource.net
• Sun Grid Engine Web-Based Training: http://www.sun.com/gridware/support.html
Gilbert Thomas
Associate [email protected]
Thank you!
For further enquiries, email [email protected]