Tool Integration with Data and Computation Grid: GWE - “Grid Wizard Enterprise”
AGENDA
1. Computing Needs Overview.
2. Grid Architecture.
3. GWE - “Grid Wizard Enterprise”
4. Demo + Q&A
COMPUTING NEEDS OVERVIEW
1. Computing requirements are increasing fast. Main reasons:
• More data to process.
• More compute-intensive algorithms available.
2. Approaches to meeting the demand:
• Qualitative: Optimized algorithms, faster processors, more memory.
• Quantitative: Grid Computing (parallel, distributed, etc.).
GRID ARCHITECTURE
Clusters & Resource Managers
1. Cluster Operating System: Operating system managing all the hardware components in a cluster. Example: Rocks.
2. Network: Communication backbone among all hardware components in a cluster.
3. Head Node: Dedicated system hosting cluster level services.
4. Compute Nodes: Computers and processors.
5. File System: Networked file system accessible by any node in the cluster (independent of any other file systems the nodes may have access to).
6. Resource Manager: Optional software component. It serves as a front end to queue and manage the jobs of cluster “clients”. Examples: Condor, SGE, PBS, Torque, LSF, etc.
[Figure: cluster components: Cluster Operating System, Network, Head Node, Compute Nodes, File System, Resource Manager.]
GRID ARCHITECTURE
The Grid
• Collection of Clusters.
• To use it, the Grid “Client” is responsible for providing the logic that integrates the clusters into their specific applications.
• Most of the time this effort is non-trivial, non-reusable and non-extensible, lacks robustness, and falls far short of giving the end user all the desired functionality.
[Figure: a grid of clusters, each with its own Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager, accessed by a Grid and/or Cluster Client.]
GRID ARCHITECTURE
“Meta Scheduler” enabled Grid - 1
• An approach to overcome the daunting task of integrating clusters under a single view.
• Meta schedulers provide a “Resource Manager” translation layer to access clusters.
• They give the Grid “Client” a unified language for accessing the clusters.
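The translation layer idea can be sketched in Java. This is only an illustration under stated assumptions, not how any particular meta scheduler is implemented: the class `ResourceManagerTranslator` and its method `submitCommand` are hypothetical names. The only real elements are the submit executables themselves (`qsub` for SGE, PBS and Torque, `condor_submit` for Condor, `bsub` for LSF).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a meta scheduler's "Resource Manager" translation layer.
public class ResourceManagerTranslator {

    // Real submit executables per resource manager; everything else here is illustrative.
    private static final Map<String, String> SUBMIT_TOOL = new LinkedHashMap<String, String>();
    static {
        SUBMIT_TOOL.put("SGE", "qsub");
        SUBMIT_TOOL.put("PBS", "qsub");
        SUBMIT_TOOL.put("Torque", "qsub");
        SUBMIT_TOOL.put("Condor", "condor_submit");
        SUBMIT_TOOL.put("LSF", "bsub");
    }

    // Translates one unified request ("submit this job file") into the cluster's own command.
    // The job file is assumed to already be in the format that resource manager expects.
    public static String submitCommand(String resourceManager, String jobFile) {
        String tool = SUBMIT_TOOL.get(resourceManager);
        if (tool == null) {
            throw new IllegalArgumentException("No translation for: " + resourceManager);
        }
        return tool + " " + jobFile;
    }

    public static void main(String[] args) {
        // The client speaks one "language"; the layer picks the right tool per cluster.
        for (String resourceManager : SUBMIT_TOOL.keySet()) {
            System.out.println(resourceManager + " -> " + submitCommand(resourceManager, "job.file"));
        }
    }
}
```

The client code never mentions a specific submit tool, which is the point of the unified language. It also shows the limitation: everything still goes through whatever the resource manager exposes.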
[Figure: Meta Scheduler Enabled Grid: a Meta Scheduler sits between the Grid and/or Cluster Client and the clusters, each with its own Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager; a Cluster Client can also access a cluster directly.]
GRID ARCHITECTURE
“Meta Scheduler” enabled Grid - 2
• Advantages: Simple and powerful solution.
• Disadvantages: No granular control; depends on the resource manager to execute anything on the cluster.
• Examples: Globus “GridWay”, “Community Scheduler Framework”, “Grid Wizard”
GWE - “GRID WIZARD ENTERPRISE”
Introduction
• Why? Because you do not care about the grid and you just want to run your jobs faster!
• Complete rework of “Grid Wizard” meta scheduler.
• It is more of a “Grid Virtualization Platform”.
• Java-based distributed system.
• Collection of autonomous back end applications running on cluster head and compute nodes.
• Extremely easy integration, installation and usage.
• Provides a single view of clusters and/or the grid.
• Provides lots of granular control and features to grid “clients”.
• Auto-configures itself with sensible, auto-discovered values to minimize user input.
• Requirements: SSH enabled clusters, Java 1.5.
GWE - “GRID WIZARD ENTERPRISE”
Internal Components Interfaces
• Two types:
a) To other GWE components.
b) To cluster-specific components. These are built on top of a pluggable framework which allows third-party providers to easily add support for other systems through customized drivers.
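A pluggable driver framework of this kind can be pictured as a registry that third-party code adds entries to. The Java sketch below is an assumption-laden illustration: `DriverRegistrySketch`, `Driver`, `NamedDriver`, `register` and `lookup` are hypothetical names, and the actual GWE driver interfaces are not shown in this deck.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable driver registry; not the actual GWE framework.
public class DriverRegistrySketch {

    // Hypothetical driver contract that every plug-in would implement.
    public interface Driver {
        String describe();
    }

    // Trivial driver used only to exercise the registry.
    public static class NamedDriver implements Driver {
        private final String description;

        public NamedDriver(String description) {
            this.description = description;
        }

        public String describe() {
            return description;
        }
    }

    private final Map<String, Driver> drivers = new HashMap<String, Driver>();

    // A third-party provider registers its driver under a category and a system name.
    public void register(String category, String system, Driver driver) {
        drivers.put(category + "/" + system, driver);
    }

    // Core code asks for a driver by category and system, without knowing who provided it.
    public Driver lookup(String category, String system) {
        Driver driver = drivers.get(category + "/" + system);
        if (driver == null) {
            throw new IllegalStateException("No driver for " + category + "/" + system);
        }
        return driver;
    }

    public static void main(String[] args) {
        DriverRegistrySketch registry = new DriverRegistrySketch();
        registry.register("resource-manager", "Condor", new NamedDriver("Condor driver"));
        registry.register("file-system", "Local", new NamedDriver("Local file system driver"));
        System.out.println(registry.lookup("resource-manager", "Condor").describe());
    }
}
```

Supporting a new system then means registering one more driver; the code that calls `lookup` stays unchanged.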
[Figure: “Grid Wizard Enterprise” Cluster Components and their interfaces: from other GWE Daemons, to other GWE Daemons, from GWE Clients, to GWE Monitors, to/from GWE Agents, to Resource Manager, to File System, to Network.]
GWE - “GRID WIZARD ENTERPRISE”
“Grid Wizard Enterprise” enabled Cluster
• Tight integration with cluster internals and transparent access to them.
• Granular level of execution control: submission, pause, resume, abort.
• Real time monitoring and alerting capabilities.
• Granular reporting of historical, diagnostic and statistical data.
• Transparent environment translation for requests.
• Programmatic control with rich and simple API.
• Workflow capabilities (future).
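The execution controls listed above (submission, pause, resume, abort) imply a small state machine per request. The following Java sketch is one possible reading, not GWE's API: the class `JobControlSketch`, its `State` values and its method names are assumptions chosen for illustration.

```java
// Hypothetical sketch of per-request execution control; not the actual GWE Client API.
public class JobControlSketch {

    // Simplified states; a real system would likely also track queued, finished and failed.
    public enum State { ACTIVE, PAUSED, ABORTED }

    private State state;

    private JobControlSketch() {
        this.state = State.ACTIVE;
    }

    // Submission creates the request and makes it eligible to run.
    public static JobControlSketch submit() {
        return new JobControlSketch();
    }

    public State state() {
        return state;
    }

    // Only an active request can be paused.
    public void pause() {
        requireState(State.ACTIVE, "pause");
        state = State.PAUSED;
    }

    // Only a paused request can be resumed.
    public void resume() {
        requireState(State.PAUSED, "resume");
        state = State.ACTIVE;
    }

    // Abort is allowed from ACTIVE or PAUSED and is final.
    public void abort() {
        if (state == State.ABORTED) {
            throw new IllegalStateException("Cannot abort a request that is already ABORTED");
        }
        state = State.ABORTED;
    }

    private void requireState(State expected, String action) {
        if (state != expected) {
            throw new IllegalStateException("Cannot " + action + " a request that is " + state);
        }
    }

    public static void main(String[] args) {
        JobControlSketch request = JobControlSketch.submit();
        request.pause();
        request.resume();
        request.abort();
        System.out.println("Final state: " + request.state());
    }
}
```

Rejecting illegal transitions with an exception keeps misuse visible to the caller instead of silently ignoring it.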
[Figure: a cluster (Cluster Operating System, Network, Head Node, Compute Nodes, File System, Resource Manager) together with the “Grid Wizard Enterprise” Cluster Components.]
GWE - “GRID WIZARD ENTERPRISE”
Distributed System - 1
• “Grid Wizard Enterprise” Cluster Components are designed to interface with other “Grid Wizard Enterprise” Cluster Components.
• This feature allows them to form a distributed network of one or many clusters “on the fly”.
• Grid “clients” can form virtual views of this distributed network “on the fly”, as they wish, over the clusters they have access to.
• Provides the features of a single GWE enabled cluster to “N” GWE enabled clusters by chaining them in a ring configuration which is:
  • Dynamically updated in real time.
  • Transparently self-managed.
  • Customizable per user and per request.
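The ring configuration can be illustrated with a toy model in Java. This is a deliberately simplified, single-process sketch: the class `ClusterRingSketch` and its methods `join`, `leave` and `viewFrom` are hypothetical, and a real distributed ring would keep this membership across daemons rather than in one list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-process model of clusters chained in a ring.
public class ClusterRingSketch {

    // Ring order; the last cluster is considered linked back to the first.
    private final List<String> clusters = new ArrayList<String>();

    // A cluster joining the ring becomes visible to every later view.
    public void join(String cluster) {
        clusters.add(cluster);
    }

    // A cluster leaving the ring disappears from every later view.
    public void leave(String cluster) {
        clusters.remove(cluster);
    }

    // Walks the ring once, starting at the cluster the client linked to.
    public List<String> viewFrom(String entryPoint) {
        int start = clusters.indexOf(entryPoint);
        if (start < 0) {
            throw new IllegalArgumentException("Not in the ring: " + entryPoint);
        }
        List<String> view = new ArrayList<String>();
        for (int i = 0; i < clusters.size(); i++) {
            view.add(clusters.get((start + i) % clusters.size()));
        }
        return view;
    }

    public static void main(String[] args) {
        ClusterRingSketch ring = new ClusterRingSketch();
        ring.join("clusterA");
        ring.join("clusterB");
        ring.join("clusterC");
        // Any cluster works as the entry point; the whole ring is reachable from it.
        System.out.println(ring.viewFrom("clusterB"));
    }
}
```

Whichever cluster the client links to, the walk reaches every member, and a join or leave shows up in the next view.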
GWE - “GRID WIZARD ENTERPRISE”
Distributed System - 2
[Figure: “Grid Wizard Enterprise” Distributed System: a grid of clusters (each with Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager), each with its own GWE Components.]
GWE - “GRID WIZARD ENTERPRISE”
“Grid Wizard Enterprise” enabled Grid
• A Grid “Client” has to establish a link to any of the “Grid Wizard Enterprise” Cluster Components sitting on a cluster to get a unified view of the grid and gain access to all the services “Grid Wizard Enterprise” provides over that grid.
[Figure: “Grid Wizard Enterprise” Enabled Grid: clusters, each with its own GWE Components, accessed by a Grid and/or Cluster User, a Cluster User, and a Grid and/or Cluster Monitor.]
GWE - “GRID WIZARD ENTERPRISE”
Architecture Overview
[Figure: “Grid Wizard Enterprise” Cluster Components, showing:]
• Embedded Database
• Persistent Model
• DAO + ORM Layer
• Internal Processes
• External Processes
• Driver Manager Framework: File System Drivers, Network Protocol Drivers, Resource Manager Drivers
• RMI
• Secured RPC Backbone
• Server for Daemon, Client for Daemon, Server for Client, Server for Monitor, Server for Agent
GWE - “GRID WIZARD ENTERPRISE”
Tool Integration
• Out of the box, generic, command line integration using XML configuration files and VTL (to describe jobs): http://velocity.apache.org/engine/devel/vtl-reference-guide.html
• VTL Job Descriptor Sample:
#set($path = "srbfile:/home/mruiz.ucsd-bcc/2d")
#foreach( $count in ["01", "02", "03", "04", "05", "06", "07"])
/opt/BIRN/lddmm/1.0.1/lddmm -A ${path}/Atlas.img -T ${path}/Patient${count}.img -d 2
#end
• Implementation of highly specialized integrations through rich Java APIs:
  • Client API: Provides an interface for the user to control request execution.
  • Monitor API: Provides an interface to register interest in events and gather them in real time from the distributed system.
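To make the VTL job descriptor sample above concrete, the plain Java loop below mirrors what the template's `#set` and `#foreach` directives describe: one `lddmm` command line per patient image. It does not use the Velocity engine or any GWE API; the class name `LddmmJobExpansion` and the method `expand` are hypothetical, and treating each rendered line as one job is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java mirror of the VTL sample; illustrative only, no Velocity engine involved.
public class LddmmJobExpansion {

    // Builds the command lines the #foreach loop describes, one per patient image.
    public static List<String> expand() {
        // Same value the template assigns with #set($path = ...).
        String path = "srbfile:/home/mruiz.ucsd-bcc/2d";
        // Same list the template iterates over with #foreach.
        String[] counts = {"01", "02", "03", "04", "05", "06", "07"};

        List<String> commands = new ArrayList<String>();
        for (String count : counts) {
            commands.add("/opt/BIRN/lddmm/1.0.1/lddmm -A " + path + "/Atlas.img -T "
                    + path + "/Patient" + count + ".img -d 2");
        }
        return commands;
    }

    public static void main(String[] args) {
        for (String command : expand()) {
            System.out.println(command);
        }
    }
}
```

Seven patient images give seven independent command lines, which is the kind of workload that can be spread across compute nodes.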
GWE - “GRID WIZARD ENTERPRISE”
Drivers Integration
• Plug-in bundles contain:
  • Java binaries.
  • Plug-in XML descriptors.
• Driver categories (so far):
  • Network Protocols. Examples: Local, SSH
  • File Systems. Examples: Local, SFTP, SRB
  • Resource Managers. Examples: Condor, SGE, PBS, Torque, LSF
GWE - “GRID WIZARD ENTERPRISE”
Schedule
• December 2007
  • Pre-release. Alpha version of the complete infrastructure, including automatic deployment, limited drivers (including limited SRB drivers) and limited monitoring.
  • Tool Integration Migration: BIRN Portal LDDMM.
• February 2008
  • Pre-release. Beta version.
  • Tool Integration: Slicer 3 (embedded and generic).
• April 2008
  • Production Release. Stable version with drivers for the most popular cluster components (including fully transparent SRB support), full monitoring, alert capabilities and API bundles.
  • Tool Integration: BIRN Portal (embedded and generic).
• June 2008
  • Tool Integration: FreeSurfer and FIPS (specialized and full featured).