Tool Integration with Data and Computation Grid GWE - “Grid Wizard Enterprise”

Upload: reynard-hunt

Post on 04-Jan-2016


Tool Integration with Data and Computation Grid

GWE - “Grid Wizard Enterprise”

AGENDA

1. Computing Needs Overview.

2. Grid Architecture.

3. GWE - “Grid Wizard Enterprise”

4. Demo + Q&A

COMPUTING NEEDS OVERVIEW

1. Computing requirements are increasing fast. Main reasons:

• More data to process.

• More compute-intensive algorithms available.

2. Approaches to meeting demand:

• Qualitative: Optimized algorithms, faster processors, more memory.

• Quantitative: Grid Computing (parallel, distributed, etc.).

GRID ARCHITECTURE

Clusters & Resource Managers

1. Cluster Operating System: Operating system managing all the hardware components in a cluster. Example: Rocks.

2. Network: Communication backbone among all hardware components in a cluster.

3. Head Node: Dedicated system hosting cluster level services.

4. Compute Nodes: Computers and processors.

5. File System: Networked file system accessible by any node in the cluster (independent of any other file systems the nodes may have access to).

6. Resource Manager: Optional software component. It serves as a front end to queue and manage the jobs of cluster “clients”. Examples: Condor, SGE, PBS, Torque, LSF, etc.

[Diagram: a cluster. Labels: Cluster Operating System, Network, Head Node, Compute Nodes, File System, Resource Manager.]

GRID ARCHITECTURE

The Grid

• Collection of Clusters.

• To use it, the Grid “Client” is responsible for providing the logic that integrates the clusters into its specific applications.

• Most of the time this effort is non-trivial, non-reusable, non-extensible and lacking in robustness, and it is far from giving the end user all the desired functionality.

[Diagram: the Grid. Labels: several clusters, each with a Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager; a Grid and/or Cluster Client.]

GRID ARCHITECTURE

“Meta Scheduler” enabled Grid - 1

• An approach to overcome the daunting task of integrating clusters under a single view.

• Meta schedulers provide a “Resource Manager” translation layer to access clusters.

• They give the Grid “Client” a unified language for accessing the clusters.
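The translation layer described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `JobRequest` type, the `TRANSLATORS` table and the `translate` function are hypothetical names, not part of any meta scheduler named in this deck, and only the bare submit commands (`qsub`, `condor_submit`) are used, with none of the options a real submission would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class JobRequest:
    """Unified job description (hypothetical, not a real meta scheduler type)."""
    job_file: str  # a job script, or a submit description file for Condor


# Hypothetical translation table: resource manager name -> command builder.
# Only the bare submit commands appear here; real submissions carry options.
TRANSLATORS: Dict[str, Callable[[JobRequest], List[str]]] = {
    "pbs": lambda job: ["qsub", job.job_file],
    "torque": lambda job: ["qsub", job.job_file],
    "sge": lambda job: ["qsub", job.job_file],
    "condor": lambda job: ["condor_submit", job.job_file],
}


def translate(resource_manager: str, job: JobRequest) -> List[str]:
    """Turn one unified request into a cluster's native submit command."""
    key = resource_manager.lower()
    if key not in TRANSLATORS:
        raise ValueError(f"no translator for resource manager: {resource_manager}")
    return TRANSLATORS[key](job)
```

The client only ever builds a `JobRequest`; which native command is produced depends on the cluster it is routed to, for example `translate("PBS", JobRequest("align.sh"))`.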

[Diagram: Meta Scheduler Enabled Grid. Labels: Meta Scheduler; several clusters, each with a Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager; a Grid and/or Cluster Client; a Cluster Client.]

GRID ARCHITECTURE

“Meta Scheduler” enabled Grid - 2

• Advantages: Simple and powerful solution.

• Disadvantages: No granular control; depends on the resource manager to execute anything on the cluster.

• Examples: Globus “GridWay”, “Community Scheduler Framework”, “Grid Wizard”.

[Diagram: Meta Scheduler Enabled Grid. Labels: Meta Scheduler; several clusters, each with a Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager; a Grid and/or Cluster Client; a Cluster Client.]

GWE - “GRID WIZARD ENTERPRISE”

Introduction

• Why? Because you do not care about the grid and you just want to run your jobs faster!

• Complete rework of “Grid Wizard” meta scheduler.

• It is more of a “Grid Virtualization Platform”.

• Java based distributed system.

• Collection of autonomous back end applications running on the clusters' head and compute nodes.

• Extremely easy integration, installation and usage.

• Provides single view to clusters and/or grid.

• Provides extensive granular control and features to grid “clients”.

• Auto-configures itself with sensible, auto-discovered values to minimize user input.

• Requirements: SSH enabled clusters, Java 1.5.

GWE - “GRID WIZARD ENTERPRISE”

Internal Components Interfaces

• Two types:

a) To other GWE components.

b) To cluster specific components. These are built on top of a pluggable framework which allows third party providers to easily add support for other systems through customized drivers.
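A pluggable driver framework of the kind described in (b) might look like the following Python sketch. It is not GWE's actual framework (which is Java based): `FileSystemDriver`, `InMemoryDriver` and `DriverManager` are hypothetical names, and choosing a driver by URI scheme is an assumption suggested by URIs such as `srbfile:/...`.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlsplit


class FileSystemDriver(ABC):
    """Hypothetical contract that every file system driver would satisfy."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the contents stored at the given path."""


class InMemoryDriver(FileSystemDriver):
    """Stand-in for a third party driver; serves files from a dictionary."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path: str) -> bytes:
        return self._files[path]


class DriverManager:
    """Keeps one driver per URI scheme and routes requests to it."""

    def __init__(self):
        self._drivers = {}

    def register(self, scheme: str, driver: FileSystemDriver) -> None:
        self._drivers[scheme] = driver

    def read(self, uri: str) -> bytes:
        parts = urlsplit(uri)
        if parts.scheme not in self._drivers:
            raise LookupError(f"no driver registered for scheme: {parts.scheme!r}")
        return self._drivers[parts.scheme].read(parts.path)
```

Adding support for another system then amounts to one `register` call with a new driver object; the rest of the code keeps addressing files by URI.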

[Diagram: “Grid Wizard Enterprise” Cluster Components and their interfaces. Labels: From other GWE Daemons, To other GWE Daemons, From GWE Clients, To GWE Monitors, To Resource Manager, To/From GWE Agents, To File System, To Network.]

GWE - “GRID WIZARD ENTERPRISE”

“Grid Wizard Enterprise” enabled Cluster

• Tight integration with cluster internals and transparent access to them.

• Granular level of execution control: submission, pause, resume, abort.

• Real time monitoring and alerting capabilities.

• Granular reporting of historical, diagnostic and statistical data.

• Transparent environment translation for requests.

• Programmatic control with rich and simple API.

• Workflow capabilities (future).
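The execution controls listed above (submission, pause, resume, abort) can be read as transitions in a small state machine. The Python sketch below is only an illustration under that assumption: the state names, the `TRANSITIONS` table and the `Job` class are hypothetical and do not come from the GWE API.

```python
from enum import Enum


class JobState(Enum):
    NEW = "new"
    RUNNING = "running"
    PAUSED = "paused"
    ABORTED = "aborted"


# Hypothetical transition table: (current state, action) -> next state.
TRANSITIONS = {
    (JobState.NEW, "submit"): JobState.RUNNING,
    (JobState.RUNNING, "pause"): JobState.PAUSED,
    (JobState.PAUSED, "resume"): JobState.RUNNING,
    (JobState.RUNNING, "abort"): JobState.ABORTED,
    (JobState.PAUSED, "abort"): JobState.ABORTED,
}


class Job:
    """A job whose state changes only through the allowed control actions."""

    def __init__(self, command: str):
        self.command = command
        self.state = JobState.NEW

    def apply(self, action: str) -> JobState:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a job that is {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the allowed transitions in one table makes invalid requests, such as resuming a job that was never paused, fail loudly instead of being silently ignored.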

[Diagram: “Grid Wizard Enterprise” enabled cluster. Labels: Cluster Operating System, Network, Head Node, Compute Nodes, File System, Resource Manager, “Grid Wizard Enterprise” Cluster Components.]

GWE - “GRID WIZARD ENTERPRISE”

Distributed System - 1

• “Grid Wizard Enterprise” Cluster Components are designed to interface with other “Grid Wizard Enterprise” Cluster Components.

• This feature allows them to form a distributed network of one or many clusters “on the fly”.

• Grid “clients” can form virtual views of this distributed network “on the fly”, as they wish and as their access permits.

• Provides the features of a single GWE enabled cluster to “N” GWE enabled clusters by chaining them in a ring configuration which is:

• Dynamically updated in real time.

• Transparently self-managed.

• Customizable per user and per request.
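As a rough picture of the ring configuration, the Python sketch below chains cluster nodes into a ring and builds a combined view by walking it from any entry point. The names `RingNode`, `join` and `unified_view`, and the idea of reporting free slots per cluster, are assumptions made for illustration; the slides do not describe how GWE implements the ring.

```python
class RingNode:
    """One GWE enabled cluster in the ring (hypothetical model)."""

    def __init__(self, name: str, free_slots: int):
        self.name = name
        self.free_slots = free_slots
        self.next = self  # a single cluster forms a ring on its own


def join(member: RingNode, newcomer: RingNode) -> None:
    """Insert a new cluster into the ring right after an existing member."""
    newcomer.next = member.next
    member.next = newcomer


def unified_view(entry: RingNode) -> dict:
    """Walk the ring once from any entry point and collect every cluster."""
    view = {}
    node = entry
    while True:
        view[node.name] = node.free_slots
        node = node.next
        if node is entry:
            break
    return view
```

Because the structure is a ring, the same view results whichever cluster the walk starts from, which matches the idea that a client may connect to any member.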

GWE - “GRID WIZARD ENTERPRISE”

Distributed System - 2

[Diagram: “Grid Wizard Enterprise” Distributed System. Labels: a Grid of several clusters, each with a Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager; GWE Components shown for each cluster.]

GWE - “GRID WIZARD ENTERPRISE”

“Grid Wizard Enterprise” enabled Grid

• The Grid “Client” has to establish a link to any of the “Grid Wizard Enterprise” Components sitting on a cluster to get a unified view of the grid and gain access to all the services “Grid Wizard Enterprise” provides over that grid.

[Diagram: “Grid Wizard Enterprise” Enabled Grid. Labels: several clusters, each with a Cluster Operating System, Network, Head Node, Compute Nodes, File System and Resource Manager; GWE Components shown for each cluster; a Grid and/or Cluster User; a Cluster User; a Grid and/or Cluster Monitor.]

GWE - “GRID WIZARD ENTERPRISE”

Architecture Overview

[Diagram: “Grid Wizard Enterprise” Cluster Components. Labels: Embedded Database, Driver Manager Framework, File System Drivers, Network Protocol Drivers, Resource Manager Drivers, Internal Processes, DAO + ORM Layer, External Processes, Persistent Model, RMI, Secured RCP Backbone, Server for Agent.]

GWE - “GRID WIZARD ENTERPRISE”

Architecture Overview

[Diagram: “Grid Wizard Enterprise” Cluster Components. Labels: Embedded Database, Driver Manager Framework, File System Drivers, Network Protocol Drivers, Resource Manager Drivers, Internal Processes, DAO + ORM Layer, External Processes, Persistent Model, RMI, Secured RCP Backbone, Server for Daemon, Client for Daemon, Server for Client, Server for Monitor, Server for Agent.]

GWE - “GRID WIZARD ENTERPRISE”

Tool Integration

• Out of the box, generic command line integration using XML configuration files and VTL (to describe jobs): http://velocity.apache.org/engine/devel/vtl-reference-guide.html

• VTL Job Descriptor Sample:

#set($path = "srbfile:/home/mruiz.ucsd-bcc/2d")
#foreach( $count in ["01", "02", "03", "04", "05", "06", "07"])
/opt/BIRN/lddmm/1.0.1/lddmm -A ${path}/Atlas.img -T ${path}/Patient${count}.img -d 2
#end
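To make the effect of the descriptor concrete, the Python sketch below reproduces the same loop by hand and yields the seven command lines the template describes, one per patient image. It does not use Velocity; `expand_jobs` is a hypothetical helper written only to show the expansion, and how GWE itself turns the rendered text into jobs is not covered by the slides.

```python
PATH = "srbfile:/home/mruiz.ucsd-bcc/2d"
COUNTS = ["01", "02", "03", "04", "05", "06", "07"]


def expand_jobs(path: str, counts: list) -> list:
    """Mirror the #foreach loop: one lddmm command line per patient image."""
    return [
        f"/opt/BIRN/lddmm/1.0.1/lddmm -A {path}/Atlas.img "
        f"-T {path}/Patient{count}.img -d 2"
        for count in counts
    ]


if __name__ == "__main__":
    for line in expand_jobs(PATH, COUNTS):
        print(line)
```

Every command shares the same atlas image and differs only in the patient file, which is why the seven runs are independent and can be spread across compute nodes.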

• Implementation of highly specialized integrations through rich Java APIs:

• Client API: Provides an interface for the user to control the execution of requests.

• Monitor API: Provides an interface to register interest in events and gather them in real time from the distributed system.
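The "register interest in events" idea behind the Monitor API is a form of the observer pattern, sketched below in Python. This is not the GWE Monitor API, which is a Java API whose signatures are not shown in the slides: the `EventMonitor` class, its methods and the event name used in the usage note are all hypothetical.

```python
from collections import defaultdict


class EventMonitor:
    """Hypothetical monitor: listeners subscribe to event types by name."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register(self, event_type: str, callback) -> None:
        """Record interest in one event type."""
        self._listeners[event_type].append(callback)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver an event to its listeners; return how many were notified."""
        listeners = self._listeners.get(event_type, [])
        for callback in listeners:
            callback(payload)
        return len(listeners)
```

A tool would register a callback once, for example `monitor.register("job.finished", handle_result)`, and then receive each matching event as it is published rather than polling for status.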

GWE - “GRID WIZARD ENTERPRISE”

Drivers Integration

• Plug-in bundles contain:

• Java binaries.

• Plug-in XML descriptors.

• Driver categories (so far):

• Network Protocols. Examples: Local, SSH

• File Systems. Examples: Local, SFTP, SRB

• Resource Managers. Examples: Condor, SGE, PBS, Torque, LSF

GWE - “GRID WIZARD ENTERPRISE”

Schedule

• December 2007

• Pre-release. Alpha version of complete infrastructure including automatic deployment, limited drivers (including limited SRB drivers) and limited monitoring.

• Tool Integration Migration: BIRN Portal LDDMM.

• February 2008

• Pre-release. Beta version.

• Tool Integration: Slicer 3 (embedded and generic).

• April 2008

• Production Release. Stable version with drivers for most popular cluster components (including full transparent SRB support), full monitoring, alert capabilities and API bundles.

• Tool Integration: BIRN Portal (embedded and generic).

• June 2008

• Tool Integration: FreeSurfer and FIPS (specialized and full featured).

GWE - “GRID WIZARD ENTERPRISE”

Proof of concept demo

Q&A