
NORDUnet conference, Grid Monitoring : Paryavekshanam, 9th April 2008

PARYAVEKSHANAM STATUS MONITORING TOOL

for INDIAN National Grid: GARUDA

Karuna Karunap@cdacb.ernet.in

Co-authors: Deepika H.V., Mangala N., Prahlada Rao BB, MohanRam N.

System Software Development Group, Center for Development of Advanced Computing (C-DAC), Bangalore, INDIA

9th – 11th April 2008

24th NORDUnet conference GARUDA Grid Monitoring : PARYAVEKSHANAM

Presentation Plan

• GARUDA Overview
• GARUDA Architecture
• Monitoring Requirements
• Paryavekshanam Objectives
• Paryavekshanam Architecture
• Paryavekshanam Features
• Alert and Notification system
• Conclusion


Indian National Grid: GARUDA

• GARUDA is initiated by C-DAC, and is funded by Dept. of Information Technology, Govt. of India.

• GARUDA provides an amalgam of advanced capabilities to enable increasingly interdisciplinary scientific environments required to solve complex problems.

• GARUDA connects 45 national research and academic institutions, across 17 cities/locations in India.

• GARUDA is used by application communities such as Weather / Climate Modeling, Disaster Management, and Bio-informatics.


GARUDA Grid : Key Features

• Geographically distributed resources across 17 cities and 45 research and academic institutions

• Resources are dynamic and heterogeneous in nature (Linux, Solaris, AIX)

• Resources are under various administrative domains

• Network backbone of 2.43 GB, with 10/100 Mbps point-to-point bandwidth links

• GARUDA middleware - Globus 2.x

• Multi-institutional Virtual Organization


GARUDA Grid Architecture

[Figure: GARUDA grid architecture diagram. Labels: Submit node, gridfs and the GARUDA HeadNode (Bangalore); sites IGIB (Linux), Chennai (Linux), C-DAC Bangalore (AIX), Pune (Linux), RRI-Bangalore (Linux) and C-DAC Hyd (Linux), drawn with Cluster Head Nodes and Compute Nodes.]


GARUDA Components

Management & Monitoring
• Paryavekshanam

Resources
• Compute, Data Storage
• Scientific Instruments
• Software

Resource Mgmt & Scheduling
• Moab from Cluster Resources
• Load Leveler, Torque
• Globus 2.x

Application (PoC)
• Disaster Management
• Bioinformatics
• Climate modeling

Access Methods
• Access Portal
• Problem Solving Environments

Data Management
• Storage Resource Broker

Development Environment
• DIViA for Grid
• GridIDE


GARUDA Network Fabric Features

• Ethernet-based, high-bandwidth Layer 2/3 MPLS VPN
• Scalable over the entire geographic area
• High levels of reliability
• Fault tolerance and redundancy
• High security
• Effective Network Management


GARUDA Resources

• C-DAC Centers are contributing computing resources at: Bangalore, Pune, Chennai, and Hyderabad
• HPC systems from partner sites
• Total processors > 600
• Aggregated compute power = 3.5 TFlops
• Satellite terminals from SAC Ahmedabad
• Grid Labs at Bangalore, Pune, Hyderabad


GARUDA Partners

Institute of Plasma Research, Ahmedabad
Physical Research Laboratory, Ahmedabad
Space Applications Centre, Ahmedabad
Harish Chandra Research Institute, Allahabad
Motilal Nehru National Institute of Technology, Allahabad
Raman Research Institute, Bangalore
National Center for Biological Sciences
Indian Institute of Astrophysics, Bangalore
Indian Institute of Science, Bangalore
Institute of Microbial Technology, Chandigarh
Punjab Engineering College, Chandigarh
Madras Institute of Technology, Chennai
Indian Institute of Technology, Chennai
Institute of Mathematical Sciences, Chennai
ERNET, Delhi
Indian Institute of Technology, Delhi
Jawaharlal Nehru University, Delhi
Institute for Genomics and Integrative Biology, Delhi
Indian Institute of Technology, Guwahati
Guwahati University, Guwahati


GARUDA Partners (contd.)

University of Hyderabad, Hyderabad
Centre for DNA Fingerprinting and Diagnostics, Hyderabad
Jawaharlal Nehru Technological University, Hyderabad
Indian Institute of Technology, Kanpur
Indian Institute of Technology, Kharagpur
Saha Institute of Nuclear Physics, Kolkata
Central Drug Research Institute, Lucknow
Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow
Bhabha Atomic Research Centre, Mumbai
Indian Institute of Technology, Mumbai
Tata Institute of Fundamental Research, Mumbai
IUCAA, Pune
National Centre for Radio Astrophysics, Pune
National Chemical Laboratory, Pune
Pune University, Pune
Indian Institute of Technology, Roorkee
Regional Cancer Centre, Thiruvananthapuram
Vikram Sarabhai Space Centre, Thiruvananthapuram
Institute of Technology, Banaras Hindu University, Varanasi


GARUDA Grid Monitoring: Purpose

• Detect, record, and report faults and service degradations

• Ensure GARUDA operates optimally

• Check the status, availability and usage of grid resources

• Provide a monitoring data repository that developers and admins can use for troubleshooting, scheduling, performance tuning and analysis


Monitoring Requirements: GARUDA

• Needed a simple and easy-to-use tool

• Able to handle different users' perspectives

• Information should be readily available

• Should have more graphical views

• Should produce relevant, accurate and timely data

• Should diagnose the problems of the GARUDA environment


Paryavekshanam: Monitoring Tool

• GARUDA is monitored by PARYAVEKSHANAM

• PARYAVEKSHANAM in Sanskrit means “Supervision”

• PARYAVEKSHANAM is a web-based, user-friendly grid monitoring tool that monitors the GARUDA Grid’s health to enhance its reliability, usability and manageability.

• PARYAVEKSHANAM is scalable and can be deployed on platforms like AIX, Linux and Solaris.

• It assists users in resource allocation/selection through various GARUDA tools like G-IDE.


Components Monitored by Parya..

• Computing nodes

• Network

• Grid middleware

• Submitted jobs

• Software

• Storage and Storage Resource Broker

• Scientific Instruments


Paryavekshanam Architecture

• Client server architecture with pull model having a centralized server

• Resource: everything connected to the grid
• Headnode: the contact node of a cluster
• Four components:
– Information Generator
– Information Receiver
– Information Repository
– Paryavekshanam Visualizer


Paryavekshanam Architecture (contd.)

• Information Generator
– Daemon that resides on cluster Headnodes
– Collects the cluster details and creates the data collection
– Data collection is processed using the MDS schema and populated into Globus MDS

• Information Receiver
– Daemon that resides on the monitoring server
– Requests the Information Generator to produce the data collection and fetches it from Globus MDS

• Information Repository
– The data collection obtained from Globus MDS is processed and stored in the Information Repository
– It resides on the monitoring server
– It has a mirror repository to provide fault tolerance

• Paryavekshanam Visualizer
– User-friendly Graphical User Interface
– It retrieves data from the Information Repository and displays it through well-structured graphs and tables
– Visualizer helps in diagnosing the problem areas
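
The pull model described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the class names, the `collect` and `poll` methods, and the in-memory dictionaries standing in for Globus MDS and the repository are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InformationGenerator:
    """Hypothetical stand-in for the daemon on a cluster head node."""
    cluster: str
    nodes_up: int
    nodes_total: int

    def collect(self) -> dict:
        # Build the "data collection" for this cluster on request.
        return {
            "cluster": self.cluster,
            "nodes_up": self.nodes_up,
            "nodes_total": self.nodes_total,
        }


@dataclass
class InformationRepository:
    """Hypothetical store with a mirror copy for fault tolerance."""
    primary: dict = field(default_factory=dict)
    mirror: dict = field(default_factory=dict)

    def store(self, record: dict) -> None:
        # Write every record to both the primary and the mirror.
        self.primary[record["cluster"]] = dict(record)
        self.mirror[record["cluster"]] = dict(record)


@dataclass
class InformationReceiver:
    """Hypothetical stand-in for the daemon on the monitoring server."""
    generators: list
    repository: InformationRepository

    def poll(self) -> int:
        # Pull model: the server asks each head node for fresh data.
        for generator in self.generators:
            self.repository.store(generator.collect())
        return len(self.generators)


generators = [
    InformationGenerator("bangalore", nodes_up=30, nodes_total=32),
    InformationGenerator("pune", nodes_up=16, nodes_total=16),
]
repository = InformationRepository()
receiver = InformationReceiver(generators, repository)
polled = receiver.poll()
```

The point of the sketch is the direction of control: the central receiver initiates each collection, so head nodes never push data on their own.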


Paryavekshanam Features

• Hierarchical drill down of information
• Bird’s eye view of Grid Health through Radar Graph
• Dashboard providing the top level view
• Status bar for quick and action oriented insights
• Alerts generation through emails
• Easy Interface for New site addition
• Multiple Views: Grid, Nodes, GOC and Network views
• Visualization of data in tabular and graphical format
• ‘Data Gallery’ for analysis of historical data
• Search facility for resources, software stack and jobs
• Separate resolution for GOC monitoring


Dashboard of Paryavekshanam

[Figure: dashboard screenshot. Labelled areas: GARUDA connected cities on the India map, Status Bar, bird’s eye view of Grid Health through the Radar Graph, and Grid Strength.]


Dashboard of Paryavekshanam (contd.)

Radar Graph
• Compares the performance of different entities on axes starting from the same point
• Easy inference of utilization of quantitative parameters
• Uniform utilization of various parameters can be inferred from the radar graphs
• Provides a glimpse of the deviation from the ideal scenario

Grid Strength
• Defines the health of the grid and is mathematically derived from the radar graph parameters
• Shown as a percentage on the dashboard
• Colored bullets represent different values of grid strength

Globus Strength
• Monitors Globus strength based on an empirical formula

Status Bar
• Gives the instantaneous up/down status, which can be drilled down further


Alert & Notification system: AlNotis

• Paryavekshanam captures errors generated in the grid, such as failures of link, cluster, node, grid middleware and jobs, through AlNotis
• Provides more visibility into the health of the system
• Any failure or breakdown of resources needs to be captured and notified
• Necessary for corrective actions
• Whenever any error occurs, generates Error emails
• Sends Warning emails when utilization crosses the threshold level
• Well-defined Escalation procedure
– Errors left unattended for 48 hrs are sent to grid admins
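
The escalation rule above amounts to a simple time check. Below is a minimal sketch, assuming an error record carries a raise time and an optional close time; the function name `needs_escalation` and its parameters are hypothetical and do not come from AlNotis itself.

```python
from datetime import datetime, timedelta

ESCALATION_WINDOW = timedelta(hours=48)  # threshold stated on the slide


def needs_escalation(raised_at, closed_at, now):
    """Return True if an error is still open more than 48 hours after it was raised."""
    if closed_at is not None:
        # Closed tickets are never escalated.
        return False
    return now - raised_at > ESCALATION_WINDOW


raised = datetime(2008, 4, 1, 9, 0)
still_open = needs_escalation(raised, None, datetime(2008, 4, 3, 10, 0))  # 49 hours later
recent = needs_escalation(raised, None, datetime(2008, 4, 2, 9, 0))  # 24 hours later
```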


Alert & Notification system (contd.)

Error Message Description

Description            Error Code    Error Condition
Network link down      eLNK          pkt loss 100%
Cluster down           eCLS          HeadNode status down
Node down              eNOD          Node status down
Globus component       eGLB          Component fail
Jobs not running       eJOB          total jobs > 0, RJ = 0

Warning Message Description

Description            Warning Code  Warning Condition
Utilization of CPU     wCPU          Threshold reached (cpu load >= 1)
Utilization of memory  wMEM          Threshold reached (mem util >= 80%)
Bandwidth utilization  wB/W          Threshold reached (b/w util >= 90%)
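
The error and warning conditions listed on this slide can be expressed as a small rule check. The sketch below is illustrative only: the function name, the sample dictionary and its keys are assumptions, and "RJ" is read here as the number of running jobs.

```python
def evaluate_alerts(sample):
    """Return the error and warning codes triggered by one monitoring sample."""
    codes = []
    # Error conditions
    if sample["pkt_loss_pct"] >= 100:
        codes.append("eLNK")
    if not sample["headnode_up"]:
        codes.append("eCLS")
    if not sample["node_up"]:
        codes.append("eNOD")
    if not sample["globus_ok"]:
        codes.append("eGLB")
    if sample["total_jobs"] > 0 and sample["running_jobs"] == 0:
        codes.append("eJOB")
    # Warning conditions
    if sample["cpu_load"] >= 1:
        codes.append("wCPU")
    if sample["mem_util_pct"] >= 80:
        codes.append("wMEM")
    if sample["bw_util_pct"] >= 90:
        codes.append("wB/W")
    return codes


# Hypothetical sample: jobs are queued but none run, CPU and bandwidth are saturated.
sample = {
    "pkt_loss_pct": 0,
    "headnode_up": True,
    "node_up": True,
    "globus_ok": True,
    "total_jobs": 5,
    "running_jobs": 0,
    "cpu_load": 1.2,
    "mem_util_pct": 65,
    "bw_util_pct": 90,
}
codes = evaluate_alerts(sample)
```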


Alert & Notification system (contd.)

[Figure: AlNotis tabulation showing the error id, the date & time the error was generated, the affected resources and the time taken to close the ticket.]

[Figure: Alert error messages generated during the last 6 months.]


GOC Desk : Parya..

• Grid Operation Center (GOC) help desk built for GARUDA monitoring, with a state-of-the-art wall display

• GOC is responsible for monitoring the Grid infrastructure as a whole.

• GOC operates in four regional areas, reporting centrally to the GOC at Bangalore

• Apart from monitoring through Paryavekshanam, it coordinates its activities through video conferencing


GOC Desk Page

• GOC Desk page is mainly used for daily monitoring

• Provides the overall performance of parameters like BW utilization etc. over 24 hrs

• Each graph is hyperlinked to the details of that parameter for the respective grid center.

• An additional table allows accurate values to be read off the graphs.


Grid Overview Page: Parya..

• It summarizes the performance of the entire grid for users.

• Provides information of all the parameters for all the centers in a tabular format

• It can be drilled down to fetch center resource details as a node-level summary

• It monitors the middleware components and provides a detailed status summary for resolving errors.

• It lists all the software available on the clusters.

• Helps in knowing which components of Globus are up.


Nodes view & Globus component status

GSIFTP service is not available


Software packages installed at headnodes


Network Info Page: Parya..

• Routers and switches are monitored
• Displays the bw avail, bw used, pkt loss, RTT and link status
• The report generation facility helps in maintaining the SLA of RTT, pkt loss and circuit uptime on a monthly basis
• Monitors the operation of the network on a 24x7x365 basis


SRB Server status check

• Status of Storage Resource Broker is checked

• Space availability of storage servers

• Report generation in Word and Excel formats


Data Gallery Page: Parya..

• It archives data for reviewing the past performance of the Grid

• Previous data can be viewed in both tabular and graphical formats

• Generates a report for the selected duration.


Search Page: Parya..

• Resource and software search is provided for users

• Resources can be searched based on OS, memory, CPU speed etc.

• Software can be searched by categories like debuggers, libraries etc.
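
A resource search of this kind boils down to filtering records on a few attributes. The sketch below is only an assumption about how such a filter might look; the record fields, node names and the `search_resources` function are hypothetical and not taken from Paryavekshanam.

```python
# Hypothetical resource records; the real tool reads these from its repository.
resources = [
    {"name": "node-a", "os": "Linux", "memory_gb": 4, "cpu_ghz": 2.4},
    {"name": "node-b", "os": "AIX", "memory_gb": 16, "cpu_ghz": 1.9},
    {"name": "node-c", "os": "Linux", "memory_gb": 8, "cpu_ghz": 3.0},
]


def search_resources(records, os_name=None, min_memory_gb=0, min_cpu_ghz=0.0):
    """Filter resource records by OS, minimum memory and minimum CPU speed."""
    matches = []
    for record in records:
        # OS match is case-insensitive; None means "any OS".
        if os_name is not None and record["os"].lower() != os_name.lower():
            continue
        if record["memory_gb"] < min_memory_gb:
            continue
        if record["cpu_ghz"] < min_cpu_ghz:
            continue
        matches.append(record["name"])
    return matches


linux_big = search_resources(resources, os_name="linux", min_memory_gb=8)
```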


Job search : Parya..

• Paryavekshanam tracks the progress of submitted jobs

• Shows the current status based on job id

• Job reports by user, status, job id, duration and the cluster the jobs run on are available


GARUDA Resource usage

- Resources are extensively used

- More than 100 registered users

- >600 CPUs across 14 sites

- 65 TB of data transferred over the 2.43 GB backbone


Admin Page: Parya..

• Paryavekshanam adds new sites and resources through a simple interface

• Managed by access control

• Modification and deletion of sites supported


Conclusion

• Has been successfully monitoring GARUDA for the last 2 years

• Dashboard has been a very useful feature aggregating lots of information

• AlNotis system accelerates the speed of problem rectification

• Paryavekshanam overall improves the usability of GARUDA


Thank Q


Globus Strength

Each distinct value is indicative of the Globus status. The maximum value is 29, the sum of the individual distinct weights shown below:

The 4 major pillars of Globus

1. Security – 10
2. Job Submission – 8
3. Data Management – 7
4. Information Services – 4
--------------- 29

E.g. : Globus strength = 21

Result : Security, data mgmt and info services are up (10 + 7 + 4 = 21); Job submission is not possible.
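
Because every subset of the four weights above adds up to a different total, a Globus strength value can be decoded back into the set of services that are up. The sketch below illustrates this with the weights from the slide; the function name `decode_globus_strength` and the brute-force subset search are assumptions for the example, not the tool's implementation.

```python
from itertools import combinations

# Weights taken from the slide; they sum to 29.
WEIGHTS = {
    "Security": 10,
    "Job Submission": 8,
    "Data Management": 7,
    "Information Services": 4,
}


def decode_globus_strength(value):
    """Return (up, down) service lists for a Globus strength value."""
    names = list(WEIGHTS)
    # Try every subset of services; subset sums are unique for these weights.
    for size in range(len(names) + 1):
        for up in combinations(names, size):
            if sum(WEIGHTS[name] for name in up) == value:
                down = [name for name in names if name not in up]
                return list(up), down
    raise ValueError(f"{value} is not a valid Globus strength")


up, down = decode_globus_strength(21)
```

With a strength of 21 the only service reported down is Job Submission, matching the example on the slide.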


The value 22 shows that Data Mgmt service is down
