Monitoring for GridNNN project Sergey Belov, LIT JINR 15 September, NEC’2011, Varna, Bulgaria


Page 1: Monitoring for GridNNN project

Sergey Belov, LIT JINR

15 September, NEC’2011, Varna, Bulgaria

Page 2: GridNNN project (I)

Grid support for the national nanotechnology network of Russia
  ◦ To give science and industry effective access to distributed computational, informational, and networking facilities
  ◦ Expecting a breakthrough in nanotechnologies
  ◦ Supported by a special federal program

Main technical points
  ◦ Based on a network of supercomputers (about 15-30)
  ◦ Has two grid operations centers (main and backup)
  ◦ Is a set of grid services with a unified interface
  ◦ Partially based on Globus Toolkit 4

Page 3: GridNNN project (II)

Main aim
  ◦ Integration of small and medium supercomputers into a unified distributed computing environment
Highly heterogeneous grid environment (hardware, software)
Oriented to parallel tasks rather than single batch tasks
Workflow management
  ◦ Jobs consist of tasks
Follows core OGSA principles
GSI-based security model
RESTful grid services

Page 4: GridNNN architecture layers

Based on the report of A. Kryukov et al., Architecture of GridNNN, GRID’2010

Page 5: Core grid services

WebUI server
Resource Broker/metascheduler + Workflow management (RESTful)
Information Service (RESTful / WS MDS)
Monitoring & Accounting
Registration service (RESTful)
GSI services
  ◦ CA, MyProxy, VOMS
GridFTP servers

Based on the report of A. Kryukov et al., Architecture of GridNNN, GRID’2010

Page 6: Monitoring goals

State of sites and services
  ◦ Availability
  ◦ Real operational state
Monitoring of users' jobs and tasks
Keeping a history of various system parameters
Information representation
  ◦ General state of the infrastructure as a whole
  ◦ Running jobs and tasks
  ◦ Separate sites and services (real-time and history)
  ◦ Visualization of job events
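The availability goal above can be illustrated with a small calculation over a stored history of check results. This is only a sketch: the `CheckResult` record layout, the service name, and the definition of availability as the fraction of successful checks are assumptions made for illustration, not details taken from the GridNNN monitoring system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CheckResult:
    """One stored check outcome for a service (hypothetical record layout)."""
    service: str
    checked_at: datetime
    ok: bool


def availability(history: List[CheckResult], service: str) -> Optional[float]:
    """Return the fraction of successful checks, or None if there is no history."""
    outcomes = [result.ok for result in history if result.service == service]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Placeholder history: four hourly checks, one of which failed.
history = [
    CheckResult("example-gridftp", datetime(2011, 9, 1, 10, 0), True),
    CheckResult("example-gridftp", datetime(2011, 9, 1, 11, 0), False),
    CheckResult("example-gridftp", datetime(2011, 9, 1, 12, 0), True),
    CheckResult("example-gridftp", datetime(2011, 9, 1, 13, 0), True),
]
print(availability(history, "example-gridftp"))  # 0.75
```

Returning `None` for a service without history keeps "never checked" distinct from "always failing", which matters when both real-time and historical views are shown.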

Page 7: Monitoring of resources

State of computational resources by site (based on data from the information index(es))
Slots available for tasks
Jobs (total on site), jobs belonging to GridNNN
Structure and properties of clusters
  ◦ Subclusters, nodes, slots, operating system, architecture
  ◦ Application software
  ◦ Supported VOs (with ACLs, Access Control Lists)
Monitoring of jobs running on sites (using information from Pilot servers)
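The cluster properties listed above (subclusters, slots, operating system, architecture, supported VOs) map naturally onto a small nested data structure. The sketch below is one possible shape, with all class names, field names, and sample values invented for illustration; the actual schema published by the GridNNN information indexes is not given in the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubCluster:
    """A homogeneous group of nodes within a site (hypothetical model)."""
    name: str
    operating_system: str
    architecture: str
    total_slots: int
    busy_slots: int

    @property
    def free_slots(self) -> int:
        # Clamp at zero in case the published counters are briefly inconsistent.
        return max(self.total_slots - self.busy_slots, 0)


@dataclass
class Site:
    """A resource center with its subclusters and supported VOs."""
    name: str
    supported_vos: List[str] = field(default_factory=list)
    subclusters: List[SubCluster] = field(default_factory=list)

    def free_slots(self) -> int:
        """Slots available for tasks, summed over all subclusters."""
        return sum(sub.free_slots for sub in self.subclusters)


site = Site(
    name="example-site",
    supported_vos=["example-vo"],
    subclusters=[
        SubCluster("sc-a", "Linux", "x86_64", total_slots=64, busy_slots=40),
        SubCluster("sc-b", "Linux", "x86_64", total_slots=32, busy_slots=32),
    ],
)
print(site.free_slots())  # 24
```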

Page 8: Simple functional tests of services

Goal: checking that services operate
Simple tests for services registered in the Service for Registration of Resources and Services
Connection to the declared port of the machine (plain or secured, depending on the specified protocol)
Information requests to some services
Separate test scenarios for MDS information indexes and the Service for Registration of Resources and Services: information
Web page with the history of functional test results
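The "connection to the declared port" test described above could look roughly like the following. It is a minimal sketch using only the Python standard library; the function names are hypothetical, an ordinary TLS handshake stands in for the secured case (a real GSI check would involve grid CA certificates and proxy credentials), and treating only `https` as a secured protocol is an assumption.

```python
import socket
import ssl
from urllib.parse import urlsplit


def check_port(host: str, port: int, secured: bool, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection (plus a TLS handshake when secured) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if not secured:
                return True
            context = ssl.create_default_context()
            # Assumption: only handshake success is tested, so certificate
            # verification is disabled here. A production check should verify.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers refused connections, timeouts, DNS and TLS errors
        return False


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Pick the plain or secured test from the URL scheme of a registered service."""
    parts = urlsplit(url)
    secured = parts.scheme == "https"
    port = parts.port or (443 if secured else 80)
    return check_port(parts.hostname, port, secured, timeout)
```

A scheduler would call `check_endpoint` for every endpoint in the registration service and store the boolean results, which is enough to build the history page mentioned in the last bullet.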

Page 9: Accounting and job monitoring

Goal: to get information, both real-time and historical, on resource utilization and jobs running on the GridNNN infrastructure (by users, VOs, sites)
Information sources: Pilot servers, GRAMs, and local resource managers
Collecting data on jobs and tasks in the system
  ◦ Timestamps of all job events, actual consumed CPU time
Accounting information reports in different views:
  ◦ By sites, VOs, and single users
Aggregation of actual job execution time from all sites
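The "different views" of accounting data described above amount to grouping the same job records by site, VO, or user. The following sketch shows that aggregation under assumed names: `JobRecord`, its fields, and the sample records are illustrative placeholders, not the GridNNN accounting schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class JobRecord:
    """One finished job as reported by a site (hypothetical record layout)."""
    user: str
    vo: str
    site: str
    cpu_seconds: float


def cpu_hours_by(records: Iterable[JobRecord], key: str) -> Dict[str, float]:
    """Sum consumed CPU time in hours, grouped by 'user', 'vo', or 'site'."""
    if key not in ("user", "vo", "site"):
        raise ValueError(f"unsupported grouping key: {key}")
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cpu_seconds / 3600.0
    return dict(totals)


# Placeholder records collected from two sites.
records = [
    JobRecord("alice", "vo-a", "site-1", 7200),
    JobRecord("alice", "vo-a", "site-2", 3600),
    JobRecord("bob", "vo-b", "site-1", 1800),
]
print(cpu_hours_by(records, "user"))  # {'alice': 3.0, 'bob': 0.5}
print(cpu_hours_by(records, "site"))  # {'site-1': 2.5, 'site-2': 1.0}
```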

Page 10: GridNNN accounting

Gathering statistics on CPU time consumed by users and VOs
  ◦ In plain hours, later with allowance for computational system productivity
Displaying statistics of CPU resource usage
  ◦ Different report kinds: for user, VO manager, site admin, GridNNN project admins
  ◦ Statistics access roles to protect private information of users and VOs
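Two ideas from this slide lend themselves to a short sketch: weighting plain CPU hours by system productivity, and restricting statistics by role. The code below assumes a simple multiplicative per-site factor and four role names derived from the report kinds listed above; both the weighting scheme and the identifiers are illustrative assumptions, since the slides do not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UsageRow:
    """One line of an accounting report (hypothetical layout)."""
    user: str
    vo: str
    site: str
    cpu_hours: float


def normalized_hours(cpu_hours: float, site_factor: float) -> float:
    """Scale plain CPU hours by a per-site productivity factor (assumed scheme)."""
    return cpu_hours * site_factor


def visible_rows(rows: List[UsageRow], role: str, subject: str) -> List[UsageRow]:
    """Return only the rows a given role may see.

    A user sees their own rows, a VO manager the rows of their VO, a site
    admin the rows of their site, and a project admin everything.
    """
    if role == "project_admin":
        return list(rows)
    field_by_role = {"user": "user", "vo_manager": "vo", "site_admin": "site"}
    if role not in field_by_role:
        return []  # unknown roles see nothing
    attribute = field_by_role[role]
    return [row for row in rows if getattr(row, attribute) == subject]


rows = [
    UsageRow("alice", "vo-a", "site-1", 10.0),
    UsageRow("bob", "vo-a", "site-2", 4.0),
    UsageRow("carol", "vo-b", "site-1", 6.0),
]
print([row.user for row in visible_rows(rows, "vo_manager", "vo-a")])  # ['alice', 'bob']
print(normalized_hours(10.0, 1.5))  # 15.0
```

Defaulting to an empty result for unrecognized roles is a deliberate choice here: it fails closed, which suits the stated aim of protecting private information of users and VOs.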

Page 11: Accounting and jobs monitoring: screenshots

[Chart: Job submissions by user. Axis range: 0 to 60000. Series: completed successfully, completed with error. Users shown: Andrey Demichev, Eygene Ryabinkin, Andrey Kiryanov, Alexey Tarasov, Taras Shapovalov, Lev Shamardin, Mikalai Kutouski, Ilya Gorbunov, Alexandr Pivushkov, Eic Dushanov, Alexey Shmelkin, Nikolay Prikhodko, Gregory Shpiz, Natalia Chirskaya, Sergey Malkovsky, others]

Total jobs: 106990. Users with certificates: 44, active: 33.

Page 12: Monitoring and accounting information flows

[Diagram components: Monitoring and accounting data storage; Information collector; Pilot job management services; Monitoring website; Monitoring data provisioning (Web Services); Accounting information publisher; Functional tests of the services; Infosys central information index]

Page 13: GridNNN centers on the map

More than 15 resource centers at the moment in different regions of Russia
  ◦ RRC KI, «Chebyshev» (MSU), IPCP RAS, CC FEB RAS, ICMM RAS, JINR, SINP MSU, PNPI, KNC RAS, SPbSU, SPII RAS, and others

http://mon.ngrid.ru

Page 14: Infrastructure operation visualization with Google Earth
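Google Earth reads KML files, so one plausible way to feed it site status is to generate a KML document with one placemark per resource center. The slides do not say how the GridNNN visualization was built; the sketch below is an assumption-based illustration, and the function names, site name, coordinates, and status text are all placeholders.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape

# (name, longitude, latitude, status text); layout chosen for this sketch only.
SiteInfo = Tuple[str, float, float, str]


def site_placemark(name: str, lon: float, lat: float, status: str) -> str:
    """Render one site as a KML Placemark. KML orders coordinates as lon,lat,alt."""
    return (
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <description>{escape(status)}</description>\n"
        f"    <Point><coordinates>{lon},{lat},0</coordinates></Point>\n"
        "  </Placemark>\n"
    )


def sites_to_kml(sites: Iterable[SiteInfo]) -> str:
    """Wrap all placemarks in a minimal KML 2.2 document."""
    body = "".join(site_placemark(*site) for site in sites)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n" + body + "</Document>\n</kml>\n"
    )


# Placeholder site with made-up coordinates and status.
kml_text = sites_to_kml([("example-site", 30.0, 60.0, "all checks passed")])
print(kml_text)
```

Writing `kml_text` to a `.kml` file, or serving it from the monitoring website, would let Google Earth load and periodically refresh the placemarks.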

Page 15: Conclusion

The GridNNN project was successfully finished this summer
The resulting software and infrastructure are to be used for developing the Russian Grid Network project
Fully operational monitoring and accounting tools are in production
Further user interface improvements are planned within the Russian Grid Network project