Tier3 Monitoring TF Artem Petrosyan (JINR), Danila Oleynik (JINR), Julia Andreeva (CERN)


Post on 06-Jan-2016



Page 1: Tier3 Monitoring TF

Tier3 Monitoring TF

Artem Petrosyan (JINR), Danila Oleynik (JINR), Julia Andreeva (CERN)

Page 2: Tier3 Monitoring TF

T3MON proposal (1/3)

Finalized at the beginning of 2011. Registered as ATLAS note: http://cdsweb.cern.ch/record/1336119

o «T3MON-SITE» - software suite for local site monitoring, based on the Ganglia monitoring system

• Modules (plug-ins) for local resource management systems (LRMS) and storage systems

• Additional plug-in development for PROOF and xRootD

• Aggregation and transmission of summary data to central monitoring

o «T3MON-GLOBAL» - information system for aggregating and visualizing data from distributed Tier3 sites at the global VO level

• Should be integrated with the current ATLAS monitoring system (Dashboard)

Work is divided into two streams: validation of standard components and development.
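To make the idea of an LRMS plug-in for Ganglia concrete, here is a minimal sketch of a gmond Python metric module that reports PBS job counts. It is not the T3MON-SITE code: the metric names (`pbs_jobs_running`, `pbs_jobs_queued`), the `pbs` group and the assumed six-column job rows in `qstat` output are illustrative assumptions. Only the `metric_init` / `metric_cleanup` entry points and the descriptor keys follow the gmond Python module interface.

```python
import subprocess

# Hypothetical metric names, mapped to the qstat state letter they count.
_STATE_FOR_METRIC = {"pbs_jobs_running": "R", "pbs_jobs_queued": "Q"}


def count_jobs_by_state(qstat_output):
    """Count jobs per state letter in plain `qstat` text.

    Assumes job rows have six whitespace-separated columns with the
    state letter in the fifth one; header and separator rows are skipped.
    """
    counts = {}
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) == 6 and len(fields[4]) == 1 and fields[4].isalpha():
            counts[fields[4]] = counts.get(fields[4], 0) + 1
    return counts


def _read_qstat():
    """Run qstat; return an empty string if it is missing or fails."""
    try:
        return subprocess.check_output(["qstat"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return ""


def jobs_handler(name):
    """gmond callback: current value of the metric called `name`."""
    state = _STATE_FOR_METRIC[name]
    return count_jobs_by_state(_read_qstat()).get(state, 0)


def metric_init(params):
    """gmond entry point: return one descriptor per metric."""
    return [
        {
            "name": metric,
            "call_back": jobs_handler,
            "time_max": 90,
            "value_type": "uint",
            "units": "jobs",
            "slope": "both",
            "format": "%u",
            "description": "PBS jobs in state " + state,
            "groups": "pbs",
        }
        for metric, state in sorted(_STATE_FOR_METRIC.items())
    ]


def metric_cleanup():
    """gmond entry point: nothing to release in this sketch."""
    pass
```

Keeping the parsing in `count_jobs_by_state` separate from the `qstat` call means the same function could be checked against captured output from each batch system on a test cluster.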

05.04.11 ATLAS Software & Computing Workshop


Page 3: Tier3 Monitoring TF

T3MON proposal (2/3)

To validate T3MON-SITE for different T3 configurations, the establishment of a work group at JINR was proposed.

Tasks:

o Deployment of a test cluster

o Installation of batch systems and mass storage systems reported as being used at Tier3 sites during the T3 survey (various configurations)

o Installation and configuration of data file monitoring and inventory

o Installation and configuration of Ganglia for a specific cluster setup

• Installation and validation of the additional Ganglia plug-ins for monitoring metrics collection

o Preparation of installation and configuration instructions

o Participation in the xRootD federation project within ATLAS


Page 4: Tier3 Monitoring TF

T3MON proposal (3/3)

• Milestones

o «T3MON-SITE»

• Beginning of June 2011: first prototype

• Middle of July 2011 - beginning of September 2011: "Alpha" version

• September 2011: stable version

o «T3MON-GLOBAL»

• Beginning of June 2011: complete the collection of system requirements

• August - September 2011: development and debugging of the publishing agents

• October - middle of November 2011: collecting data in the central repository. Integration with the Dashboard monitoring systems

• Middle of December 2011: a pilot version, collecting additional information for implementation of the final version

• February 2012 - March 2012: a final version.
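As an illustration of what a publishing agent might do before sending data to the central repository, the sketch below reduces per-node metrics to one site-level summary and serializes it as JSON. The slides do not describe the actual message format or transport, so the field names (`jobs_running`, `jobs_queued`, `load_one`), the site name and the use of JSON are all assumptions.

```python
import json


def summarize_site(site_name, node_metrics):
    """Reduce per-node metrics to one site-level summary.

    `node_metrics` maps a node name to a dict of metric values.
    All field names here are hypothetical, chosen for illustration.
    """
    values = list(node_metrics.values())
    loads = [m["load_one"] for m in values if "load_one" in m]
    return {
        "site": site_name,
        "nodes": len(values),
        "jobs_running": sum(m.get("jobs_running", 0) for m in values),
        "jobs_queued": sum(m.get("jobs_queued", 0) for m in values),
        # Mean load over the nodes that reported it; None if none did.
        "mean_load_one": sum(loads) / len(loads) if loads else None,
    }


def to_message(summary):
    """Serialize a summary as a JSON string with stable key order."""
    return json.dumps(summary, sort_keys=True)
```

For example, `to_message(summarize_site("EXAMPLE-T3", {"wn01": {"jobs_running": 4, "load_one": 2.0}}))` yields a single JSON document per site, which keeps the volume sent to the global level independent of the number of worker nodes.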


Page 5: Tier3 Monitoring TF

Team at JINR

Involved: 4 specialists, 3 young employees, 2 software experts, several volunteers

• Software

o Artem Petrosyan

o Danila Oleynik

o Sergey Belov

o Vladimir Vasilyev

• Installation and validation

o Nikolay Kutovskiy

• Ignat Lensky, Ivan Kadochnikov, Anatoly Yakshov

• Software experts

o Lucia Valova (PROOF cluster administrator)

o Pavel Dmitrienko (local monitoring system administrator/developer)


Page 6: Tier3 Monitoring TF

Testbed at JINR

• Organized in February 2011

• Multicore nodes

• Virtualization

o 4 virtual clusters at the moment

• PBS

• xRootD

• PROOF

• OGE/SGE

o 3 clusters (PBS, xRootD, OGE/SGE) monitored by Ganglia


Page 7: Tier3 Monitoring TF

Status

Columns: Software | Test cluster | Ganglia | Development | Documentation

xRootD: + +

PROOF: +

PBS (Torque): +

OGE/SGE: +

Condor

LSF: +

Lustre

Legend: + - done, + - in progress


Page 8: Tier3 Monitoring TF

Plans

• Setting up development infrastructure at CERN:

o Development nodes

o Repository (SVN)

o Common development framework with other applications (Dashboard, DQ2)

o Twiki documentation

• xRootD & PROOF plug-ins for Nagios (how to extend monitoring systems for sites which already use Nagios)

• Installation & validation: Condor, Lustre
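For sites that already run Nagios, a plug-in is a program that prints one status line and returns exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below follows that convention for a simple TCP reachability check of an xRootD server. It is an assumption about what such a plug-in could look like, not the planned T3MON plug-in: the thresholds, the `XROOTD` service label and the choice of a bare TCP connect (which does not speak the xRootD protocol) are illustrative.

```python
import socket
import time

# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
_LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(seconds, warn, crit):
    """Map a connect time (or None on failure) to a Nagios exit code."""
    if seconds is None or seconds >= crit:
        return CRITICAL
    if seconds >= warn:
        return WARNING
    return OK


def format_status(service, code, seconds):
    """Build the status line; performance data follows the '|'."""
    if seconds is None:
        return "%s %s - connection failed" % (service, _LABELS[code])
    return "%s %s - connect time %.3fs | time=%.3fs" % (
        service, _LABELS[code], seconds, seconds)


def connect_time(host, port, timeout):
    """Return seconds needed to open a TCP connection, or None on failure."""
    start = time.time()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None
    sock.close()
    return time.time() - start


def main(argv):
    """Entry point: argv is [program, host, port]; returns the exit code."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("XROOTD UNKNOWN - usage: check_xrootd_tcp HOST PORT")
        return UNKNOWN
    seconds = connect_time(argv[1], int(argv[2]), timeout=5.0)
    # Hypothetical thresholds: warn at 1 s, critical at 3 s.
    code = classify(seconds, warn=1.0, crit=3.0)
    print(format_status("XROOTD", code, seconds))
    return code
```

To use it as a script, call `sys.exit(main(sys.argv))` at the bottom of the file so that Nagios receives the exit code.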


Page 9: Tier3 Monitoring TF

Open issues

• Monitoring hooks in Athena

• Collecting more information about the list of metrics to be presented at the global level

• Information about delivery frequency to the global level


Page 10: Tier3 Monitoring TF

Summary

• Proposal is prepared and issued

• Work group is organized

• Test infrastructure is set up at JINR

• Documentation preparation is in progress

• Development of plug-ins is in progress
