IT Monitoring WG Monitoring Use Cases
16 January 2012
Grid Technology, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it


Page 1: IT Monitoring WG Monitoring Use Cases


IT Monitoring WG

Monitoring Use Cases

16 January 2012

Page 2: IT Monitoring WG Monitoring Use Cases

Introduction

• Goal
  – Analyze the different types of use cases from all IT groups
  – Identify a few representative common use cases
• Contributions requested under 3 categories
  – Fast & Furious (FF)
    • alarms, end user views
  – Digging Deep (DD)
    • infrequent analysis with lots of data, history analysis
  – Correlate & Combine (CC)
    • combining data from different domains
• Contributions received from 7 groups
  – Input from DB and DSS missing

Page 3: IT Monitoring WG Monitoring Use Cases

Fast & Furious

Group | Action            | Element
CF    | list, join, alarm | nodes, exceptions
CIS   | list, alarm       | document queue, web servers
CS    | list, alarm       | router, switch, network
DI    | list              | connections, urls, external locations
ES    | list, find        | job status, transfer status, site status
GT    | list, find        | services, sites
PES   | list, join, alarm | users, batch jobs, hardware

https://tomtools.cern.ch/confluence/display/MWG/Use+Cases

Page 4: IT Monitoring WG Monitoring Use Cases

Fast & Furious

• Groups:
  – CF, DI, PES, ??
• Role:
  – Sys Admin, Service Manager
• Tasks:
  – Get metric values for hardware and selected services
  – Filter metrics by type (role, cluster, etc.)
  – Aggregate exceptions and errors
  – Raise alarms according to appropriate thresholds
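The Fast & Furious tasks (filter, aggregate, alarm) can be sketched as below. This is only an illustration under assumptions: the `Sample` record, the metric names, and the threshold values are hypothetical and do not come from any group's contribution.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One metric reading (hypothetical record layout)."""
    node: str
    cluster: str
    metric: str
    value: float


# Hypothetical thresholds; real limits would be set per service.
THRESHOLDS = {"cpu_load": 0.90, "disk_used": 0.95}


def filter_by_cluster(samples, cluster):
    """Keep only the samples that belong to the given cluster."""
    return [s for s in samples if s.cluster == cluster]


def raise_alarms(samples, thresholds=THRESHOLDS):
    """Return the samples whose value exceeds the threshold for their metric."""
    return [
        s for s in samples
        if s.metric in thresholds and s.value > thresholds[s.metric]
    ]


def count_exceptions(events):
    """Aggregate (node, exception_name) events into a count per exception name."""
    return Counter(name for _node, name in events)


samples = [
    Sample("node01", "batch", "cpu_load", 0.97),
    Sample("node02", "batch", "cpu_load", 0.40),
    Sample("node03", "web", "disk_used", 0.99),
]
batch_alarms = raise_alarms(filter_by_cluster(samples, "batch"))
exception_counts = count_exceptions(
    [("node01", "Timeout"), ("node02", "Timeout"), ("node01", "IOError")]
)
```

Filtering before alarming keeps the end user view scoped to one role or cluster, which matches the "alarms, end user views" focus of this category.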

Page 5: IT Monitoring WG Monitoring Use Cases

Digging Deep

Group | Action              | Element
CF    | report, reorder     | historical data
CIS   | report, statistics  | historical data
CS    | correlate           | traffic, route, configuration
DI    | correlate           | ip addresses, devices
ES    | statistics, reorder | historical data
GT    | statistics          | service status, service availability
PES   | statistics          | historical data, CPU usage, disk usage


Page 6: IT Monitoring WG Monitoring Use Cases

Digging Deep

• Groups:
  – CF, CS, ES, PES
• Role:
  – VO Admin, Service Manager
• Tasks:
  – Curation of hardware and network historical data
  – Analysis and statistics (trends) on batch job data and network data
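As a rough illustration of the "statistics (trends)" task, a trend over historical data can be estimated with a least-squares slope. The function name and the sample series below are hypothetical; the slides do not say which statistics the groups actually use.

```python
from statistics import mean


def linear_trend(values):
    """Least-squares slope of values against their index (units per step)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical history: CPU hours consumed by batch jobs on four consecutive days.
daily_cpu_hours = [100.0, 110.0, 120.0, 130.0]
slope = linear_trend(daily_cpu_hours)  # growth in CPU hours per day
```

A positive slope indicates growing usage over the period; the same function could be applied to disk usage or network traffic series.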

Page 7: IT Monitoring WG Monitoring Use Cases

Correlate & Combine

Group | Action    | Element
CF    | correlate | raw data, alarms, cluster, service, app
CIS   | correlate | CDS, INSPIRE, usage, AFS
CS    | correlate | high traffic, firewall load, hardware problems
DI    | correlate | p2p problems, outgoing connections, IPs
ES    | correlate | job/transfer rate/metadata, site status, fts, srm
GT    | correlate | service status, job status
PES   | correlate | alarms, metrics, hardware, users, services


Page 8: IT Monitoring WG Monitoring Use Cases

Correlate & Combine

• Groups:
  – CF, CIS, ES, GT, PES, ??
• Role:
  – Service Manager
• Tasks:
  – Correlation between alarms, hardware, and services
  – Correlation between usage, hardware, and services
  – Correlation between job status and grid status
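One simple reading of "correlation between alarms, hardware, and services" is a join on the node name, sketched below. The data layout (alarm tuples and a node-to-services mapping) and all names are assumptions made for the example, not a description of an existing CERN tool.

```python
from collections import defaultdict


def correlate_alarms_with_services(alarms, services_by_node):
    """Map each service to the sorted alarm names raised on nodes hosting it.

    alarms: iterable of (node, alarm_name) pairs.
    services_by_node: dict mapping a node name to the services it runs.
    """
    affected = defaultdict(set)
    for node, alarm in alarms:
        # Nodes with no known services contribute nothing.
        for service in services_by_node.get(node, []):
            affected[service].add(alarm)
    return {service: sorted(names) for service, names in affected.items()}


# Hypothetical inputs from two different monitoring domains.
alarms = [
    ("node01", "disk_full"),
    ("node02", "high_load"),
    ("node01", "high_load"),
]
services_by_node = {"node01": ["batch", "afs"], "node02": ["batch"]}
impact = correlate_alarms_with_services(alarms, services_by_node)
```

Keying both sources on a shared identifier is what allows data from different domains to be combined; in practice that common key would have to be agreed between the groups.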