Grid Technology
CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it
IT Monitoring WG
Monitoring Use Cases
16 January 2012
Introduction
• Goal
  – Analyze different types of use cases from all IT groups
  – Identify a few representative common use cases
• Contributions requested under 3 categories
  – Fast & Furious (FF)
    • alarms, end-user views
  – Digging Deep (DD)
    • infrequent analysis with lots of data, history analysis
  – Correlate & Combine (CC)
    • combining data from different domains
• Contributions received from 7 groups
  – Input from DB and DSS missing
Fast & Furious
Group  Action             Element
CF     list, join, alarm  nodes, exceptions
CIS    list, alarm        document queue, web servers
CS     list, alarm        router, switch, network
DI     list               connections, URLs, external locations
ES     list, find         job status, transfer status, site status
GT     list, find         services, sites
PES    list, join, alarm  users, batch jobs, hardware
https://tomtools.cern.ch/confluence/display/MWG/Use+Cases
Fast & Furious
• Groups:
  – CF, DI, PES, ??
• Role:
  – Sys Admin, Service Manager
• Tasks:
  – Get metric values for hardware and selected services
  – Filter metrics by type (role, cluster, etc.)
  – Aggregate exceptions and errors
  – Raise alarms according to appropriate thresholds
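The Fast & Furious tasks (filter metrics, aggregate exceptions, raise threshold alarms) can be sketched in a few lines of Python. This is a minimal illustration under assumed data shapes, not part of any tool named in the use cases: the `Metric` record and its fields, and the functions `filter_metrics`, `aggregate_exceptions` and `raise_alarms`, are all hypothetical, and a fixed upper threshold per metric name is assumed.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical metric sample; field names are assumptions for illustration only.
@dataclass(frozen=True)
class Metric:
    host: str
    cluster: str
    role: str
    name: str
    value: float


def filter_metrics(metrics, **criteria):
    """Keep metrics whose attributes equal every given criterion, e.g. cluster="batch"."""
    return [m for m in metrics
            if all(getattr(m, field) == wanted for field, wanted in criteria.items())]


def aggregate_exceptions(events):
    """Count (host, exception type) pairs so repeated errors collapse into one entry each."""
    return Counter(events)


def raise_alarms(metrics, thresholds):
    """Return one alarm message per metric whose value exceeds its assumed upper threshold."""
    alarms = []
    for m in metrics:
        limit = thresholds.get(m.name)
        if limit is not None and m.value > limit:
            alarms.append(f"ALARM {m.host}: {m.name}={m.value} > {limit}")
    return alarms


if __name__ == "__main__":
    samples = [
        Metric("node01", "batch", "worker", "load", 12.0),
        Metric("node02", "batch", "worker", "load", 3.5),
        Metric("web01", "web", "frontend", "load", 9.0),
    ]
    batch = filter_metrics(samples, cluster="batch")
    print(raise_alarms(batch, {"load": 10.0}))
    print(aggregate_exceptions([("node01", "IOError"), ("node01", "IOError"), ("node02", "Timeout")]))
```

Running it prints a single alarm, for node01. Filtering before alarming keeps thresholds per role or cluster simple; a real system would likely also need hysteresis or rate limits, which this sketch omits.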
Digging Deep
Group  Action               Element
CF     report, reorder      historical data
CIS    report, statistics   historical data
CS     correlate            traffic, route, configuration
DI     correlate            IP addresses, devices
ES     statistics, reorder  historical data
GT     statistics           service status, service availability
PES    statistics           historical data, CPU usage, disk usage
Digging Deep
• Groups:
  – CF, CS, ES, PES
• Role:
  – VO Admin, Service Manager
• Tasks:
  – Curation of hardware and network historical data
  – Analysis and statistics (trends) on batch job data and network data
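The Digging Deep tasks (curating historical data, then computing statistics and trends) can be illustrated with a small Python sketch using only the standard `statistics` module. The helper names `curate`, `trend` and `summarize` are hypothetical, samples are assumed to be `(timestamp, value)` pairs, the curation rule (sort by time, keep the first sample seen per timestamp) is an assumption, and the trend is an ordinary least-squares slope rather than any method prescribed by the use cases.

```python
from statistics import mean


def curate(samples):
    """Order (timestamp, value) samples by time, keeping the first sample seen per timestamp."""
    seen = set()
    curated = []
    # sorted() is stable, so among equal timestamps the original order is preserved.
    for ts, value in sorted(samples, key=lambda s: s[0]):
        if ts not in seen:
            seen.add(ts)
            curated.append((ts, value))
    return curated


def trend(samples):
    """Ordinary least-squares slope of value against timestamp (value units per time unit)."""
    xs = [t for t, _ in samples]
    ys = [v for _, v in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("need at least two distinct timestamps")
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom


def summarize(samples):
    """Curate the history, then report basic statistics and the linear trend."""
    curated = curate(samples)
    values = [v for _, v in curated]
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "slope": trend(curated),
    }


if __name__ == "__main__":
    # Hypothetical daily CPU usage (%) for one node, out of order and with a duplicate day.
    history = [(3, 70.0), (1, 50.0), (2, 60.0), (2, 99.0)]
    print(summarize(history))
```

For this made-up history the duplicate day 2 reading of 99.0 is dropped, and the slope comes out at 10 percentage points per day. Over large archives the same computation would typically run in a database or batch framework rather than in memory.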
Correlate & Combine
Group  Action     Element
CF     correlate  raw data, alarms, cluster, service, app
CIS    correlate  CDS, INSPIRE, usage, AFS
CS     correlate  high traffic, firewall load, hardware problems
DI     correlate  P2P problems, outgoing connections, IPs
ES     correlate  job/transfer rate/metadata, site status, FTS, SRM
GT     correlate  service status, job status
PES    correlate  alarms, metrics, hardware, users, services
Correlate & Combine
• Groups:
  – CF, CIS, ES, GT, PES, ??
• Role:
  – Service Manager
• Tasks:
  – Correlation between alarms, hardware, and services
  – Correlation between usage, hardware, and services
  – Correlation between job status and grid status
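One simple way to picture the Correlate & Combine task "correlation between alarms, hardware, and services" is a time-window join on the host name. The sketch below is an assumption-laden illustration, not a method the use cases specify: the `Event` record, the `correlate` function, the `host_services` mapping and the default 300-second window are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical event record shared by alarms and hardware reports.
@dataclass(frozen=True)
class Event:
    time: float  # seconds since the epoch
    host: str
    text: str


def correlate(alarms, hw_events, host_services, window=300.0):
    """Pair each alarm with hardware events on the same host within +/- window seconds,
    and attach the services assumed to run on that host."""
    hw_by_host = defaultdict(list)
    for ev in hw_events:
        hw_by_host[ev.host].append(ev)

    matches = []
    for alarm in alarms:
        causes = [hw for hw in hw_by_host.get(alarm.host, [])
                  if abs(hw.time - alarm.time) <= window]
        if causes:
            matches.append({
                "alarm": alarm.text,
                "host": alarm.host,
                "hardware": [hw.text for hw in causes],
                "services": sorted(host_services.get(alarm.host, ())),
            })
    return matches


if __name__ == "__main__":
    alarms = [Event(1000, "node01", "batch job failures"), Event(5000, "node02", "high load")]
    hardware = [Event(900, "node01", "disk error"), Event(100, "node02", "fan failure")]
    services = {"node01": {"batch"}, "node02": {"web"}}
    for match in correlate(alarms, hardware, services):
        print(match)
```

Here only the node01 alarm is paired with a hardware event, because the node02 fan failure lies outside the window. Grouping hardware events by host first avoids comparing every alarm against every event; correlating usage or grid status would follow the same pattern with a different join key.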