
Page 1: Mon Acc Ccr Workshop

Fabric Monitor, Accounting, Storage and Reports experience at the INFN Tier1

Felice Rosso on behalf of INFN Tier1

[email protected]

INFN Workshop on Computing and Networks (Workshop sul calcolo e reti INFN) - Otranto - 8 June 2006

Page 2: Mon Acc Ccr Workshop

Outline
• CNAF-INFN Tier1
• FARM and GRID Monitoring
• Local Queues Monitoring
  – Local and GRID accounting
• Storage Monitoring and accounting
• Summary

Page 3: Mon Acc Ccr Workshop

Introduction
• Location: INFN-CNAF, Bologna (Italy)
  – one of the main nodes of the GARR network
• Computing facility for the INFN HENP community
  – Participating in the LCG, EGEE and INFNGRID projects
• Multi-experiment Tier1
  – LHC experiments (Alice, Atlas, CMS, LHCb)
  – CDF, BABAR
  – VIRGO, MAGIC, ARGO, Bio, TheoPhys, Pamela ...
• Resources assigned to experiments on a yearly plan

Page 4: Mon Acc Ccr Workshop

The Farm in a Nutshell
- SLC 3.0.6, LCG 2.7, LSF 6.1
- ~720 WNs in the LSF pool (~1580 KSI2K)
- Common LSF pool: 1 job per logical CPU (slot)
  - At most 1 process running at the same time per job
- GRID and local submission allowed
  • On the same WN, GRID and non-GRID jobs can run
  • To the same queue, GRID and non-GRID jobs can be submitted
- For each VO/experiment, one or more queues
- Since the 24th of April 2005, ~2,700,000 jobs were executed on our LSF pool (~1,600,000 GRID)
- 3 CEs (main CE: 4 dual-core Opteron CPUs, 24 GB RAM) + 1 gLite CE

Page 5: Mon Acc Ccr Workshop

Access to Batch system

[Diagram: access to the batch system. User Interfaces (UIs) reach the LSF pool (WN1 ... WNn) either through Grid access via the CE, or through "legacy" non-Grid access using a local LSF client; an SE provides storage.]

Page 6: Mon Acc Ccr Workshop

Farm Monitoring Goals
• Scalability to Tier1 full size
• Many parameters for each WN/server
• Database and plots on web pages
• Data analysis
• Report problems on web page(s)
• Share data with GRID tools
• RedEye: the INFN-T1 monitoring tool
• RedEye runs as a simple local user. No root!

Page 7: Mon Acc Ccr Workshop

Tier1 Fabric Monitoring: What do we get?
• CPU load, status and jiffies
• Ethernet I/O (MRTG, run by the network group)
• Temperatures, fan RPM (IPMI)
• Total number and type of active TCP connections
• Processes created, running, zombie, etc.
• RAM and swap memory
• Users logged in
• SLC3 and SLC4 compatible
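As an illustration only (the RedEye sources are not part of these slides), a minimal Python sketch of how a few of these per-node metrics can be sampled from /proc on an SLC-like Linux worker node:

```python
#!/usr/bin/env python3
"""Sketch: sample a few fabric metrics from /proc (illustrative, not the RedEye code)."""

def cpu_jiffies():
    # First line of /proc/stat: "cpu user nice system idle ..."
    with open("/proc/stat") as f:
        fields = f.readline().split()
    names = ("user", "nice", "system", "idle")
    return dict(zip(names, (int(v) for v in fields[1:5])))

def meminfo():
    # RAM and swap totals/free, in kB
    wanted = {"MemTotal", "MemFree", "SwapTotal", "SwapFree"}
    out = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            if key in wanted:
                out[key] = int(value.split()[0])  # kB
    return out

def tcp_connections_by_state():
    # /proc/net/tcp: the 4th column is the connection state (hex code)
    states = {}
    with open("/proc/net/tcp") as f:
        next(f)  # skip header
        for line in f:
            st = line.split()[3]
            states[st] = states.get(st, 0) + 1
    return states

if __name__ == "__main__":
    print(cpu_jiffies())
    print(meminfo())
    print(tcp_connections_by_state())
```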

Page 8: Mon Acc Ccr Workshop

Tier1 Fabric Monitor

Page 9: Mon Acc Ccr Workshop

Local WN Monitoring
• On each WN, every 5 minutes (local crontab), the info is saved locally (<3 KBytes --> 2-3 TCP packets)
• 1 minute later a collector "gets" the info via socket
  – "gets": tidy parallel fork with timeout control (see the sketch below)
• Getting and saving the data from 750 WNs takes ~6 sec in the best case, 20 sec in the worst case (timeout cutoff)
• The database is updated (last day, week, month)
• For each WN --> 1 file (possibility of cumulative plots)
• Analysis of the monitoring data
• Local thumbnail cache creation (web clickable)
• http://collector.cnaf.infn.it/davide/rack.php
• http://collector.cnaf.infn.it/davide/analyzer.html
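A minimal sketch of the collector pattern described above, with hypothetical hostnames and port (the actual CNAF collector is not shown here): the small per-WN blobs are fetched in parallel with a hard timeout so that one stuck node cannot block the whole sweep.

```python
#!/usr/bin/env python3
"""Sketch: pull small monitoring blobs from many WNs in parallel, with a timeout.

Hostnames, port and record format are illustrative assumptions, not the
actual CNAF collector configuration.
"""
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

PORT = 9999          # assumed port where each WN serves its latest snapshot
TIMEOUT = 5.0        # per-host timeout: a stuck WN must not block the sweep
WORKERS = 100        # degree of parallelism

def fetch(host):
    """Read the (<3 KB) snapshot that the WN's local crontab job saved."""
    with socket.create_connection((host, PORT), timeout=TIMEOUT) as s:
        s.settimeout(TIMEOUT)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def sweep(hosts):
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        futures = {pool.submit(fetch, h): h for h in hosts}
        for fut in as_completed(futures):
            host = futures[fut]
            try:
                results[host] = fut.result()
            except (OSError, socket.timeout):
                failed.append(host)   # report the node; do not retry inside the sweep
    return results, failed

if __name__ == "__main__":
    wns = ["wn%03d.example.cnaf.infn.it" % i for i in range(1, 751)]  # hypothetical names
    ok, bad = sweep(wns)
    print("collected %d snapshots, %d timeouts/errors" % (len(ok), len(bad)))
```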

Page 10: Mon Acc Ccr Workshop

Web Snapshot CPU-RAM

Page 11: Mon Acc Ccr Workshop

Web Snapshot TCP connections

Page 12: Mon Acc Ccr Workshop

Web Snapshot users logged

Page 13: Mon Acc Ccr Workshop

Analyzer.html

Page 14: Mon Acc Ccr Workshop

Fabric and GRID Monitoring
• Effort on exporting relevant fabric metrics to the Grid level, e.g.:
  – # of active WNs
  – # of free slots
  – etc.
• GridICE integration
  – Configuration based on Quattor
• Avoid duplication of sensors on the farm

Page 15: Mon Acc Ccr Workshop

Local Queues Monitoring
• Every 5 minutes, the queues' status (snapshot) is saved on the batch manager
• A collector gets the info and updates the local database (same logic as the farm monitoring; see the sketch below)
  – Daily / Weekly / Monthly / Yearly DB
  – DB: total and per-queue
• 3 classes of users for each queue
• Plot generator: Gnuplot 4.0
• http://tier1.cnaf.infn.it/monitor/LSF/
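A minimal sketch of the snapshot aggregation described above, assuming the snapshot is the plain `bqueues` summary output (the actual CNAF snapshot format and database are not shown in the slides):

```python
#!/usr/bin/env python3
"""Sketch: aggregate an LSF `bqueues` snapshot into total and per-queue counters.

The column layout (QUEUE_NAME ... NJOBS PEND RUN SUSP) is the standard
`bqueues` summary; treat it as an assumption, not the exact CNAF snapshot format.
"""
import csv, sys, time

def parse_bqueues(lines):
    """Return {queue: {'njobs': n, 'pend': p, 'run': r, 'susp': s}}."""
    per_queue = {}
    for line in lines[1:]:                      # skip the header line
        f = line.split()
        if len(f) < 11:
            continue
        per_queue[f[0]] = {"njobs": int(f[7]), "pend": int(f[8]),
                           "run": int(f[9]), "susp": int(f[10])}
    return per_queue

def append_snapshot(per_queue, db_path):
    """Append one timestamped row per queue plus a 'TOTAL' row (flat-file 'DB')."""
    now = int(time.time())
    total = {"njobs": 0, "pend": 0, "run": 0, "susp": 0}
    with open(db_path, "a", newline="") as out:
        w = csv.writer(out)
        for q, c in sorted(per_queue.items()):
            w.writerow([now, q, c["njobs"], c["pend"], c["run"], c["susp"]])
            for k in total:
                total[k] += c[k]
        w.writerow([now, "TOTAL", total["njobs"], total["pend"], total["run"], total["susp"]])

if __name__ == "__main__":
    snapshot = sys.stdin.read().splitlines()    # e.g. bqueues | python3 this_script.py
    append_snapshot(parse_bqueues(snapshot), "queues_daily.csv")
```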

Page 16: Mon Acc Ccr Workshop

Web Snapshot LSF Status

Page 17: Mon Acc Ccr Workshop

UGRID: general GRID user (lhcb001, lhcb030, ...)
SGM: Software GRID Manager (lhcbsgm)
OTHER: local user

Page 18: Mon Acc Ccr Workshop

UGRID: general GRID user (babar001, babar030, ...)
SGM: Software GRID Manager (babarsgm)
OTHER: local user

Page 19: Mon Acc Ccr Workshop

RedEye - LSF Monitoring
• Real-time slot usage
• Fast, little CPU power needed, stable, works over the WAN
• RedEye runs as a simple user, not root

BUT...
1. All slots have the same weight (Future: Jeep solution)
2. Jobs shorter than 5 minutes can be lost

SO: we need something good for ALL jobs.

We need to know who uses our FARM and how.

Solution: offline parsing of the LSF log files once per day (Jeep integration)

Page 20: Mon Acc Ccr Workshop

Job-related metrics
From the LSF log file we get the following non-GRID info:
• LSF JobID, local UID owner of the job
• "any kind of time" (submission, WCT, etc.)
• Max RSS and virtual memory usage
• From which computer (hostname) the job was submitted (GRID CE / locally)
• Where the job was executed (WN hostname)
• We complete this set with KSI2K & GRID info (Jeep); see the sketch below
• DGAS interface: http://www.to.infn.it/grid/accounting/main.html
• http://tier1.cnaf.infn.it/monitor/LSF/plots/acct/
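For illustration, a minimal sketch of the per-job record that these fields make up, plus the derived quantities used later in the talk; the field names, units and the normalisation rule are assumptions, not the actual Jeep/DGAS code:

```python
#!/usr/bin/env python3
"""Sketch: a per-job accounting record built from the LSF log fields listed above.

Field names and the normalisation rule are illustrative assumptions,
not the exact CNAF/Jeep implementation.
"""
from dataclasses import dataclass

@dataclass
class JobRecord:
    job_id: int            # LSF JobID
    uid: int               # local UID owning the job
    queue: str
    submit_host: str       # GRID CE or local submission host
    exec_host: str         # WN where the job ran
    submit_time: int       # epoch seconds
    start_time: int
    end_time: int
    cpu_time: float        # seconds
    max_rss_kb: int
    max_vmem_kb: int

    @property
    def wct(self) -> int:
        """Wall clock time actually spent on the WN (seconds)."""
        return max(self.end_time - self.start_time, 0)

    @property
    def efficiency(self) -> float:
        """CPU time / WCT; values well above 1 hint at extra spawned processes."""
        return self.cpu_time / self.wct if self.wct else 0.0

    def ksi2k_hours(self, wn_ksi2k_per_slot: float) -> float:
        """Normalised work: WCT (hours) weighted by the slot's KSI2K rating."""
        return self.wct / 3600.0 * wn_ksi2k_per_slot
```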

Page 21: Mon Acc Ccr Workshop

Queues accounting report

Page 22: Mon Acc Ccr Workshop

Queues accounting report

Page 23: Mon Acc Ccr Workshop

Queues accounting report

Page 24: Mon Acc Ccr Workshop

Queues accounting report

• KSI2K [WCT] May 2006, All jobs

Page 25: Mon Acc Ccr Workshop

Queues accounting report

• CPUTime [hours] May 2006, GRID jobs

Page 26: Mon Acc Ccr Workshop

How do we use KSpecInt2K (KSI2K)?
- 1 slot → 1 job
- http://tier1.cnaf.infn.it/monitor/LSF/plots/ksi/
- For each job:
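A commonly used per-job normalisation, consistent with the "1 slot → 1 job" rule above (an assumption, not necessarily the exact formula used at CNAF), weights the job's wall clock time by the KSI2K rating of the slot it occupied:

```latex
% Assumed form of the per-job normalisation (not taken verbatim from the slide):
%   KSI2K_WN  = SpecInt2000 rating of the worker node, in kSI2K
%   N_slots   = number of job slots (logical CPUs) on that WN
%   WCT_job   = wall clock time of the job, in hours
\[
  \mathrm{KSI2K\ hours}_{\,\mathrm{job}}
  \;=\; \frac{\mathrm{KSI2K}_{\,\mathrm{WN}}}{N_{\mathrm{slots}}} \times \mathrm{WCT}_{\,\mathrm{job}}
\]
% Summing over all jobs of a queue/VO in a month gives the KSI2K [WCT]
% numbers shown in the accounting plots.
```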

Page 27: Mon Acc Ccr Workshop

KSI2K T1-INFN History

Page 28: Mon Acc Ccr Workshop

KSI2K T1-INFN History

Page 29: Mon Acc Ccr Workshop

Job Check and Report
• lsb.acct had a big bug!
  – Randomly: CPU-user-time = 0.00 sec
  – bjobs -l <JOBID> reported the correct CPU time
  – Fixed by Platform on the 25th of July 2005
• CPU time > WCT? --> possible spawned processes
• RAM memory: is a job on the right WN?
• Is the worker node a "black hole"?
• We have a daily report (web page); a sketch of such checks follows
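A minimal sketch of the kind of daily checks listed above; the thresholds and the "black hole" heuristic are assumptions, and JobRecord refers to the illustrative dataclass from the accounting sketch earlier, not to CNAF's actual report code:

```python
#!/usr/bin/env python3
"""Sketch: daily job sanity checks of the kind listed above.

Thresholds and heuristics are illustrative assumptions, not the CNAF report code.
Each `job` is an object with the JobRecord attributes sketched earlier.
"""
from collections import Counter

def check_spawn(job) -> bool:
    """CPU time noticeably larger than WCT suggests extra spawned processes."""
    return job.cpu_time > 1.05 * job.wct          # 5% tolerance (assumed)

def check_memory(job, wn_ram_kb_per_slot: int) -> bool:
    """Flag jobs whose max RSS exceeds the RAM available to one slot on that WN."""
    return job.max_rss_kb > wn_ram_kb_per_slot

def black_hole_candidates(jobs, min_jobs=50, max_median_wct=120):
    """A WN that 'eats' many very short jobs in a day may be a black hole."""
    per_wn = {}
    for j in jobs:
        per_wn.setdefault(j.exec_host, []).append(j.wct)
    suspects = []
    for wn, wcts in per_wn.items():
        wcts.sort()
        if len(wcts) >= min_jobs and wcts[len(wcts) // 2] < max_median_wct:
            suspects.append(wn)
    return suspects

def daily_report(jobs, wn_ram_kb_per_slot: int) -> dict:
    counts = Counter()
    for j in jobs:
        if check_spawn(j):
            counts["possible_spawn"] += 1
        if check_memory(j, wn_ram_kb_per_slot):
            counts["ram_mismatch"] += 1
    counts["black_hole_wns"] = len(black_hole_candidates(jobs))
    return dict(counts)
```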

Page 30: Mon Acc Ccr Workshop

Fabric and GRID monitoring

• Effort on exporting relevant queue and job metrics to the Grid level
  – Integration with GridICE
  – Integration with DGAS (done!)
  – Grid (VO) level view of resource usage

• Integration of local job information with Grid-related metrics, e.g.:
  – DN of the user proxy
  – VOMS extensions of the user proxy
  – Grid Job ID

Page 31: Mon Acc Ccr Workshop

GridICE
• Dissemination: http://grid.infn.it/gridice
• GridICE server (development, with upcoming features): http://gridice3.cnaf.infn.it:50080/gridice
• GridICE server for the EGEE Grid: http://gridice2.cnaf.infn.it:50080/gridice
• GridICE server for INFN-Grid: http://gridice4.cnaf.infn.it:50080/gridice

Page 32: Mon Acc Ccr Workshop

GridICE
• For each site, check the GRID services (RB, BDII, CE, SE, ...)
• Check of a service --> does its PID exist? (see the sketch below)
• Summary and/or notification
• From GRID servers: summary of CPU and storage resources available per site and/or per VO
• Storage available on the SE per VO, from the BDII
• Downtimes
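A minimal sketch of this "does the PID exist?" style of check; the pidfile paths and service names are hypothetical, and this is not GridICE's own sensor code:

```python
#!/usr/bin/env python3
"""Sketch: 'does the PID exist?' service check.

Pidfile locations and service names are hypothetical examples.
"""
import os

def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 probes without killing)."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False
    except PermissionError:       # process exists but belongs to another user
        return True

def check_service(pidfile: str) -> bool:
    """Read the pidfile written by the service and verify the process is running."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False
    return pid_alive(pid)

if __name__ == "__main__":
    # Hypothetical pidfiles for services of the kind named on the slide
    for name, pidfile in [("gridftp", "/var/run/gridftp.pid"),
                          ("bdii", "/var/run/bdii.pid")]:
        print(name, "OK" if check_service(pidfile) else "DOWN")
```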

Page 33: Mon Acc Ccr Workshop

GRID ICE

• GridICE as fabric monitor for "small" sites
• Based on LeMon (server and sensors)
• Parsing of LeMon flat-file logs
• Plots based on RRDtool
• Legnaro: ~70 worker nodes

Page 34: Mon Acc Ccr Workshop

GridICE screenshots

Page 35: Mon Acc Ccr Workshop

Jeep
• General-purpose data collector (push technology)
• DB-WNINFO: historical hardware DB (MySQL on the HLR node)
• KSI2K used by each single job (DGAS)
• Job monitoring (check RAM usage in real time, efficiency history)
• FS-INFO: is enough space available on the volumes?
• AutoFS: are all dynamic mount points working?
• Matchmaking UID/GID --> VO (see the sketch below)
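A minimal sketch of the username --> VO matchmaking, following the pool-account naming shown on the earlier slides (lhcb001, lhcbsgm, babar030, ...); the regex and user classes are illustrative assumptions, not the Jeep implementation:

```python
#!/usr/bin/env python3
"""Sketch: map local pool accounts to a VO and user class.

The naming scheme (VO prefix + 3-digit pool number, or VO prefix + 'sgm')
follows the examples on the earlier slides; the regex is an assumption.
"""
import re

POOL_ACCOUNT = re.compile(r"^(?P<vo>[a-z]+?)(?P<cls>sgm|\d{3})$")

def classify(username: str):
    """Return (vo, user_class) with user_class in {'UGRID', 'SGM', 'OTHER'}."""
    m = POOL_ACCOUNT.match(username)
    if not m:
        return None, "OTHER"          # local user, not a Grid pool account
    vo = m.group("vo")
    cls = "SGM" if m.group("cls") == "sgm" else "UGRID"
    return vo, cls

if __name__ == "__main__":
    for user in ("lhcb001", "lhcbsgm", "babar030", "rosso"):
        print(user, classify(user))
```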

Page 36: Mon Acc Ccr Workshop

The Storage in a Nutshell
• Different hardware (NAS, SAN, tapes)
  – More than 300 TB of disk, 130 TB of tape
• Different access methods (NFS/RFIO/Xrootd/GridFTP)
• Volume filesystems: EXT3, XFS and GPFS
• Volumes bigger than 2 TBytes: RAID 50 (EXT3/XFS), direct (GPFS)
• Tape access: CASTOR (50 TB of disk as staging area)
• Volume management via a PostgreSQL DB
• 60 servers to export filesystems to the WNs

Page 37: Mon Acc Ccr Workshop

Storage at T1-INFN
• Hierarchical Nagios servers to check service status
  – gridftp, srm, rfio, castor, ssh
• Local tool to sum the space used by the VOs (see the sketch below)
• RRD to plot (total and used volume space)
• Binary, proprietary (IBM/STEK) software to check some hardware status
• Very, very, very difficult to interface the proprietary software to the T1 framework
• For now: only e-mail reports for bad blocks, disk failures and filesystem failures
• Plots: intranet & on demand by VO
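A minimal sketch of such a local tool: sum the space used per VO and feed it to an RRD for plotting. The directory layout and RRD file names are hypothetical; only the standard `rrdtool update <file> N:<value>` invocation is assumed.

```python
#!/usr/bin/env python3
"""Sketch: sum the disk space used per VO and push it into an RRD for plotting.

The volume/VO directory layout and RRD file names are hypothetical.
"""
import os
import subprocess

# Hypothetical mapping: VO -> list of directories on the exported volumes
VO_DIRS = {
    "cms":  ["/storage/vol01/cms", "/storage/vol07/cms"],
    "lhcb": ["/storage/vol02/lhcb"],
}

def used_bytes(path: str) -> int:
    """Walk a directory tree and sum file sizes (a 'du'-like local tool)."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda e: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass                      # file vanished or unreadable: skip it
    return total

def update_rrd(vo: str, nbytes: int) -> None:
    """Feed the latest value into the VO's RRD ('N' means 'now')."""
    rrd = "/var/lib/storage-acct/%s.rrd" % vo     # hypothetical path
    subprocess.run(["rrdtool", "update", rrd, "N:%d" % nbytes], check=True)

if __name__ == "__main__":
    for vo, dirs in VO_DIRS.items():
        total = sum(used_bytes(d) for d in dirs)
        print(vo, total)
        update_rrd(vo, total)
```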

Page 38: Mon Acc Ccr Workshop

Tape/Storage usage report

Page 39: Mon Acc Ccr Workshop

Summary
• Fabric-level monitoring with smart reports is needed to ease management
• T1 already has a solution for the next 2 years!
• Not exportable due to manpower (no support)
• Future at INFN? What is the T2s' manpower?
• LeMon & Oracle? What is the T2s' manpower?
• RedEye? What is the T2s' manpower?
• Real collaboration requires more than just mailing lists and phone conferences