Transcript
Page 1: Xrootd Monitoring for the CMS Experiment Abstract: During spring and summer 2011 CMS deployed Xrootd front- end servers on all US T1 and T2 sites. This

Xrootd Monitoring for the CMS Experiment

Abstract: During spring and summer 2011 CMS deployed Xrootd front-end servers on all US T1 and T2 sites. This allows for remote access to all experiment data and is used for user-analysis, visualization, running of jobs at T2s and T3s when data is not available at local sites, and as a fail-over mechanism for data-access in CMSSW jobs.Monitoring of Xrootd infrastructure is implemented on three levels:1. Service and data availability checks Nagios@UNL⟼2. Xrootd summary monitoring Custom analyzer MonALISA⟼ ⟼3. Xrootd detailed monitoring ⟼ GLED Web, Gratia, ROOT Trees, …⇶

L.A.T. Bauerdick1, K.Bloom3, B.P.Bockelman3, D.C.Bradley4, S.Dasu4, I.Sfiligoi2, A.Tadel2, M.Tadel2, F.Wuerthwein2, A.Yagil2

UCSD

Caltech

UNL Wisconsin

FNALPurdue

UFL

MIT

1 FNAL, 2 UC San Diego, 3 University of Nebraska-Lincoln, 4 University of Wisconsin-Madison

#beginunique_id=xrd-1335314898281000file_lfn=/store/data/Run2011B/…/XXXX.rootfile_size=1441178626start_time=1335314780end_time=1335314898read_bytes=840772088read_operations=196read_min=300read_max=25090476read_average=4289653.510204read_sigma=8040665.338339# single-read operation statistics removedread_vector_bytes=836030346read_vector_operations=64read_vector_min=132030read_vector_max=25090476 read_vector_average=13062974.156250 read_vector_sigma=9146182.440509 read_vector_count_min=3read_vector_count_max=512 read_vector_count_average=327.468750 read_vector_count_sigma=179.357096read_bytes_at_close=840772088# write operation statistics removeduser_dn=XXXXuser_vo=user_role=user_fqan=client_domain=hep.wisc.educlient_host=g22n10server_username=cmsuser127app_info=server_domain=t2.ucsd.eduserver_host=uaf-7#end

References:• AAA & FAX, at this CHEP• GLED http://gled.org/• MonALISA http://monalisa.caltech.edu/• ROOT http://root.cern.ch/• Xrootd http://xrootd.org/

1. Service & Data AvailabilityNagios probes track the following core operations:• Check redirection from meta-manager@UNL sites⟼• Check authentication with CERN & OSG certificates• Check that files can actually be read (get first 1kB)• Mail alarms sent in case of problemsChecking of individual Xrootd servers:• Some sites also use Nagios@UNL (historically)

The plan is to delegate this to sites (RSV probes exist)• Summary monitoring also reveals a lot about server state

2. Xrootd Summary MonitoringAll redirectors and servers send their summary monitoring UDP packets to a collector at UCSD where data is pre-processed and stored into MonALISA repository.Examples of collected data:• Number of connected clients• Rates of new connections, authentications, and various errors• Incoming and outgoing network traffic caused by Xrootd• Server’s usage of system resourcesProcessing with ML plugins:• Calculating per-site quantities, e.g. total traffic for each site• Detecting error conditions and sending notification emailsPresentation options:• Standard ML graphs – for individual sites / host, totals• Dashboard

UDP TCP multiplexer➙

GLED

TTree writer

MonALISA

xrd-rep-snatcher.pl

Development, testing

SummaryUDP packets

DetailedUDP packets

3. Xrootd Detailed MonitoringAs with summary data, detailed monitoring UDP packets are also sent to UCSD. The streams are merged and made available via a UDP to TCP converter / multiplexer.Contents of detailed monitoring streams:• User authentication records, including their DN and VOMS info• File-open records, including LFN by which the file was requested• All read and write requests (offset, length, and timestamp)• Vector-read requests (# of elements, total length, timestamp)

Optionally, servers can send offset & length info for each element• Redirection recordsDefault processing with GLEDComplete in-memory representation of all servers, sessions and open files is required as packets are highly encoded.• Embedded http server shows currently ongoing user sessions• When a file is closed a detailed report is generated

Sent to OSG Gratia and written into ROOT trees for further analysis

Top Related