
CMS Report – GridPP Collaboration Meeting IX Peter Hobson, Brunel University 4/2/2004

CMS Status

Progress towards GridPP milestones
• Data management – the Data Challenge 2004
• Batch analysis framework
• Monitoring using the EDG R-GMA middleware

The GRIDPP funded cast list
• Tim Barrass
• Barry MacEvoy
• Owen Maroney
• JJ “Henry” Nebrensky
• Hugh Tallini

Bristol, Brunel and Imperial College


CMS and LCG2

CMS Data Challenge DC04 has three components

Tier-0 challenge. Reconstruction at CERN
• Complex enough. Doesn’t need grid per se
• But will publish catalog to CERN RLS service

Distribution challenge. Push/Pull data to Tier-1’s
• Want to use LCG tools, can use SRB. Questions of MCAT/RLS coherence, SRB pool issues etc.

Analysis/Calibration Aspects
• At Tier-1/2 centers (not at CERN during DC04 “proper”)
• Encourage use of LCG2 and GRID3 to run these

Aim to complete first two in “March”

Expect last one to continue and be repeated over next 6 months as LCG matures. Factorized from Tier0 and distribution challenges

CMS expert manpower is saturated with work for DC04.


DC04 Production

Data Challenge: March 2004 nominal
• An end-to-end test of the CMS offline computing system
• 25% of full world-wide system to be run flat-out for one month
• Key test of our Grid-enabled software components
• ‘Play back’ digitized data, emulating CMS DAQ -> storage, reconstruction, calibration, data reduction and analysis at T0 & external T1
• Some T2 involvement as “clients” of local T1 centres

T0 to T1 data transfer
• New transfer management database
• Refinements to schema
• CASTOR issues to be resolved (SRM export, 3Tb buffer needed)


DC04: Catalogue Deployment
● Synchronised RLS (LRC & RMC) deployment
  ● Expect to have Oracle DB deployed at CERN and CNAF
  ● Deployment at RAL and FZK could follow this
    ● May not be achieved on timescale of DC04
    ● Cannot afford to plan on this being in place
● Tier 1s without RLS will need POOL MySQL catalogue
  ● FNAL, RAL, Lyon, FZK
  ● Catalogue should be updated by Tier 1 agent
    ● FCatalog tool copies POOL data from CERN RLS to local MySQL catalogue
    ● Catalogue updated as files are transferred from CERN
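
The Tier 1 agent behaviour described above (copy catalogue entries from the central catalogue into a local one as files arrive) can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the local MySQL catalogue, a plain dictionary stands in for the CERN RLS, and the table layout and the names `open_local_catalogue` and `update_local_catalogue` are hypothetical. It is not the FCatalog tool.

```python
import sqlite3


def open_local_catalogue():
    """Create an in-memory stand-in for the local POOL MySQL catalogue."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(guid TEXT PRIMARY KEY, lfn TEXT, pfn TEXT)"
    )
    return db


def update_local_catalogue(db, central, transferred, local_prefix):
    """Register newly transferred files locally; return how many were added.

    central     : dict guid -> {"lfn": ...}, a stand-in for the central catalogue
    transferred : GUIDs of files that have arrived at this Tier 1
    local_prefix: hypothetical prefix used to build the local physical name
    """
    added = 0
    for guid in transferred:
        entry = central.get(guid)
        if entry is None:
            continue  # not registered centrally yet; pick it up on a later pass
        cur = db.execute(
            "INSERT OR IGNORE INTO files VALUES (?, ?, ?)",
            (guid, entry["lfn"], local_prefix + entry["lfn"]),
        )
        added += cur.rowcount  # 0 when the GUID was already registered
    db.commit()
    return added
```

Running the update repeatedly is harmless in this sketch, because a GUID that is already present is skipped rather than duplicated.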


Catalogue Use
● CERN RLS
  ● Initial registration of POOL data by reconstruction jobs
  ● Registration of files and replicas in SRB by GMCat
    ● RAL, Lyon
  ● Registration of files produced by analysis jobs ‘outside’ LCG-2
    ● Only files which are made “globally” accessible in an SRB or SRM server
● CNAF RLS
  ● Replication of files to LCG-2 SE
  ● Queries by LCG-2 analysis jobs
  ● Registration of files produced by LCG-2 analysis jobs


Distribution of Data from T1 to T2
● LCG-2 sites
  ● Distribution through EDG replica manager
  ● Registration in CNAF RLS
  ● Jobs access data from CNAF RLS
● Non LCG-2 sites
  ● Tier 2 can access POOL data from Tier 1 MySQL catalogue using FCatalog tools
  ● Creates local catalogue – XML or local MySQL


SRB MCat Failover
● Backup MCat server at Daresbury
● Oracle failover solution to be installed soon
  ● Maintains mirror copy of Oracle backend between RAL and Daresbury
● If RAL MCat has problems will switch to Daresbury
  ● No need to change SRB server or client
  ● Minimal downtime of MCat
    ● Change in DNS registration
● GMCat service in deployment
  ● Optimisation testing on local MySQL LRC & RLS


Catalogue Summary
● RLS Catalogue deployment at CERN and CNAF still expected
● MCat server operational, backup service improved
● Likely absence of RLS Catalogues at RAL, FZK
  ● Tier 1s to install local POOL MySQL Catalogues
  ● agents to populate them
● Onward distribution to Tier 2s
  ● Detailed configuration requires:
    ● How data is streamed?
    ● To which Tier 2s?


Batch Analysis Framework

Gridified ORCA Submission System “GROSS”
• Simple UI suitable for non-expert end user
• Extensible architecture (as requirements change/get better defined for DC04 and beyond)
• No modification required to ORCA (transparent running on Grid). No additional s/w required remotely
• Integrates directly to BOSS


“GROSS” System Design

[Figure: Schematic Architecture. Components shown: USER, UI, JOB SUBMISSION MODULE, MONITORING MODULE, DATA INTERFACE, COMMON BOSS/AF DATABASE, PHYSICS META-CATALOG, BOSS, and the GRID (RB, WN).]


How it works

User submits to AF a single analysis TASK which comprises:
• ORCA executable
• ORCA user libraries
• Metadata catalogue query

User additionally specifies:
• Which BOSS DB to use
• Any additional DB to write output details to
• Which metadata catalogue to query
• What to do with output data and logs (in sandbox, register somewhere, etc).
• Suffix for output filenames

Submission module:
• Makes data query on catalogue
• Splits TASK into multiple JOBS (1 job per run)
• Creates a JDL for each JOB
• Creates wrapper script and steer file for each JOB
• Submits each job (through BOSS)
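
The submission steps above (query the catalogue, split the TASK into one JOB per run, write a JDL per JOB) can be sketched as below. This is a minimal illustration, not the GROSS code. The `Task` fields, the dictionary used as a catalogue, and the file names `wrapper.sh` and `steer_*.txt` are all assumptions. The attribute names are those commonly seen in EDG JDL files; the exact JDL that GROSS writes is not shown in these slides.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One analysis TASK; field names are illustrative only."""
    executable: str
    user_libraries: list
    catalogue_query: str
    output_suffix: str = ""


def split_task(task, catalogue):
    """Split a TASK into JOBS, one per run returned by the catalogue query.

    catalogue is a stand-in: dict query -> {run number: [GUIDs]}.
    """
    runs = catalogue.get(task.catalogue_query, {})
    return [{"run": run, "guids": list(runs[run])} for run in sorted(runs)]


def quoted_list(items):
    """Format items as a JDL-style list: {"a", "b"}."""
    return "{" + ", ".join('"' + str(i) + '"' for i in items) + "}"


def make_jdl(task, job):
    """Build the JDL text for one JOB (hypothetical file naming scheme)."""
    tag = "run" + str(job["run"]) + task.output_suffix
    steer = "steer_" + tag + ".txt"
    sandbox = ["wrapper.sh", steer, task.executable] + list(task.user_libraries)
    lines = [
        'Executable = "wrapper.sh";',
        'Arguments = "' + steer + '";',
        'StdOutput = "' + tag + '.out";',
        'StdError = "' + tag + '.err";',
        "InputSandbox = " + quoted_list(sandbox) + ";",
        "OutputSandbox = " + quoted_list([tag + ".out", tag + ".err"]) + ";",
        # Listing the input GUIDs lets the broker place the job at the data.
        "InputData = " + quoted_list("guid:" + g for g in job["guids"]) + ";",
    ]
    return "\n".join(lines)
```

In this sketch the final step, submission through BOSS, is left out; each JDL string would simply be handed to whatever submits the job.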


Wrapping the ORCA job

ORCA executable wrapper running on WN will
• Set up appropriate ORCA environment
• Copy input sandbox/input data to working area
• Link to correct user libraries
• Run executable
• Deal with output files

Wrapper is shell script + steering file
• One (or many) standard shell script registered in db (but easy to modify and re-register)
• Unique steering file created for each job by submission system
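
The split between one standard wrapper script and a unique steering file per job can be illustrated as follows. This is a sketch under assumptions: the `key=value` steering format, the key names and the file naming are invented for the example, since the slides do not show the real steering file. The real wrapper is a shell script; Python is used here only to show the idea.

```python
import os
import tempfile


def write_steering_file(directory, job):
    """Write one steering file for a job as key=value lines; return its path."""
    path = os.path.join(directory, f"steer_{job['job_id']}.txt")
    with open(path, "w") as fh:
        for key in sorted(job):
            fh.write(f"{key}={job[key]}\n")
    return path


def read_steering_file(path):
    """Parse a steering file back into a dict of strings (what a wrapper would do)."""
    settings = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition("=")
            settings[key] = value
    return settings


def demo():
    """Two jobs share the same wrapper logic but get different steering files."""
    jobs = [
        {"job_id": 1, "executable": "myOrca", "run": 101},
        {"job_id": 2, "executable": "myOrca", "run": 102},
    ]
    with tempfile.TemporaryDirectory() as workdir:
        paths = [write_steering_file(workdir, job) for job in jobs]
        return [read_steering_file(p) for p in paths]
```

The point of the design is that only the small steering file changes from job to job, so the wrapper script itself can be registered once and reused.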


Data Handling

Data handling part missing right now

What we need
• Definition of Physics Meta-Catalogue
• Ability to query this meta-catalogue to give
  • List of GUIDs per run of data to be included in Grid submission JDL. This will direct where the job runs (Given that no movement of input data will take place – i.e. the job will always run where the data is).
• Where to catalogue output data for group analysis (AF can handle writing to multiple DBs – e.g. writing to private local BOSS DB and to metadata cat.)
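
Since the job always runs where its data is, the list of GUIDs for a run fixes the set of candidate sites. A minimal sketch of that reasoning is given below. Both catalogues are plain dictionaries standing in for the (not yet defined) Physics Meta-Catalogue and for a replica catalogue, and the function names are hypothetical.

```python
def guids_for_run(meta_catalogue, dataset, run):
    """Return the sorted GUIDs recorded for one run of a dataset.

    meta_catalogue is a stand-in: dict dataset -> {run number: [GUIDs]}.
    """
    return sorted(meta_catalogue.get(dataset, {}).get(run, []))


def sites_with_all_data(guids, replica_catalogue):
    """Return the sites that hold a replica of every GUID in the list.

    replica_catalogue is a stand-in: dict GUID -> iterable of site names.
    With no data movement, only these sites can run the job.
    """
    if not guids:
        return set()
    sites = None
    for guid in guids:
        holders = set(replica_catalogue.get(guid, ()))
        sites = holders if sites is None else sites & holders
    return sites
```

An empty result means that no single site holds the whole run, so under the no-movement assumption the job could not be placed.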


“GROSS” Summary
• Tested extensively on ORCA 7.5.0 + LCG-1
• Now installed on LCG-2 UI at Imperial College
• Build extra functionality
  • Multiple DB support
  • …
• Most important missing piece:
  • Data interface to meta-catalogue (ready to be plugged into the rest of the framework).


RGMA+Boss: Overview
• CMS jobs are submitted via a UI node for data analysis.
• Individual jobs are wrapped in a BOSS executable.
• Jobs are then delegated to a local batch farm managed by (for example) PBS.
• When a job is executed, the BOSS wrapper spawns a separate process to catch input, output and error streams.
• Job status is then redirected to a local DB.
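
The idea of wrapping a job, watching its output stream and recording status in a local database can be sketched as below. This is not the BOSS implementation: SQLite stands in for the BOSS DB, the `key=value` line format and the table layout are assumptions, and the output is read in the wrapper process itself rather than in a separately spawned one.

```python
import re
import sqlite3
import subprocess
import sys


def run_and_monitor(cmd, job_id, db, pattern=r"^(\w+)=(.*)$"):
    """Run cmd, store every output line that matches pattern, return exit code.

    db is an open sqlite3 connection used as a stand-in for the local job DB.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS job_status "
        "(job_id INTEGER, key TEXT, value TEXT)"
    )
    regex = re.compile(pattern)
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge the error stream into the output stream
        text=True,
    )
    for line in proc.stdout:
        match = regex.match(line.strip())
        if match:  # only lines of interest are kept; the rest is ignored
            db.execute(
                "INSERT INTO job_status VALUES (?, ?, ?)",
                (job_id, match.group(1), match.group(2)),
            )
    exit_code = proc.wait()
    db.execute(
        "INSERT INTO job_status VALUES (?, ?, ?)",
        (job_id, "exit_code", str(exit_code)),
    )
    db.commit()
    return exit_code
```

A call such as `run_and_monitor([sys.executable, "-c", "print('events=100')"], 7, sqlite3.connect(":memory:"))` would record the `events` line and the exit code for job 7.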


Boss

[Figure: BOSS in the CMS and EDG environment. Components shown: UI (IMPALA/BOSS), BOSS DB, RefDB, Workload Management System, Replica Manager, and CE, WN and SE nodes with CMS software. Labels: JDL, parameters, data registration, job output filtering, runtime monitoring, input data location, push data or info, pull info.]


Where R-GMA Fits In

BOSS designed for use within a local batch farm.
– If a job is scheduled on a remote compute element, status info needs to be sent back to the submitter’s site.
– Within a grid environment we want to collate job info from potentially many different farms.
– Job status info should be filtered depending on where the user is located – the use of predicates would therefore be ideal.

R-GMA fits in nicely.
– BOSS wrapper makes use of the R-GMA API.
– Job status updates are then published using a stream producer.
– An archiver positioned at each UI node can then scoop up relevant job info and dump it into the locally running BOSS db.
– Users can then access job status info from the UI node.
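
The producer, predicate and archiver pattern described above can be illustrated with a toy in-memory model. This does not use the R-GMA API. The classes `Bus`, `StreamProducer` and `Archiver` are hypothetical stand-ins, and the predicate is a plain Python function, chosen here only to keep the sketch self-contained.

```python
class Bus:
    """Toy stand-in for the infrastructure that connects producers to archivers."""

    def __init__(self):
        self.archivers = []

    def publish(self, tup):
        for archiver in self.archivers:
            archiver.offer(tup)


class StreamProducer:
    """One per job: publishes status tuples tagged with the submitting UI site."""

    def __init__(self, bus, job_id, ui_site):
        self.bus = bus
        self.job_id = job_id
        self.ui_site = ui_site

    def insert(self, status):
        self.bus.publish(
            {"job_id": self.job_id, "ui_site": self.ui_site, "status": status}
        )


class Archiver:
    """One per UI node: keeps only tuples that satisfy its predicate."""

    def __init__(self, bus, predicate):
        self.predicate = predicate
        self.local_db = []  # stand-in for the locally running job DB
        bus.archivers.append(self)

    def offer(self, tup):
        if self.predicate(tup):
            self.local_db.append(tup)
```

With one archiver per UI site, each using a predicate on `ui_site`, a user sees only the jobs submitted from their own site, wherever those jobs actually ran.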


Use of R-GMA in BOSS

[Figure: Use of R-GMA in BOSS. Components shown: UI (IMPALA/BOSS), BOSS DB, Receiver, Registry, servlets, CE/GK, and the WN sandbox containing the BOSS wrapper, Job, Tee, OutFile and R-GMA API.]


Test Motivation
• Want to ensure R-GMA can cope with volume of expected traffic and is scalable.
• CMS production load estimated at around 5000 jobs.
• Initial tests* with v3-3-28 only managed about 400 - must do better. (Note: first tests at Imperial College a year ago fell over at around 10 jobs!)

*Reported at IEEE NSS Conference, Oregon, USA, 21-24 October 2003


Test Design

A simulation of the CMS production system was created.
– An MC simulation was designed to represent a typical job.
– Each job creates a stream producer.
– Each job publishes a number of tuples depending on the job phase.
– Each job contains 3 phases with varying time delays.

An Archiver collects published tuples.
– The Archiver db used is a representation of the BOSS db.
– Archived tuples are compared with published tuples to verify the test outcome.
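
The verification logic of the test (publish tuples per job phase, archive them, compare the two sets) can be sketched as below. This is a simplified model, not the test harness used: the phase names and the number of tuples per phase are invented, the time delays are omitted, and the archiver is a plain list rather than an R-GMA Archiver.

```python
import random

PHASES = ("start", "running", "end")  # hypothetical names for the 3 phases


def simulate_job(job_id, rng, publish):
    """Publish 1 or 2 tuples in each phase; return everything that was published."""
    published = []
    for phase in PHASES:
        for seq in range(rng.randint(1, 2)):
            tup = (job_id, phase, seq)
            publish(tup)
            published.append(tup)
    return published


def run_test(n_jobs, seed=0):
    """Run n_jobs simulated jobs and compare archived with published tuples.

    Returns (number published, number archived, set of tuples that went missing).
    """
    rng = random.Random(seed)
    archive = []  # stand-in for the Archiver database
    published = []
    for job_id in range(n_jobs):
        published.extend(simulate_job(job_id, rng, archive.append))
    missing = set(published) - set(archive)
    return len(published), len(archive), missing
```

In this toy version nothing can be lost, so `missing` is always empty; in the real test a non-empty difference is what would reveal dropped tuples.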


Topology

[Figure: test topology. Components shown: MC Sims, SP Mon Boxes, Archiver Mon Box, Archiver Client, IC, Boss DB, Test Output, Test verification.]


Test Setup
• Archiver & SP mon box set up at Imperial College.
• SP mon box & IC set up at Brunel.
• Archiver and MC sim clients positioned at various nodes within both sites.
• Tried 1 MC sim and Archiver with variable Job submissions.
• Also set up similar test on WP3 test bed using 2 MC sims and 1 Archiver.


Results
• 1 MC sim creating 2000 jobs and publishing 7600 tuples proven to work without glitch.
• Bi-directional 3000+3000 jobs from Imperial College to Brunel (and v.v.) worked without problems.
• Demonstrated 2 MC sims each running 4000 jobs (with 15200 published tuples) on the WP3 test bed. Peak loading was ~1000 jobs producing data simultaneously.


Pitfalls Encountered

Lots of integration problems.
– Limitation on number of open streaming sockets – 1K.
– Discovered lots of OutOfMemoryErrors.
– Various configuration problems at both Imperial College and Brunel sites.
– Usual firewall “challenges”

Probably explained some of the poor initial performance.

Scalability of test is largely dependent on the specs of the Stream Producer/Archiver Mon boxes, i.e. > 1Gb memory and fast processor


Overall Summary
• Preparations for a full scale test of CMS production over a Grid (T0 + T1 + some T2) well underway. Still on target for 1 March start-up.
• New Batch Analysis framework “GROSS” being deployed (with BOSS and RGMA) via rpm for DC04
• Scalability of RGMA now approaching what is needed for full production load of CMS.
