CMS Report – GridPP Collaboration Meeting VIII
Peter Hobson, Brunel University
22/9/2003

CMS Applications

Progress towards GridPP milestones
• Data management (Bristol)
• Monitoring (Brunel + Imperial)

Bristol, Brunel and Imperial (1.5 GRIDPP FTE in total)

DC04 Pre-Challenge Production

Data Challenge: March 2004 nominal (see Hugh Tallini’s talk)
• An end-to-end test of the CMS offline computing system
• ‘Play back’ digi data, emulating CMS DAQ -> storage, reconstruction, calibration, data reduction and analysis at T0 & external T1’s

Pre-challenge production
• >70M fully simulated, hit-formatted, digitised events required for DC04
• Using both Geant3 and Geant4 simulation; based on POOL persistency

UK status
• In production for ~3 months at RAL T1 (Bristol-managed) & Imperial
• UK has contributed ~25% of production so far
• RAL is also a major data store and hosts the central catalogue for current data management solution (SRB) [v. high-profile contribution]

Next steps:
• Digitisation of simulated data – much more demanding of farms
• Production will continue for the rest of 2003 (though not at all sites)
• Large-scale replication of digis to CERN (Castor) via WAN

Production Stats

We are late! => Need to maximise use of RAL farm until the last minute (November)
• Hand over resources to Atlas as they are required (i.e. keep the queues full, ramp down CMS production as Atlas ramps up through queue policies).
• Migrate to LCG farm…?

Data management (wide-area)

Short-term solution (PCP03)
• Using SRB for all data management, across ~20 sites
• After considerable effort by RAL e-science staff and CMS people: it works very nicely (deployed across all sites in ~10 days).
• RAL doing a highly professional job of hosting central MCAT

The medium term (DC04)
• Move towards LCG ASAP; introduce middleware components into the running production, as they are released and tested (LCG timescale?)
• Potential problem with data management MSS interface timescales (ask technical gurus for details); currently discussing our approach
• One possibility: integrate SRB (incl. MSS interface) below LCG RLS
  • Of potential interest to BaBar, Belle, US Grid projects – will discuss at SLAC
  • Abstract submitted to ACAT ’03
• Alternative: each T1 implements its own MSS interface
  • At RAL, will probably be SRB-ADS anyway, since this is tested and working

The longer term (analysis of data for Physics TDR): LCG
• Will need a transparent migration of current catalogues, etc.

Data Management (local)

Digitisation setup
• Data (hits) serving for realistic full-pileup (25 overlapping events) digitisation is very demanding
• Current RAID disk servers + LAN don’t scale to 100’s of CPUs
  • Performance scales roughly as number of spindles, so bigger disks don’t gain us much.
• Solution: use distributed disk resources, localised in ‘sub-farms’
• Use dCache as the local data management solution
  • FNAL and RAL are the testbeds for this approach

POOL
• POOL release 1.3 now integrated within the CMS COBRA framework
• Functional / performance testing & development of catalogue handling approach under way within CMS (incl. Bristol)
• Full integration of POOL catalogue with local + wide area data management is the next step (work within LCG + CMS)
• Also re-examining data clustering strategy for wide-area optimisation

Stress testing BOSS with RGMA

[Figure: class diagram of the BOSS to R-GMA coupling. Packages and classes: ::RGMA (Consumer, Producer); ::BOSStoRGMA (connectionManager, BossRealTimeRGMAUpdater); ::BOSS (dbUpdator). Associations carry multiplicities 1 and 1..*. Legend: Brunel responsibility, EDG WP3 responsibility, CMS responsibility.]

The CMS job submission and monitoring system, BOSS, is now Grid-enabled using the R-GMA middleware from EDG WP3.

Stress testing BOSS with RGMA

Static information is relayed via an information service
• Architecture, operating system, CPU details, disk capacity, access policy and application version.
• Query/response semantics only.

Dynamic information is relayed via a monitoring service
• CPU load, fraction of disk used, network speed and application trace data.
• Both query/response semantics and publish/subscribe semantics.
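
The difference between the two semantics can be illustrated with a toy in-memory service. This is only a sketch: the class `MonitoringService`, its methods and the metric name `cpu_load` are hypothetical and are not the R-GMA API. A query returns the latest value when the consumer asks for it; a subscription pushes every new value to the consumer as it is published.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, float], None]


class MonitoringService:
    """Toy in-memory monitoring service (hypothetical, not R-GMA)."""

    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}
        self._subscribers: Dict[str, List[Callback]] = {}

    def publish(self, metric: str, value: float) -> None:
        # Store the value for later queries, then push it to subscribers.
        self._latest[metric] = value
        for callback in self._subscribers.get(metric, []):
            callback(metric, value)

    def query(self, metric: str) -> float:
        # Query/response: the consumer pulls the most recent value.
        return self._latest[metric]

    def subscribe(self, metric: str, callback: Callback) -> None:
        # Publish/subscribe: the consumer registers once and is notified later.
        self._subscribers.setdefault(metric, []).append(callback)


service = MonitoringService()
received = []
service.subscribe("cpu_load", lambda metric, value: received.append((metric, value)))

service.publish("cpu_load", 0.75)
service.publish("cpu_load", 0.5)

latest = service.query("cpu_load")
```

After the two `publish` calls the subscriber has seen both values in order, while `query` returns only the most recent one. A static information service would need only the `query` path.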

Stress testing BOSS with RGMA

Use Case Name: Production
• The production coordinator submits 10,000 production jobs using BOSS (http://www.bo.infn.it/cms/computing/BOSS/) from a single Grid node.
• Each job takes of the order of 10 hours to run on a CPU with speed of the order of 1 GHz and produces output files of the order of 500 MB.
• The jobs are likely to be distributed to around 10 sites.
• Each job may contain up to 20 messages inserted by the physicist for the purposes of alert, or, more rarely, alarm.
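
The aggregate scale implied by this use case follows from simple arithmetic, sketched below. Two assumptions are made here and are not stated on the slide: the output size is read as 500 megabytes (decimal), and the peak message rate supposes that all jobs run concurrently and spread their messages evenly over the 10 hours.

```python
# Order-of-magnitude figures taken from the use case.
JOBS = 10_000
HOURS_PER_JOB = 10
OUTPUT_MB_PER_JOB = 500       # assumption: megabytes, decimal units
MAX_MESSAGES_PER_JOB = 20
SITES = 10

total_cpu_hours = JOBS * HOURS_PER_JOB                   # CPU-hours for the whole production
total_output_tb = JOBS * OUTPUT_MB_PER_JOB / 1_000_000   # MB -> TB
max_messages = JOBS * MAX_MESSAGES_PER_JOB               # upper bound on alert/alarm messages
jobs_per_site = JOBS // SITES                            # if jobs are spread evenly

# Assumption: every job runs at the same time and emits messages uniformly.
peak_messages_per_second = max_messages / (HOURS_PER_JOB * 3600)

print(total_cpu_hours, total_output_tb, max_messages, jobs_per_site)
print(round(peak_messages_per_second, 2))
```

Under these assumptions the production amounts to 100,000 CPU-hours, about 5 TB of output, at most 200,000 messages, and roughly 5.6 messages per second at peak.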

Stress testing BOSS with RGMA

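[Figure: data flow for a BOSS job monitored through R-GMA, with steps numbered 1, 2, 3, 4, 5a, 5b and 6. Labelled components: BOSS DB; UI (IMPALA/BOSS); WN (Sandbox, BOSS wrapper, Job, Tee, OutFile, R-GMA API); Farm servlets; Registry; Receiver servlets; Receiver.]

Tested on CMS-LCG0 testbed at IC and Brunel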

Stress testing BOSS with RGMA

Plausible sensor data volume for a single BOSS job

Plan:
• Submit 50 real production jobs to a local batch system, and deduce an approximation to the distribution of intervals between sensor messages and the size of those messages.
• The sensor data produced will be fed directly into R-GMA to investigate scaling and failure modes.

Results will be presented at the IEEE NSS conference in Oregon in October 2003
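
As an illustration of the first step of the plan, the sketch below summarises inter-message intervals and message sizes for one job. The input format (a time-ordered list of `(timestamp_seconds, size_bytes)` pairs), the function name `summarise_messages` and the sample values are all hypothetical; they are not taken from BOSS logs.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def summarise_messages(messages: List[Tuple[float, int]]) -> Dict[str, float]:
    """Summarise intervals and sizes for time-ordered (timestamp_s, size_bytes) pairs.

    Needs at least two messages, otherwise no interval can be formed.
    """
    times = [t for t, _ in messages]
    sizes = [s for _, s in messages]
    # Interval between each message and the one that follows it.
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "n_messages": len(messages),
        "mean_interval_s": mean(intervals),
        "median_interval_s": median(intervals),
        "mean_size_bytes": mean(sizes),
        "max_size_bytes": max(sizes),
    }


# Made-up sample: four messages from a single job.
sample = [(0.0, 120), (30.0, 80), (90.0, 200), (100.0, 100)]
summary = summarise_messages(sample)
print(summary)
```

Repeating this over all 50 jobs and pooling the intervals and sizes would give the empirical distributions needed to drive a synthetic load against R-GMA.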

Summary

Successes
• 1/4 of all pre-production data produced in the UK.
• SRB for pre-production challenge data management has worked well.
• POOL release 1.3 now integrated within the CMS COBRA framework.

Problems
• Late start to the pre-production challenge.
• Some concerns over the stability and scalability of RGMA.