GridPP11 Liverpool Sept04

SAMGrid

GridPP11 Liverpool, Sept 2004

Gavin Davies, Imperial College London

Introduction

• Tevatron
– Less data than LHC, but still PBs/experiment and growing
– Running experiments
• SAM (Sequential Access to Metadata)
– Well developed metadata and distributed data replication system
– Developed by DØ & FNAL-CD
• JIM (Job Information and Monitoring)
– Handles job submission and monitoring (all but data handling)
– SAM + JIM → SAMGrid – computational grid
• Runjob
– Handles job workflow management

See http://cdinternal.fnal.gov/RUNIIRev2004/runIIMP.asp

SAMGrid Architecture

SAM plots

Up to 200 TB/month (DØ usage)

Over 2 PB in last yr (DØ usage)

CDF usage now similar; have just topped the PB

Active SAM sites: 40 DØ, 26 CDF

SAMGrid plots

http://samgrid.fnal.gov:8080/ (09/09/04)

JIM: Active execution sites: 11 DØ, 1 CDF in testing

SAMGrid plots

DØ – Production - MC

• All DØ MC always produced off-site
• SAMGrid now default (went into production in Mar 04)
– Based on request system and jobmanager-mc_runjob
– MC software package retrieved via SAM
– Currently running at (multiple) sites in Cz, Fr, UK, USA (10 in total + FNAL)
  • more on way, inc central farm
– Average production efficiency ~90%
– Average inefficiency due to grid infrastructure ~1-5%
• For more details, see
– GridPP10 DØ talk by Peter Love
– http://www-d0.fnal.gov/computing/grid/deployment-issues.html

DØ – Production - Reprocessing

• P14 Autumn 2003
– 25M events in UK
– Based around mc_runjob
– Distributed computing rather than Grid
– UK effort key to project success
• P17 Autumn 2004
– x 10 larger, use of db proxy servers
– SAMGrid as default
– Use LCG resources

DØ – Production - LCG

• Increasing effort to ensure SAMGrid / LCG interoperability
– MC generated on EDG/LCG and other shared resources (inc Imperial, RAL) “by hand”
– Demo of sam_client functionality on LCG at London workshop in Apr
– Will use LCG resources for p17 data reprocessing

All Nikhef MC produced this way

(DØ –) Runjob

• mc_runjob currently used by SAMGrid for MC and reprocessing
• DØrunjob - the rewrite
• Joint CDF, CMS, DØ, FNAL-CD project
• Base classes from common Runjob package
• DØrunjob available this autumn
– Will incorporate Sandbox as a separate module
• For details see: http://projects.fnal.gov/runjob/

[Diagram: Runjob; CDFRunjob, CMSRunjob, DØRunjob]

CDF – production - I

• See Mòrag Burgon-Lyon’s GridPP 10 talk for details

• Goal 1: 25% of computing offsite by June 2004
– Done, using DCAF and SAM
• DCAF = de-centralised CDF analysis farm, core of 7 sites, more on way
• Goal 2: 50% by June 2005, using Grid
– Resources being identified / pledged
• JIM deployment
– Originally planned for Oct 15th
– Problematic, look at Grid3 as possible alternative

CDF – production - II

• Migration of DCAF sites to Condor
• Migration to SAM V6
– Switch to new internal dbserve code under test
– Roll out to global sites expected soon
• FroNTier - new way to serve database contents to remote institutes
– Should lower load on central CDF Oracle servers
• Studying methods to lower load and avoid fragmentation on remote file servers due to simultaneous network writes

(CDF -) SAMTV

• SAM TV used by CDF & DØ to monitor SAM and SAM stations
– Currently created from log files
– Version in dev created from MIS database, filled by new MIS server

Summary / plans

DØ
• SAM & SAMGrid critical
– GridPP key part of effort
• SAMGrid, default for
– MC production
– Data reprocessing from autumn
– Analysis to follow
• dØ tools, dØrte, sandboxing
• Interoperability
– Good progress

CDF
• 25% of computing off-site
– Most with DCAF/SAM
– GridPP effort key part of effort
• Increase to 50% for June 2005
– More DCAF installations
• Encourage user migration

UKLight - 10 Gbit/s - “data reprocessing”

Backup - I

From Peter Love’s GridPP10 talk