CMS Computing and Core-Software

USCMS CB Riverside, May 19, 2001

David Stickland, Princeton University
CMS Computing and Core-Software Deputy PM



Slide 2: CPT Project

DPS May/15/2001 LHC52

Sub-projects:
  CCS: Core Computing & Software
  PRS: Physics Reconstruction and Selection
  TriDAS: Online Software

Tasks:
  1. Computing Centres
  2. General CMS Computing Services
  3. Architecture, Frameworks / Toolkits
  4. Software Users and Developers Environment
  5. Software Process and Quality
  6. Production Processing & Data Management
  7. Online Filter Software Framework
  8. Online Farms
  9. Tracker / b-tau
  10. E-gamma / ECAL
  11. Jets, Etmiss / HCAL
  12. Muons

Cross-project groups:
  SPROM (Simulation Project Management)
  RPROM (Reconstruction Project Management)
  CPROM (Calibration Project Management), to be created
  GPI (Group for Process Improvement), recently created
  Cafe (CMS Architectural Forum and Evaluation)

Slide 3: Developing a CCS Project Plan

Build a common planning base for all CPT tasks
  Clarify responsibilities
  Coordinate milestones

March 2001 planning (http://cmsdoc.cern.ch/cms/cpt/april01-rrb):
  Task Breakdown, Deliverables, Cross-projects

Next: Milestone study
  Top down: starting from major deliverables
  Bottom up: starting from current project understanding
  External constraints: DAQ TDR, Physics TDR, CCS TDR, Data Challenges, LHC timetable, etc.

Without this it is impossible to measure performance, assign limited resources effectively, identify conflicting constraints, etc.

Slide 4: Computing and Software: Critical Dates

Technical Design Reports
  End 2002: DAQ TDR
    7M events now, +5M Y2001, +10M Y2002
  End 2003: CCS TDR
    Describe system to be implemented
  Mid 2004: Physics TDR
    GEANT4, all luminosities, 20+M events (?)
    A primary goal: prepare the collaboration for LHC analysis, shake down the tools, computing systems, software

Computing milestones
  End 2004: 20% Data Challenge
    Final test before purchase of production systems
    Test of offline, post-DAQ (Level-2 trigger? Calibrations? Alignments…)
    20 Hz for one month (reconstructed, distributed, analyzed) (40M events)
  End 2005: ~20% computing in place, ready for Pilot Run Spring 2006
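As a rough cross-check of the data challenge figures, the sketch below multiplies the 20 Hz rate by one month of running. The 30-day month and the notion of a "live fraction" are assumptions introduced here for illustration, not numbers from the slide, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def events_recorded(rate_hz: float, days: float, live_fraction: float = 1.0) -> float:
    """Events collected at a steady rate over `days`, scaled by the live fraction."""
    return rate_hz * days * SECONDS_PER_DAY * live_fraction


def implied_live_fraction(target_events: float, rate_hz: float, days: float) -> float:
    """Fraction of wall-clock time needed at `rate_hz` to reach `target_events`."""
    return target_events / events_recorded(rate_hz, days)


# Assumption: "one month" is taken as 30 days of wall-clock time.
continuous = events_recorded(rate_hz=20, days=30)
fraction = implied_live_fraction(target_events=40e6, rate_hz=20, days=30)

print(f"Continuous 20 Hz for 30 days: {continuous / 1e6:.2f}M events")
print(f"Live fraction implied by 40M events: {fraction:.0%}")
```

Under these assumptions, uninterrupted running would give about 52M events, so the quoted 40M corresponds to roughly 77% live time.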

Slide 5: Currently reviewing CCS Milestones

Shown at Nov 2000, LHCC Comprehensive Review

[Figure: "Milestones: Software" chart, timeline 1998-2005, status marked at Oct 2000]

CMS software development strategy: first, transition to C++, then functionality, then performance

Core software milestones:
  End of Fortran development
  GEANT4 simulation of CMS
  Reconstruction/analysis framework
  Detector reconstruction
  Physics object reconstruction
  User analysis environment

Milestone levels: 1 Proof of concept, 2 Functional prototype, 3 Fully functional, 4 Production system

Milestone dates shown in the chart, in the order extracted:
  Dec-98, Jun-00, Dec-02, Dec-04
  Mar-99, Jun-00, Dec-02, Dec-04
  Dec-98, Dec-99, Jun-02, Jun-04
  Jun-98, Dec-99, Dec-01, Dec-03
  Jun-98
  Jun-98, Dec-99, Jun-01, Dec-03

In 2005, need fully functional, tested, high quality, performing software
Phases: test ideas, make prototypes, develop modules, and integrate
In 2000, we have functional prototypes (see talk by D. Stickland)
Next, create the basis for final system

Milestone waves
  Not easily reviewable
  Need more detail
  Not tied to deliverables
  The work required to satisfy the milestone is typically not described by the milestone, so may not be properly monitored or tracked

Slide 6: TDRs and Challenges (Preliminary)

[Figure: Gantt chart, 2000-2006]

Activity Name                 Start Date  Finish Date  Duration (months)

DAQ TDR
  High Lumi productions       8/27/00     8/26/01      11.96
  PRS Analysis                10/15/00    10/28/01     12.42
  B Physics productions       8/26/01     10/7/01      1.38
  PRS Analysis (B's)          9/2/01      6/16/02      9.43
  Finalizing and review       6/16/02     12/1/02      5.52
  DAQ TDR Submitted           12/1/02

CCS TDR
  Definition of purpose       8/26/01     1/13/02      4.60
  Review of purpose           1/13/02     6/2/02       4.60
  Structure defined           6/2/02
  Implementation              6/9/02      6/29/03      12.65
  First Draft Complete        6/29/03
  Review and Finalize         6/29/03     11/30/03     5.06
  CCS TDR Submitted           12/28/03

Physics TDR
  PTDR Submitted              1/2/05

20% Data Challenge
  Define Challenge            12/29/02    4/6/03       3.22
  Challenge Defined           4/6/03
  Prepare for challenge       4/6/03      6/20/04      14.49
  MC Samples prepared         6/20/04     9/19/04      2.99
  Challenge Operation         11/14/04    12/12/04     0.92
  Computing Completed         1/2/05
  Assessment                  1/2/05      4/3/05       2.99
  Data Challenge Complete     4/3/05

LHC Pilot run Starts          4/30/06
Pilot run                     4/30/06     6/4/06       1.15
LHC Physics Run I Starts      7/30/06
First LHC Run                 7/30/06     3/4/07       7.13

Bar legend: External Constraint, L1 Milestone (LHCC), L2 Milestone (CMS-SC), L3 Milestone (CCS), Old Milestones (LHC53), External Period, Design Period, Preparation Period, Implementation Period, Test Period, Assess and Report Period
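The Duration column in the schedule appears to be the number of days between start and finish divided by an average month length. The sketch below checks that reading against three rows; the 365.25/12 divisor is an assumption that happens to reproduce the listed values, and the helper name is hypothetical.

```python
from datetime import date

# Assumption: durations are expressed in average months of 365.25 / 12 days.
DAYS_PER_MONTH = 365.25 / 12


def duration_months(start: date, finish: date) -> float:
    """Elapsed time between two dates in average months, rounded to 2 decimals."""
    return round((finish - start).days / DAYS_PER_MONTH, 2)


# Three rows copied from the schedule (two-digit years read as 20xx).
schedule = {
    "High Lumi productions": (date(2000, 8, 27), date(2001, 8, 26)),
    "B Physics productions": (date(2001, 8, 26), date(2001, 10, 7)),
    "Challenge Operation": (date(2004, 11, 14), date(2004, 12, 12)),
}

for name, (start, finish) in schedule.items():
    print(f"{name}: {duration_months(start, finish)} months")
```

These three rows come out as 11.96, 1.38 and 0.92, matching the listed durations.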

Slide 7: Current Computing Activity

Spring 2001:

CERN: 200-300 CPUs, new Objectivity version, new Tape/MSS system, new data servers
  Currently (best) 70 MB/s out of Objectivity
  Testing to determine where the next bottleneck is: disk access, network, federation locks…
  1 TB output data in 3 days, to be used by the ECAL e/gamma PRS group
  Currently running Calo+Tracker digitization at 10^34 luminosity
    Will write about 6 TB
    200 CPU nodes in a single federation
    Integrated with CASTOR
      Though not as transparently as we plan for the next round
  Testing ATA/3Ware EIDE disk systems for data servers (input and output)

Sustained productions achieved
  FNAL has responsibility for the JetMET datasets, INFN for the Muon datasets
  Continuing to ramp productions, consolidate tools, add more automation, etc.
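To put the production volumes in perspective, the sketch below converts "1 TB in 3 days" into an average output rate and estimates how long 6 TB would take at that same rate. Decimal units (1 TB = 10^6 MB) and the idea that the 6 TB sample would be written at the same average rate are assumptions made here, not statements from the slide.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # assumption: decimal units


def average_rate_mb_s(terabytes: float, days: float) -> float:
    """Average throughput in MB/s for a volume written over a number of days."""
    return terabytes * MB_PER_TB / (days * SECONDS_PER_DAY)


def days_to_write(terabytes: float, rate_mb_s: float) -> float:
    """Days needed to write a volume at a constant rate."""
    return terabytes * MB_PER_TB / rate_mb_s / SECONDS_PER_DAY


rate = average_rate_mb_s(terabytes=1.0, days=3)
print(f"Average output rate: {rate:.2f} MB/s")
print(f"6 TB at that rate: {days_to_write(6.0, rate):.1f} days")
```

The resulting average of just under 4 MB/s is well below the 70 MB/s best read rate quoted for Objectivity, although the two figures measure different things (sustained production output versus peak read), so the comparison is only indicative.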

Slide 8: Common Prototypes: CMS Computing, 2002-2004

Double the complexity (number of boxes) each year to reach 50% of the final complexity of a single experiment in 2004, before production system purchasing

Match Computing Challenges with CMS Physics and Detector Milestones

[Figure: Prototype Computing Power (kSI95) by year, 2000-2004, for CERN T0/T1 (shared), Regional T1's and Regional T2's; vertical axis 0 to 120; annotated with DAQ TDR, CCS TDR, Physics TDR and 20% Data Challenge]

CERN prototype is a time-shared facility available for ~30% of the time at full power for CMS

Some (~50%) of current T2 prototypes are primarily for GRID related R&D.

Prototype and final size/cost document: http://cmsdoc.cern.ch/cms/cpt/april01-rrb
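The doubling rule can be worked backwards from the 2004 target. The sketch below assumes exact doubling each year and uses 2002 as the first year, taken from the slide title; the function name is hypothetical.

```python
def prototype_fractions(target_year: int = 2004,
                        target_fraction: float = 0.5,
                        first_year: int = 2002) -> dict[int, float]:
    """Fraction of final complexity per year, halving for each year before the target."""
    return {
        year: target_fraction / 2 ** (target_year - year)
        for year in range(first_year, target_year + 1)
    }


for year, fraction in prototype_fractions().items():
    print(f"{year}: {fraction:.1%} of final single-experiment complexity")
```

With these assumptions the prototypes would need to stand at 12.5% of final complexity in 2002 and 25% in 2003 to reach 50% in 2004.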

Slide 9: Long Term Plan: Computing Ramp-up

Ramp production systems 05-07 (30%, +30%, +40% of cost each year)

Match computing power available with LHC luminosity

[Figure: CPU Computing Power (kSI95) by year, 2000-2007, for CERN T0/T1 (shared), Regional T1's and Regional T2's; vertical axis 0 to 1000]

2006: 200M Reco ev/mo, 100M Re-Reco ev/mo, 30k ev/s Analysis

2007: 300M Reco ev/mo, 200M Re-Reco ev/mo, 50k ev/s Analysis
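The ramp-up numbers can be turned into cumulative spending and average event rates. The sketch below assumes a 30-day month for the conversion from events per month to Hz; that choice and the variable names are illustrative, not from the slides.

```python
from itertools import accumulate

SECONDS_PER_DAY = 86_400

# Share of production-system cost spent in each year, in percent (from the slide).
yearly_cost_percent = {2005: 30, 2006: 30, 2007: 40}
cumulative_percent = dict(
    zip(yearly_cost_percent, accumulate(yearly_cost_percent.values()))
)


def average_rate_hz(events_per_month: float, days_per_month: int = 30) -> float:
    """Average event rate implied by a monthly event count (assumes a 30-day month)."""
    return events_per_month / (days_per_month * SECONDS_PER_DAY)


for year, percent in cumulative_percent.items():
    print(f"End {year}: {percent}% of production-system cost committed")

print(f"2006 reconstruction: {average_rate_hz(200e6):.0f} Hz average")
print(f"2007 reconstruction: {average_rate_hz(300e6):.0f} Hz average")
```

On this reading, 60% of the production-system cost is committed by the end of 2006, and the reconstruction targets correspond to averages of about 77 Hz in 2006 and 116 Hz in 2007.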

Slide 10: Current most significant risk to the project is insufficient SW manpower

We are making good use of the resources we have and making progress:
  OO code is deployed and is the standard for CMS
  Worldwide productions
  Full use of prototype facilities
    Leading to improved code and understanding of limitations
  A solid SW infrastructure base is in place

But there are many things we are unable to cover adequately:
  No Calibration infrastructure
  No Alignment infrastructure
  Detector Description Database only just getting underway
  Analysis infrastructure not yet deployed
  Slow progress with our GEANT4 implementation
  Unable (time!) to answer all the (good) questions the GRID projects are asking us
  "Spotty" user support
    Best effort, when time permits
  Most of the tasks in SW Quality Assurance and Control are unmanned
  Unacceptably high exposure to loss of key people
    No backups in any role
  Etc.

Slide 11: Next Steps

We continue to build a project plan for CCS

We continue to put in place an IMoU for the SW manpower
  In the meantime we focus action to actually get the manpower

We clearly define our prototype requirements
  Those prototypes may be supplied within an IMoU context, or within a broader context of collaboration towards LHC Computing

We try to work with CERN to ensure the experiments and the Regional Centers are the driving partners in any new projects and that our real needs are addressed