CMS Computing and Core-Software
USCMS CB Riverside, May 19, 2001
David Stickland, Princeton University
CMS Computing and Core-Software Deputy PM
DPS May/15/2001 LHC52
Slide 2
CPT Project

CCS: Core Computing & Software
PRS: Physics Reconstruction and Selection
TriDAS: Online Software

1. Computing Centres
2. General CMS Computing Services
3. Architecture, Frameworks / Toolkits
4. Software Users and Developers Environment
5. Software Process and Quality
6. Production Processing & Data Management
7. Online Filter Software Framework
8. Online Farms
9. Tracker / b-tau
10. E-gamma / ECAL
11. Jets, Etmiss / HCAL
12. Muons

SPROM (Simulation Project Management)
RPROM (Reconstruction Project Management)
GPI (Group for Process Improvement)…recently created
CPROM (Calibration Project Management)…to be created
Cafe (CMS Architectural Forum and Evaluation)
Slide 3
Developing a CCS Project Plan

Build a common planning base for all CPT tasks:
- Clarify responsibilities
- Coordinate milestones
March 2001 planning (http://cmsdoc.cern.ch/cms/cpt/april01-rrb):
- Task breakdown, deliverables, cross-projects
Next: milestone study
- Top down: starting from major deliverables
- Bottom up: starting from current project understanding
- External constraints: DAQ TDR, Physics TDR, CCS TDR, Data Challenges, LHC timetable, etc.
Without this it is impossible to measure performance, assign limited resources effectively, identify conflicting constraints, etc.
Slide 4
Computing and Software: Critical Dates

Technical Design Reports:
- End 2002: DAQ TDR. 7M events now, +5M in 2001, +10M in 2002
- End 2003: CCS TDR. Describes the system to be implemented
- Mid 2004: Physics TDR. GEANT4, all luminosities, 20+M events (?)
  - A primary goal: prepare the collaboration for LHC analysis; shake down the tools, computing systems, and software
- End 2005: ~20% of computing in place, ready for the Pilot Run in Spring 2006

Computing milestones:
- End 2004: 20% Data Challenge. Final test before purchase of production systems
  - Test of offline, post-DAQ (Level-2 trigger? Calibrations? Alignments?)
  - 20 Hz for one month (reconstructed, distributed, analyzed): 40M events
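The 20% Data Challenge rate and event count above can be sanity-checked with a little arithmetic. A sketch, assuming a 30-day month; the slides do not state the uptime, so the implied efficiency is an inference:

```python
# Sanity check of the 20% Data Challenge numbers: 20 Hz for one month,
# quoted as 40M events. Assumption (not from the slides): 30-day month.

RATE_HZ = 20              # events per second during the challenge
SECONDS_PER_DAY = 86_400
DAYS = 30                 # one month of running

ideal_events = RATE_HZ * SECONDS_PER_DAY * DAYS
print(f"Events at 100% uptime: {ideal_events / 1e6:.1f}M")   # ~51.8M

quoted_events = 40_000_000
# The quoted 40M therefore corresponds to an effective uptime below 100%
print(f"Implied effective uptime: {quoted_events / ideal_events:.0%}")
```

So the 40M-event target implicitly budgets for roughly three quarters of the month being usable running time.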
Slide 5
Currently reviewing CCS Milestones

Shown at the Nov 2000 LHCC Comprehensive Review:
Milestones: Software (chart dated Oct 2000)

CMS software development strategy: first the transition to C++, then functionality, then performance.

CMS core-software milestones, each passing through four phases (1 = proof of concept, 2 = functional prototype, 3 = fully functional, 4 = production system):
- GEANT4 simulation of CMS: Dec-98 / Jun-00 / Dec-02 / Dec-04
- Reconstruction/analysis framework: Mar-99 / Jun-00 / Dec-02 / Dec-04
- Detector reconstruction: Dec-98 / Dec-99 / Jun-02 / Jun-04
- Physics object reconstruction: Jun-98 / Dec-99 / Dec-01 / Dec-03
- User analysis environment: Jun-98 / Dec-99 / Jun-01 / Dec-03
- End of Fortran development

In 2000 we have functional prototypes (see talk by D. Stickland); next, create the basis for the final system.
In 2005 we need fully functional, tested, high-quality, performing software. Phases: test ideas, make prototypes, develop modules, and integrate.
Problems with these milestone "waves":
- Not easily reviewable; need more detail
- Not tied to deliverables
- The work required to satisfy a milestone is typically not described by the milestone, so it may not be properly monitored or tracked
Slide 6
TDR's and Challenges (Preliminary)

Schedule (activity: start - finish, duration in months):

DAQ TDR
- High Lumi productions: 8/27/00 - 8/26/01 (11.96)
- PRS Analysis: 10/15/00 - 10/28/01 (12.42)
- B Physics productions: 8/26/01 - 10/7/01 (1.38)
- PRS Analysis (B's): 9/2/01 - 6/16/02 (9.43)
- Finalizing and review: 6/16/02 - 12/1/02 (5.52)
- DAQ TDR Submitted: 12/1/02

CCS TDR
- Definition of purpose: 8/26/01 - 1/13/02 (4.60)
- Review of purpose: 1/13/02 - 6/2/02 (4.60)
- Structure defined: 6/2/02
- Implementation: 6/9/02 - 6/29/03 (12.65)
- First Draft Complete: 6/29/03
- Review and Finalize: 6/29/03 - 11/30/03 (5.06)
- CCS TDR Submitted: 12/28/03

Physics TDR
- PTDR Submitted: 1/2/05

20% Data Challenge
- Define Challenge: 12/29/02 - 4/6/03 (3.22)
- Challenge Defined: 4/6/03
- Prepare for challenge: 4/6/03 - 6/20/04 (14.49)
- MC Samples prepared: 6/20/04 - 9/19/04 (2.99)
- Challenge Operation: 11/14/04 - 12/12/04 (0.92)
- Computing Completed: 1/2/05
- Assessment: 1/2/05 - 4/3/05 (2.99)
- Data Challenge Complete: 4/3/05

LHC
- LHC Pilot run Starts: 4/30/06
- Pilot run: 4/30/06 - 6/4/06 (1.15)
- LHC Physics Run I Starts: 7/30/06
- First LHC Run: 7/30/06 - 3/4/07 (7.13)

Bar legend (Gantt chart not reproduced): External Constraint; L1 Milestone (LHCC); L2 Milestone (CMS-SC); L3 Milestone (CCS); External, Design, Preparation, Implementation, Test, and Assess-and-Report periods. The chart also showed the old (LHC53) milestone dates for comparison.
Slide 7
Current Computing Activity, Spring 2001:

- CERN: 200-300 CPUs, new Objectivity version, new Tape/MSS system, new data servers
  - Currently (best) 70 MB/s out of Objectivity
  - Testing to determine where the next bottleneck is: disk access, network, federation locks…
- 1 TB of output data in 3 days, to be used by the ECAL e/gamma PRS group
- Currently running Calo+Tracker digitization at 10^34
  - Will write about 6 TB; 200 CPU nodes in a single federation, integrated with CASTOR (though not as transparently as we plan for the next round)
- Testing ATA/3Ware EIDE disk systems for data servers (input and output)
- Sustained productions achieved
- FNAL has responsibility for the JetMET datasets, INFN for the Muon datasets; continuing to ramp productions, consolidate tools, add more automation, etc.
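The production throughput figures above admit a rough cross-check. A sketch, assuming the 1 TB was written steadily over the 3 days and decimal units (1 TB = 10^12 bytes); neither assumption is stated on the slides:

```python
# Rough throughput check for the Spring 2001 production figures:
# 1 TB of output in 3 days, versus a best read rate of 70 MB/s
# out of Objectivity. Assumes steady writing and 1 TB = 1e12 bytes.

TB = 1e12
SECONDS = 3 * 86_400                # 3 days

sustained = 1 * TB / SECONDS        # average bytes/s over the production
print(f"Average output rate: {sustained / 1e6:.1f} MB/s")

# Compare with the quoted 70 MB/s best-case read rate
print(f"Fraction of peak read rate: {sustained / 70e6:.1%}")
```

In other words, the sustained write load was only a few percent of the demonstrated peak read rate, so the production itself was far from the Objectivity I/O limit.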
Slide 8
Common Prototypes: CMS Computing, 2002-2004

- Double the complexity (number of boxes) each year to reach 50% of the final complexity of a single experiment in 2004, before production-system purchasing
- Match computing challenges with CMS physics and detector milestones

[Chart: Prototype Computing Power, 2000-2004, 0-120 kSI95, for CERN T0/T1 (shared), Regional T1's, and Regional T2's; annotated with the DAQ TDR, CCS TDR, Physics TDR, and 20% Data Challenge dates]

- The CERN prototype is a time-shared facility, available at full power for CMS ~30% of the time
- Some (~50%) of the current T2 prototypes are primarily for GRID-related R&D
- Prototype and final size/cost document: http://cmsdoc.cern.ch/cms/cpt/april01-rrb
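The doubling schedule above fixes the whole prototype ramp once the endpoint is set. A sketch working backward from the stated 50%-in-2004 target; the 2001 starting year and the intermediate fractions are inferred, not figures from the slides:

```python
# Prototype complexity ramp implied by the plan: complexity doubles each
# year and reaches 50% of the final single-experiment system in 2004.
# Assumption (not from the slides): the ramp starts in 2001.

TARGET_YEAR, TARGET_FRACTION = 2004, 0.50
START_YEAR = 2001

# Work backward from the endpoint: halve once per year before 2004
fraction = TARGET_FRACTION / 2 ** (TARGET_YEAR - START_YEAR)

fractions = []
for year in range(START_YEAR, TARGET_YEAR + 1):
    fractions.append(fraction)
    print(f"{year}: {fraction:.1%} of final complexity")
    fraction *= 2
```

So a strict yearly doubling to 50% implies starting at about 6% of final complexity in 2001.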
Slide 9
Long Term Plan: Computing Ramp-up

- Ramp production systems 2005-07 (30%, +30%, +40% of cost each year)
- Match the computing power available with LHC luminosity

[Chart: CPU Computing Power, 2000-2007, 0-1000 kSI95, for CERN T0/T1 (shared), Regional T1's, and Regional T2's]

Target rates:
- 2006: 200M reco ev/month, 100M re-reco ev/month, 30k ev/s analysis
- 2007: 300M reco ev/month, 200M re-reco ev/month, 50k ev/s analysis
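The ramp percentages and monthly event targets above translate into simple cumulative and per-second figures. A sketch, assuming a 30-day month; the slides quote only the monthly totals:

```python
# Production ramp 2005-07: 30%, +30%, +40% of total system cost per year.
ramp = {2005: 0.30, 2006: 0.30, 2007: 0.40}

installed = 0.0
for year, share in ramp.items():
    installed += share
    print(f"End of {year}: {installed:.0%} of production system purchased")

# Sustained reconstruction rate implied by the monthly targets,
# assuming a 30-day month (not stated on the slides).
MONTH_S = 30 * 86_400
for year, ev_per_month in ((2006, 200e6), (2007, 300e6)):
    print(f"{year}: {ev_per_month / MONTH_S:.0f} reco events/s sustained")
```

The 30/30/40 split sums to the full system by end-2007, and the monthly targets correspond to a sustained reconstruction rate of roughly 80-120 events/s.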
Slide 10
Current most significant risk to the project is insufficient SW manpower

We are making good use of the resources we have and making progress:
- OO code is deployed and is the standard for CMS
- Worldwide productions
- Full use of prototype facilities, leading to improved code and understanding of limitations
- A solid SW infrastructure base is in place

But there are many things we are unable to cover adequately:
- No calibration infrastructure
- No alignment infrastructure
- Detector Description Database only just getting underway
- Analysis infrastructure not yet deployed
- Slow progress with our GEANT4 implementation
- Unable (time!) to answer all the (good) questions the GRID projects are asking us
- "Spotty" user support: best effort, when time permits
- Most of the tasks in SW Quality Assurance and Control are unmanned
- Unacceptably high exposure to loss of key people: no backups in any role
- Etc., etc.
Slide 11
Next Steps

- We continue to build a project plan for CCS
- We continue to put in place an IMoU for the SW manpower; in the meantime we focus action on actually getting the manpower
- We clearly define our prototype requirements; those prototypes may be supplied within an IMoU context, or within a broader context of collaboration towards LHC Computing
- We work with CERN to ensure that the experiments and the Regional Centers are the driving partners in any new projects and that our real needs are addressed