MONARC: results and open issues
L. Perini
LUND, 16 Mar 2000
Layout of the talk
- Most material from Irwin Gaines' talk at CHEP2000
- The basic goals and structure of the project
- The Regional Centers: motivation, characteristics, functions
- Some results from the simulations
- The need for more realistic, "implementation oriented" models: Phase-3
- Relations with GRID
- Status of the project: Phase-3 LOI presented in January, Phase-2 Final Report to be published next week, milestones and basic goals met
MONARC
A joint project (LHC experiments and CERN/IT) to understand issues associated with distributed data access and analysis for the LHC:
- Examine distributed data plans of current and near-future experiments
- Determine characteristics and requirements for LHC regional centers
- Understand details of the analysis process and data access needs for LHC data
- Measure critical parameters characterizing distributed architectures, especially database and network issues
- Create modeling and simulation tools
- Simulate a variety of models to understand constraints on architectures
MONARC
Models Of Networked Analysis At Regional Centers
Caltech, CERN, FNAL, Heidelberg, INFN, Helsinki, KEK, Lyon, Marseilles, Munich, Orsay, Oxford, RAL, Tufts, ...
GOALS
- Specify the main parameters characterizing the Model's performance: throughputs, latencies
- Determine classes of Computing Models feasible for LHC (matched to network capacity and data handling resources)
- Develop "Baseline Models" in the "feasible" category
- Verify resource requirement baselines (computing, data handling, networks)
COROLLARIES:
- Define the Analysis Process
- Define Regional Center Architectures
- Provide Guidelines for the final Models
[Figure: schematic distributed architecture. CERN (n x 10^7 MIPS, m-Pbyte robot) linked at 622 Mbits/s and N x 622 Mbits/s to regional centres such as FNAL (4 x 10^7 MIPS, 110 Tbyte robot) and universities (n x 10^6 MIPS, m-Tbyte robot), each serving local desktops]
Working Groups
- Architecture WG: baseline architecture for regional centres, technology tracking, survey of the computing models of current HENP experiments
- Analysis Model WG: evaluation of the LHC data analysis model and use cases
- Simulation WG: develop a simulation tool set for performance evaluation of the computing models
- Testbed WG: evaluate the performance of ODBMS and networks in the distributed environment
General need for distributed data access and analysis
Potential problems of a single centralized computing center include:
- scale of LHC experiments: difficulty of accumulating and managing all resources at one location
- geographic spread of LHC experiments: providing equivalent, location-independent access to data for physicists, plus help desk, support and consulting in the same time zone
- cost of LHC experiments: optimizing use of resources located worldwide
Motivations for Regional Centers
A distributed computing architecture based on regional centers offers:
- A way of utilizing the expertise and resources residing in computing centers all over the world
- Local consulting and support
- A way to maximize the intellectual contribution of physicists all over the world without requiring their physical presence at CERN
- Acknowledgement of possible limitations of network bandwidth
- The freedom for people to choose how they analyze data based on the availability or proximity of resources such as CPU, data, or network bandwidth
Future Experiment Survey
Analysis/Results
- From the previous survey, we saw many sites contributing to Monte Carlo generation; this is now the norm
- New experiments are trying to use the Regional Center concept: BaBar has Regional Centers at IN2P3 and RAL, and a smaller one in Rome; STAR has a Regional Center at LBL/NERSC; CDF and D0 offsite institutions are paying more attention as the run gets closer
Future Experiment Survey
Other observations/requirements
In the last survey, we pointed out the following requirements for RCs:
- 24x7 support
- a software development team
- a diverse body of users
- good, clear documentation of all s/w and s/w tools
The following are requirements for the central site (i.e. CERN):
- a central code repository, easy to use and easily accessible for remote sites
- be "sensitive" to remote sites in database handling, raw data handling and machine flavors
- provide good, clear documentation of all s/w and s/w tools
The experiments in this survey achieving the most in distributed computing are following these guidelines.
Tier0: CERN
Tier1: National “Regional” Center
Tier2: Regional Center
Tier3: Institute Workgroup Server
Tier4: Individual Desktop
Total 5 Levels
[Figure: CMS Offline Farm at CERN circa 2006 (lmr for MONARC study, April 1999). A farm network of ~1400 processor boxes in 160 clusters and 40 sub-farms, connected through LAN-SAN and LAN-WAN routers to a storage network serving ~5400 disks in 340 arrays and ~100 tape drives; indicated link capacities range from 0.8 Gbps (DAQ and WAN) up to aggregates of 250 and 480* Gbps. *assumes all disk and tape traffic stays on the storage network; double these numbers if all disk and tape traffic goes through the LAN-SAN routers]
Farm capacity and evolution

year                   2004     2005      2006      2007
total CPU (SI95)       70,000   350,000   520,000   700,000
disks (TB)             40       340       540       740
LAN thr-put (GB/sec)   6        31        46        61
tapes (PB)             0.2      1         3         5
tape I/O (GB/sec)      0.2      0.3       0.5       0.5
Processor cluster (lmr for MONARC study, April 1999)
- basic box: four 100 SI95 processors, standard network connection (~2 Gbps); 15% of systems are configured as I/O servers (disk server, disk-tape mover, Objy AMS, ...) with an additional connection to the storage network
- cluster: 9 basic boxes with a network switch (<10 Gbps)
- sub-farm: 4 clusters with a second-level network switch (<50 Gbps); one sub-farm fits in one rack
[Figure: sub-farm connections to the farm network and, for the boxes configured as I/O servers, to the storage network (indicated capacities 1.5 Gbps and 3 Gbps*)]
Cluster and sub-farm sizing is adjusted to fit conveniently the capabilities of network switch, racking and power distribution components.
Sub-farm: 36 boxes, 144 CPUs, 5 m2.
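As a sanity check on these figures, here is a minimal back-of-the-envelope sketch (my own arithmetic, not a MONARC tool) that derives the sub-farm and farm sizes from the box, cluster and sub-farm definitions above; the count of 40 sub-farms is taken from the CMS farm figure.

```python
# Back-of-the-envelope farm sizing from the cluster/sub-farm definitions above.
# The 40 sub-farms come from the "CMS Offline Farm circa 2006" figure; all
# other numbers are quoted on this slide.

CPUS_PER_BOX = 4          # "basic box: four 100 SI95 processors"
SI95_PER_CPU = 100
BOXES_PER_CLUSTER = 9     # "cluster: 9 basic boxes"
CLUSTERS_PER_SUBFARM = 4  # "sub-farm: 4 clusters"
SUBFARMS = 40             # from the farm figure

boxes_per_subfarm = BOXES_PER_CLUSTER * CLUSTERS_PER_SUBFARM   # 36 boxes
cpus_per_subfarm = boxes_per_subfarm * CPUS_PER_BOX            # 144 CPUs

total_boxes = SUBFARMS * boxes_per_subfarm                     # 1440 (~1400 in the figure)
total_si95 = total_boxes * CPUS_PER_BOX * SI95_PER_CPU         # 576,000 SI95

print(f"sub-farm: {boxes_per_subfarm} boxes, {cpus_per_subfarm} CPUs")
print(f"farm: {total_boxes} boxes, ~{total_si95 / 1000:.0f} kSI95")
# ~576 kSI95 is the same order as the 520 kSI95 quoted for 2006 in the
# capacity table; the difference reflects rounding of the box count.
```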
Regional Centers
Regional Centers will:
- Provide all technical services and data services required to do the analysis
- Maintain all (or a large fraction of) the processed analysis data; possibly may hold only large subsets based on physics channels
- Maintain a fixed fraction of fully reconstructed and raw data
- Cache or mirror the calibration constants
- Maintain excellent network connectivity to CERN and excellent connectivity to users in the region; data transfer over the network is preferred for all transactions, but transfer of very large datasets on removable data volumes is not ruled out
- Share/develop common maintenance, validation, and production software with CERN and the collaboration
- Provide services to physicists in the region, contribute a fair share to post-reconstruction processing and data analysis, collaborate with other RCs and CERN on common projects, and provide services to members of other regions on a best-effort basis to further the science of the experiment
- Provide support services, training, documentation, and troubleshooting to RC and remote users in the region
[Figure: Regional Center architecture. Data import and export; mass storage and disk servers; database servers; tapes; network links from CERN and from Tier 2 and simulation centers; physics software development; R&D systems and testbeds; info and code servers; web and telepresence servers; training, consulting and help desk; production reconstruction (Raw/Sim --> ESD; scheduled, predictable; experiment/physics groups); production analysis (ESD --> AOD, AOD --> DPD; scheduled; physics groups); individual analysis (AOD --> DPD and plots; chaotic; physicists); support services; connections to desktops, Tier 2 centers, local institutes and CERN]
[Figure: the same Regional Center architecture, annotated with the data flows and capacities listed below]
Data input rate from CERN: Raw Data - 5%, 50 TB/yr; ESD Data - 50%, 50 TB/yr; AOD Data - all, 10 TB/yr; Revised ESD - 20 TB/yr
Data input from Tier 2: Revised ESD and AOD - 10 TB/yr
Data input from simulation centers: Raw Data - 100 TB/yr
Data output rate to CERN: AOD Data - 8 TB/yr; Recalculated ESD - 10 TB/yr; Simulation ESD data - 10 TB/yr
Data output to Tier 2: Revised ESD and AOD - 15 TB/yr
Data output to local institutes: ESD, AOD, DPD data - 20 TB/yr
Total storage: robotic mass storage - 300 TB
- Raw Data: 50 TB, 5x10^7 events (5% of 1 year)
- Raw (simulated) Data: 100 TB, 10^8 events
- ESD (reconstructed data): 100 TB, 10^9 events (50% of 2 years)
- AOD (physics object) data: 20 TB, 2x10^9 events (100% of 2 years)
- Tag data: 2 TB (all)
- Calibration/conditions database: 10 TB (only the latest version of most data types kept here)
Central disk cache - 100 TB (per user demand)
CPU required for AMS database servers: ??x10^3 SI95 power
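To put these annual volumes in perspective, the short sketch below (my own arithmetic, not part of the MONARC material) converts them into average sustained WAN rates; actual peak rates would be higher, since transfers are not spread uniformly over the year.

```python
# Convert the annual Regional Center transfer volumes quoted above into
# average sustained rates over a calendar year (illustrative arithmetic only).

SECONDS_PER_YEAR = 3.15e7
TB = 1e12  # bytes

flows_tb_per_year = {
    "input from CERN (raw + ESD + AOD + revised ESD)": 50 + 50 + 10 + 20,
    "input from Tier 2 (revised ESD and AOD)": 10,
    "input from simulation centers (raw)": 100,
    "output to CERN (AOD + recalculated ESD + simulation ESD)": 8 + 10 + 10,
    "output to Tier 2 (revised ESD and AOD)": 15,
    "output to local institutes (ESD, AOD, DPD)": 20,
}

for name, tb in flows_tb_per_year.items():
    mb_per_s = tb * TB / SECONDS_PER_YEAR / 1e6
    print(f"{name}: {tb} TB/yr -> ~{mb_per_s:.1f} MB/s averaged over the year")
```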
[Figure: the same Regional Center architecture, viewed from the compute farms and annotated with the job mix and CPU requirements listed below]
Farms of low-cost commodity computers, limited I/O rate, modest local disk cache
- Reconstruction jobs: reprocessing of raw data, 10^8 events/year (10%); initial processing of simulated data, 10^8/year; 1000 SI95-sec/event ==> 10^4 SI95 capacity: 100 processing nodes of 100 SI95 power
- Event selection jobs: 10 physics groups x 10^8 events (10% samples) x 3 times/yr, based on ESD and the latest AOD data; 50 SI95/evt ==> 5000 SI95 power
- Physics object creation jobs: 10 physics groups x 10^7 events (1% samples) x 8 times/yr, based on the selected event sample ESD data; 200 SI95/event ==> 5000 SI95 power
- Derived physics data creation jobs: 10 physics groups x 10^7 events x 20 times/yr, based on selected AOD samples, generating "canonical" derived physics data; 50 SI95/evt ==> 3000 SI95 power
Total: 110 nodes of 100 SI95 power
- Derived physics data creation jobs (individual): 200 physicists x 10^7 events x 20 times/yr, based on selected AOD and DPD samples; 20 SI95/evt ==> 30,000 SI95 power
Total: 300 nodes of 100 SI95 power
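The quoted powers follow from a simple relation: required SI95 = (events per year) x (SI95-sec per event) / (effective processing seconds per year). The sketch below applies it with a full calendar year (~3.15 x 10^7 s) as the divisor; that divisor is my assumption, and the slide's round numbers evidently assume somewhat shorter effective processing windows, so the reconstruction figure in particular comes out lower than the quoted 10^4 SI95.

```python
# CPU sizing estimate: required SI95 power = events/year * SI95-sec/event
# divided by the effective processing time per year.  The divisor below is my
# assumption (a full calendar year); the slide's figures imply shorter windows.

SECONDS_PER_YEAR = 3.15e7

jobs = [
    # (description, events per year, SI95-sec per event)
    ("raw reprocessing + simulated-data processing", 1e8 + 1e8, 1000),
    ("event selection (10 groups x 10^8 x 3/yr)", 10 * 1e8 * 3, 50),
    ("physics object creation (10 groups x 10^7 x 8/yr)", 10 * 1e7 * 8, 200),
    ("group DPD creation (10 groups x 10^7 x 20/yr)", 10 * 1e7 * 20, 50),
    ("individual DPD creation (200 physicists x 10^7 x 20/yr)", 200 * 1e7 * 20, 20),
]

for name, events, si95_sec in jobs:
    power = events * si95_sec / SECONDS_PER_YEAR
    print(f"{name}: ~{power:,.0f} SI95 (~{power / 100:.0f} nodes of 100 SI95)")
```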
MONARC Analysis Process Example
Model and Simulation parameters
We now have a new set of parameters common to all simulating groups.
The values are more realistic, but still to be discussed and agreed on the basis of the experiments' information (previous values in parentheses).
Proc_Time_RAW          1000  SI95sec/event  (350)
Proc_Time_ESD            25  "              (2.5)
Proc_Time_AOD             5  "              (0.5)
Analyze_Time_TAG          3  "
Analyze_Time_AOD          3  "
Analyze_Time_ESD         15  "              (3)
Analyze_Time_RAW        600  "              (350)
Memory of jobs          100  MB
Proc_Time_Create_RAW   5000  SI95sec/event  (35)
Proc_Time_Create_ESD   1000  "              (1)
Proc_Time_Create_AOD     25  "              (1)
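For reference, the same common parameter set written as a plain configuration dictionary, roughly as a simulating group might keep it; the dictionary layout is only an illustration, not the format actually used by the MONARC simulation tool.

```python
# Common MONARC simulation parameters (new values; previous values in comments).
# Illustrative layout only; not the MONARC tool's own configuration format.

proc_times_si95_sec_per_event = {
    "Proc_Time_RAW": 1000,         # was 350
    "Proc_Time_ESD": 25,           # was 2.5
    "Proc_Time_AOD": 5,            # was 0.5
    "Analyze_Time_TAG": 3,
    "Analyze_Time_AOD": 3,
    "Analyze_Time_ESD": 15,        # was 3
    "Analyze_Time_RAW": 600,       # was 350
    "Proc_Time_Create_RAW": 5000,  # was 35
    "Proc_Time_Create_ESD": 1000,  # was 1
    "Proc_Time_Create_AOD": 25,    # was 1
}
job_memory_MB = 100
```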
Base Model used
Basic jobs:
- Reconstruction of 10^7 events: RAW --> ESD --> AOD --> TAG at CERN. This is the production running while the data come from the DAQ (100 days of running, collecting a billion events per year).
- Analysis by 5 working groups, each of 25 analyzers, on TAG only (no requests to higher-level data samples). Every analyzer submits 4 sequential jobs on 10^6 events. Each analyzer's start time is a flat random choice within a range of 3000 seconds. Each analyzer's data sample of 10^6 events is a random choice from the complete TAG database of 10^7 events.
- Transfer (FTP) of the 10^7 events of ESD, AOD and TAG from CERN to the RC
CERN activities: reconstruction, 5 WG analysis, FTP transfer. RC activities: 5 (uncorrelated) WG analysis, receiving the FTP transfer.
Job "paper estimates":
- Single analysis job: 1.67 CPU hours at CERN = 6000 sec at CERN (same at the RC)
- Reconstruction at CERN for 1/500 of RAW to ESD: 3.89 CPU hours = 14000 sec
- Reconstruction at CERN for 1/500 of ESD to AOD: 0.03 CPU hours = 100 sec
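These paper estimates can be reproduced with a few lines, assuming 500 SI95 processing nodes and the parenthesized (older) per-event times from the parameter slide; those two assumptions are mine, chosen because they give back exactly the quoted numbers.

```python
# Reproduce the "paper estimates" above, assuming 500 SI95 per processing node
# and the parenthesized (older) per-event times: 3 SI95-sec (TAG analysis),
# 350 SI95-sec (RAW->ESD), 2.5 SI95-sec (ESD->AOD).

NODE_SI95 = 500

def job_seconds(n_events, si95_sec_per_event):
    """Wall-clock seconds for one job running alone on one node."""
    return n_events * si95_sec_per_event / NODE_SI95

analysis   = job_seconds(1e6, 3)          # one analyzer's job on 10^6 TAG events
raw_to_esd = job_seconds(1e7 / 500, 350)  # 1/500 of the RAW->ESD reconstruction
esd_to_aod = job_seconds(1e7 / 500, 2.5)  # 1/500 of the ESD->AOD step

print(f"single analysis job:    {analysis:6.0f} s = {analysis / 3600:.2f} CPU h")
print(f"1/500 RAW->ESD at CERN: {raw_to_esd:6.0f} s = {raw_to_esd / 3600:.2f} CPU h")
print(f"1/500 ESD->AOD at CERN: {esd_to_aod:6.0f} s = {esd_to_aod / 3600:.2f} CPU h")
# -> 6000 s (1.67 h), 14000 s (3.89 h), 100 s (0.03 h), matching the slide.
```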
Resources: LAN speeds ?!
In our models the DB servers are uncorrelated, so each activity uses a single server. The bottlenecks are the "read" and "write" speeds to and from the server. In order to use the CPU power at a reasonable percentage we need a read speed of at least 300 MB/s and a write speed of 100 MB/s (a milestone already met today).
We use 100 MB/s in the current simulations (10 Gbit/s switched LANs may be possible in 2005).
The processing-node link speed is negligible in our simulations.
Of course the "real" implementation of the farms can be different, but the results of the simulation do not depend on the "real" implementation: they are based on usable resources.
See following slides
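As an illustration of why the server read speed dominates, the sketch below estimates the aggregate read rate a single DB server must sustain; the per-event data size and the number of concurrent jobs are illustrative assumptions of mine, not MONARC inputs.

```python
# Aggregate read rate needed from one DB server:
#   (concurrent jobs) x (events/s per job) x (bytes read per event).
# Event size and job count below are illustrative assumptions only.

NODE_SI95 = 500  # one job per 500 SI95 node, as in the models above

def required_read_MBps(n_jobs, si95_sec_per_event, bytes_per_event):
    events_per_sec_per_job = NODE_SI95 / si95_sec_per_event
    return n_jobs * events_per_sec_per_job * bytes_per_event / 1e6

# e.g. 125 concurrent analysis jobs (5 groups x 25 analyzers) at 3 SI95-sec/event,
# each reading an assumed ~10 kB per event from the same server:
print(f"~{required_read_MBps(125, 3, 1e4):.0f} MB/s")
# -> ~200 MB/s, the same order as the >= 300 MB/s read speed quoted above.
```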
More realistic values for CERN and RC
Data link speeds at 100 MB/sec (all values), except: Node_Link_Speed at 10 MB/sec; WAN link speeds at 40 MB/sec
CERN: 1000 processing nodes, each of 500 SI95 (1000 nodes x 500 SI95 = 500 kSI95, about the CPU power of the CERN Tier 0); disk space scaled to the number of DBs
RC: 200 processing nodes, each of 500 SI95 (100 kSI95 of processing power = 20% of CERN); disk space scaled to the number of DBs
Overall Conclusions
MONARC simulation tools are: sophisticated enough to allow modeling of complex distributed analysis scenarios; simple enough to be used by non-experts.
Initial modeling runs are already showing interesting results.
Future work will help identify bottlenecks and understand constraints on architectures
MONARC Phase 3
More Realistic Computing Model Development
Confrontation of models with realistic prototypes; at every stage, assess use cases based on actual simulation, reconstruction and physics analyses; participate in the setup of the prototypes.
We will further validate and develop the MONARC simulation system using the results of these use cases (positive feedback).
Continue to review key inputs to the model: CPU times at the various phases, data rate to storage, tape storage speed and I/O.
Employ MONARC simulation and testbeds to study CM variations, and suggest strategy improvements
MONARC Phase 3: Technology Studies
- Data model: data structures; reclustering, restructuring; transport operations; replication; caching, migration (HMSM), etc.
- Network QoS mechanisms: identify which are important
- Distributed system resource management and query estimators (queue management and load balancing)
- Development of MONARC simulation visualization tools for interactive Computing Model analysis
Relation to GRID
The GRID project is great!
- Development of the s/w tools needed for implementing realistic LHC Computing Models: farm management, WAN resource and data management, etc.
- Help in getting funds for real-life testbed systems (RC prototypes)
Complementarity of GRID with the MONARC hierarchical RC model:
- A hierarchy of RCs is a safe option; if GRID brings big advancements, less hierarchical models should also become possible
Timings are well matched:
- MONARC Phase-3 is to last ~1 year: a bridge to the GRID project starting early in 2001
- Afterwards, common work by the LHC experiments on developing the computing models will surely still be needed; in which project framework and for how long, we will see then...