Site Report – US CMS T2 Workshop
Samir Cury on behalf of T2_BR_UERJ Team
Server hardware profile
• SuperMicro machines
– 2 x Intel Xeon dual-core @ 2.0 GHz
– 4 GB RAM
– RAID 1 – 120 GB hard drives
Node hardware profile (40 nodes)
• Dell PowerEdge 2950
– 2 x Intel Xeon Quad core @ 2.33 GHz
– 16 GB RAM
– RAID 0 – 6 x 1 TB Hard Drives
• CE Resources
– 8 Batch slots
– 66.5 kHS06
– 2 GB RAM / Slot
• SE Resources
– 5.8 TB usable for dCache or Hadoop
Private network only
Node hardware profile (2+5 nodes)
• Dell R710
– 2 are Xen servers – not worker nodes
– 2 x Intel Xeon quad-core @ 2.4 GHz
– 16 GB RAM
– RAID 0 – 6 x 2 TB hard drives
• CE
– 8 Batch Slots (or more?)
– 124.41 kHS06
– 2 GB RAM / Slot
• SE
– 11.8 TB for dCache or Hadoop
Private network only
First-phase node profile (82 nodes)
• SuperMicro Server
– 2 Intel Xeon single core @ 2.66 GHz
– 2 GB RAM
– 500 GB Hard Drive & 40 GB Hard Drive
• CE Resources
– Not used – old CPUs & low RAM per node
• SE Resources
– 500 GB per node
Plans for the future - Hardware
• Buying 5 more Dell R710
• Deploying 5 R710 when the disks arrive
– 80 more cores
– 120 TB more storage
– 1244 kHS06 more
Totals
• CE – 40 PE 2950 + 10 R710 = 400 cores || 3.9 kHS06
• SE – 240 + 120 + 45 = 405 TB
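The totals above can be cross-checked against the per-node figures from the hardware-profile slides. In this sketch the 45 TB term for the first-phase SuperMicro nodes is taken as quoted on the slide rather than derived:

```python
# Sanity check of the quoted totals, using per-node figures from the
# hardware-profile slides.
pe2950_nodes = 40        # Dell PowerEdge 2950 worker nodes
r710_nodes = 10          # Dell R710 nodes (5 current + 5 planned)
slots_per_node = 8       # batch slots per node

total_cores = (pe2950_nodes + r710_nodes) * slots_per_node
print(total_cores)       # 400 cores, matching the slide

pe2950_tb = pe2950_nodes * 6   # RAID 0, 6 x 1 TB drives per node
r710_tb = r710_nodes * 12      # RAID 0, 6 x 2 TB drives per node
first_phase_tb = 45            # first-phase SuperMicro nodes, as quoted
total_tb = pe2950_tb + r710_tb + first_phase_tb
print(total_tb)          # 405 TB, matching the slide
```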
Software profile – CE
• OS – CentOS 5.3, 64-bit
• 2 OSG Gatekeepers
– Both running OSG 1.2.x
– Maintenance tasks eased by redundancy – fewer downtimes
• GUMS 1.2.15
• Condor 7.0.3 for job scheduling
Software profile – SE
• OS – CentOS 4.7, 32-bit
• dCache 1.8
– 4 GridFTP Servers
• PNFS 1.8
• PhEDEx 3.2.0
Plans for the future: Software/Network
• SE migration
– Right now we use dCache/PNFS
– We plan to migrate to BeStMan/Hadoop
• Some effort has already produced results:
• Adding the new nodes to the Hadoop SE
• Migrate the data
• Test with a real production environment – jobs and users accessing
• Network improvement – RNP (our network provider) plans to deliver a
10 Gbps link to us before the next SuperComputing conference.
T2 Analysis model & associated Physics groups
We have reserved 30 TB for each of the groups:
• Forward Physics
• B-Physics
• We are studying the possibility of reserving space for Exotica
The group has had several MSc & PhD students working on CMS analysis for a long time – these are well supported
Some Grid users submit jobs, sometimes run into trouble, and give up without asking for support
Developments
• Condor mechanism based on suspend, to give priority to a small pool of important users:
– 1 pair of batch slots per core
– When a priority user's job arrives, it pauses the normal job on the paired batch slot
– Once the priority job finishes and vacates the slot, its paired job automatically resumes
– Documentation can be made available for those interested
– Developed by Diego Gomes
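The pairing logic described above can be sketched as a small simulation. This is an illustrative model of the policy only – the class, method names, and job labels are assumptions, not the site's actual Condor configuration:

```python
# Hypothetical sketch of the paired-slot suspend/resume policy:
# each physical core is exposed as two Condor batch slots, one normal
# and one reserved for priority users.

class SlotPair:
    """One core seen as a pair of batch slots (normal + priority)."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.normal_job = None      # job on the normal slot
        self.priority_job = None    # job on the priority slot
        self.suspended = False      # whether the normal job is paused

    def start_normal(self, job):
        self.normal_job = job

    def start_priority(self, job):
        # A priority user's job arriving pauses the normal job
        # running on the paired slot.
        self.priority_job = job
        if self.normal_job is not None:
            self.suspended = True

    def finish_priority(self):
        # Once the priority job finishes and vacates the slot,
        # the paired normal job automatically resumes.
        self.priority_job = None
        self.suspended = False

pair = SlotPair(0)
pair.start_normal("grid-analysis-job")
pair.start_priority("priority-user-job")
print(pair.suspended)    # True: normal job is paused
pair.finish_priority()
print(pair.suspended)    # False: normal job has resumed
```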
Developments
• Condor4Web
– Web interface to visualize condor queue
• Shows grid DN’s
– Useful for Grid users who want to know how their jobs are being scheduled inside the site – http://monitor.hepgrid.uerj.br/condor
– Available on http://condor4web.sourceforge.net
– It still has much room to evolve, but it already works
– Developed by Samir
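A tool of this kind typically turns the batch queue into per-job records, each carrying the owner's grid DN, before rendering them on a web page. The sketch below shows that parsing step on a made-up sample; the text layout, field order, and DNs are illustrative assumptions, not Condor4Web's actual implementation:

```python
# Illustrative parsing step: turn queue-listing text (one job per line:
# id, owner, status, grid DN) into records a web page could render.
# The sample data and field layout are assumptions for this sketch.

sample = """\
123.0 cmsuser run /DC=org/DC=doegrids/OU=People/CN=Alice
124.0 cmsuser idle /DC=org/DC=doegrids/OU=People/CN=Bob
"""

def parse_queue(text):
    jobs = []
    for line in text.splitlines():
        # Split into at most 4 fields so the DN keeps its slashes intact.
        job_id, owner, status, dn = line.split(None, 3)
        jobs.append({"id": job_id, "owner": owner,
                     "status": status, "dn": dn})
    return jobs

for job in parse_queue(sample):
    print(job["id"], job["status"], job["dn"])
```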
CMS Center @ UERJ
During LISHEP 2009 (January), we inaugurated a small control room for CMS at UERJ:
Shifts @ CMS Center
Our computing team has participated in tutorials, and we now have four potential CSP shifters
CMS Centre (quick) profile
• Hardware
– 4 Dell workstations with 22” monitors
– 2 x 47” TVs
– Polycom SoundStation
• Software
– All conferences, including those with the other CMS Centres, are done via EVO
Cluster & Team
• Alberto Santoro (General supervisor)
• Eduardo Revoredo (Hardware coordinator)
• Samir Cury (Site admin)
• Douglas Milanez (Trainee)
• Andre Sznajder (Project coordinator)
• Jose Afonso (Software coordinator)
• Fabiana Fortes (Site admin)
• Raul Matos (Trainee)
2009/2010 goals
• In 2009 we worked mostly on
– Getting rid of infrastructure problems
• Electrical insufficiency
• AC – many downtimes due to this
• These are solved now
– Besides those problems
• Running official production on small workflows
• Doing private production & analysis for local and Grid users
• 2010 goal
– Use the new hardware and infrastructure for a more reliable site
– Run heavier workflows and increase participation and presence in official production.
Thanks!
I want to formally thank Fermilab, USCMS, and OSG for their financial help in bringing a UERJ
representative here.
I also want to thank USCMS for this very useful meeting.
Questions? Comments?