Site Report
US CMS T2 Workshop
Samir Cury on behalf of T2_BR_UERJ Team
Servers' Hardware profile
• SuperMicro machines
• 2 x Intel Xeon dual core @ 2.0 GHz
• 4 GB RAM
• RAID 1 - 120 GB HDs
Nodes Hardware profile (40)
• Dell PowerEdge 2950
– 2 x Intel Xeon Quad core @ 2.33 GHz
– 16 GB RAM
– RAID 0 – 6 x 1 TB Hard Drives
• CE Resources
– 8 Batch slots
– 66.5 HS06
– 2 GB RAM / Slot
• SE Resources
– 5.8 TB usable for dCache or Hadoop
Private network only
Nodes Hardware profile (2+5)
• Dell R710
– 2 are Xen Servers – not worker nodes
– 2 x Intel Xeon Quad core @ 2.4 GHz
– 16 GB RAM
• RAID 0 – 6 x 2 TB Hard Drives
• CE
– 8 Batch Slots (or more?)
– 124.41 HS06
– 2 GB RAM / Slot
• SE
– 11.8 TB for dCache or Hadoop
Private network only
First-phase nodes profile (82)
• SuperMicro Server
– 2 x Intel Xeon single core @ 2.66 GHz
– 2 GB RAM
– 500 GB Hard Drive & 40 GB Hard Drive
• CE Resources
– Not used – old CPUs & low RAM per node
• SE Resources
– 500 GB per node
Plans for the future - Hardware
• Buying 5 more Dell R710
• Deploying 5 R710 when the disks arrive
– 80 more cores
– 120 TB more storage
– 1,244 HS06 more
Total
• CE – 40 PE 2950 + 10 R710 = 400 cores || 3.9 kHS06 (40 × 66.5 + 10 × 124.41 ≈ 3,904 HS06)
• SE – 240 + 120 + 45 = 405 TB
Software profile – CE
• OS – CentOS 5.3, 64-bit
• 2 OSG Gatekeepers
– Both running OSG - 1.2.x
– Maintenance tasks eased by redundancy – fewer downtimes
• GUMS 1.2.15
• Condor 7.0.3 for job scheduling
Software profile – SE
• OS – CentOS 4.7, 32-bit
• dCache 1.8
– 4 GridFTP Servers
• PNFS 1.8
• PhEDEx 3.2.0
Plans for the future: Software/Network
• SE Migration
– Right now we use dCache/PNFS
– We plan to migrate to BeStMan/Hadoop
• Some effort has already produced results
• Adding the new nodes to the Hadoop SE (see the sketch after this list)
• Migrate the data
• Test in a real production environment – jobs and users accessing it
• Network Improvement
– RNP (our network provider) plans to deliver a 10 Gbps link to us before the next SuperComputing Conference.
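As a rough illustration of the "adding the new nodes" step – a generic sketch for a stock Hadoop DFS of that era, not the site's actual procedure, and with a made-up hostname:

# Sketch only: generic Hadoop (0.20-era) admin commands, assuming the new node
# already has the Hadoop software and the cluster configuration installed.

# On the namenode: list the new worker for the cluster start/stop scripts
# (the hostname below is hypothetical).
echo "node-new-01.hepgrid.uerj.br" >> $HADOOP_HOME/conf/slaves

# On the new node: start the datanode daemon; it registers with the namenode.
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode

# From any node: rebalance existing blocks onto the newly added datanodes.
$HADOOP_HOME/bin/hadoop balancer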
T2 Analysis model & associated Physics groups
We have reserved 30 TB for each of the groups:
• Forward Physics
• B-Physics
• Studying the possibility of reserving space for Exotica
The group has had several MSc & PhD students working on CMS analysis for a long time – these are well supported
Some Grid users submit jobs, occasionally run into trouble and give up – they don't ask for support
Developments
• Condor mechanism based on suspension, to give priority to a very small pool of important users (a config sketch follows below):
– 1 pair of batch slots per core
– When a priority user's job arrives, it suspends the normal job on the paired batch slot
– Once the priority job finishes and vacates the slot, its paired job automatically resumes
– Documentation can be made available for those interested
– Developed by Diego Gomes
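A minimal condor_config sketch of how such a suspend policy can be expressed, shown for a single core for brevity; the slot layout, the PriorityUser job attribute and the expressions are illustrative assumptions, not the site's actual configuration:

# Sketch only: one core advertised as a pair of slots.
# slot1 runs normal jobs, slot2 is reserved for the priority pool.
NUM_CPUS = 2

# Expose each slot's State in the other slots' ads (as slot1_State, slot2_State).
STARTD_SLOT_ATTRS = State

# Assumed convention: priority users add "+PriorityUser = True" to their submit files.
START = (SlotID == 1) || (TARGET.PriorityUser =?= True)

# Pause the normal job while the priority slot is claimed; resume once it is vacated.
WANT_SUSPEND = True
SUSPEND  = (SlotID == 1) && (slot2_State =?= "Claimed")
CONTINUE = (SlotID == 1) && (slot2_State =!= "Claimed")

# A real 8-core node would advertise 16 slots and repeat this pairing per core.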
Developments
• Condor4Web
– Web interface to visualize condor queue
• Shows grid DN’s
– Useful for Grid users who want to know how their jobs are being scheduled inside the site (see the query sketch below) – http://monitor.hepgrid.uerj.br/condor
– Available on http://condor4web.sourceforge.net
– Still has much room to evolve, but it already works
– Developed by Samir
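As a hedged illustration of the kind of queue query behind such a page (not Condor4Web's actual implementation), the grid DN of each queued job can be read from the x509userproxysubject job attribute:

# Sketch only: list id, owner, status and grid DN for queued jobs that carry
# an x509 proxy (jobs without one are filtered out by the constraint).
condor_q -constraint 'x509userproxysubject =!= UNDEFINED' \
         -format "%d." ClusterId \
         -format "%d " ProcId \
         -format "%s " Owner \
         -format "%d " JobStatus \
         -format "%s\n" x509userproxysubject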
CMS Center @ UERJ
During LISHEP 2009 (January) we inaugurated a small control room for CMS at UERJ:
Shifts @ CMS Center
Our computing team has participated in tutorials, and we now have four potential CSP shifters.
CMS Centre (quick) profile
• Hardware
– 4 Dell workstations with 22” monitors
– 2 x 47” TV’s
– Polycom SoundStation
• Software
– All the conferences, including those with the other CMS Centers, are done via EVO
Cluster & Team
• Alberto Santoro (General supervisor)
• Andre Sznajder (Project coordinator)
• Eduardo Revoredo (Hardware coordinator)
• Jose Afonso (Software coordinator)
• Samir Cury (Site admin)
• Fabiana Fortes (Site admin)
• Douglas Milanez (Trainee)
• Raul Matos (Trainee)
2009/2010 year's goals
• In 2009 we worked mostly on
– Getting rid of infrastructure problems
• Electrical insufficiency
• AC – many downtimes due to this
• These are solved now
– Besides those problems
• Running official production on small workflows
• Doing private production & analysis for local and Grid users
• 2010 goals
– Use the new hardware and infrastructure for a more reliable site
– Run heavier workflows and increase participation and presence in official production.
Thanks!
I want to formally thank Fermilab, USCMS and OSG for their financial help in bringing a UERJ representative here.
I also want to thank USCMS for this very useful meeting.
Questions? Comments?