
Page 1: Site Report

Site Report – US CMS T2 Workshop

Samir Cury on behalf of T2_BR_UERJ Team

Page 2: Site Report
Page 3: Site Report

Servers' Hardware profile

• SuperMicro machines

• 2 x Intel Xeon dual core @ 2.0 GHz

• 4 GB RAM

• RAID 1 - 120 GB HDs

Page 4: Site Report

Nodes Hardware profile (40)

• Dell PowerEdge 2950

– 2 x Intel Xeon Quad core @ 2.33 GHz

– 16 GB RAM

– RAID 0 – 6 x 1 TB Hard Drives

• CE Resources

– 8 Batch slots

– 66.5 kHS06

– 2 GB RAM / Slot

• SE Resources

– 5.8 TB usable for dCache or Hadoop

Private network only

Page 5: Site Report

Nodes Hardware profile (2+5)

• Dell R710

– 2 are Xen Servers – not worker nodes

– 2 x Intel Xeon Quad core @ 2.4 GHz

– 16 GB RAM

– RAID 0 – 6 x 2 TB Hard Drives

• CE

– 8 Batch Slots (or more?)

– 124.41 kHS06

– 2 GB RAM / Slot

• SE

– 11.8 TB for dCache or Hadoop


Private network only

Page 6: Site Report

First phase nodes Profile (82)

• SuperMicro Server

– 2 Intel Xeon single core @ 2.66 GHz

– 2 GB RAM

– 500 GB Hard Drive & 40 GB Hard Drive

• CE Resources

– Not used – old CPUs & low RAM per node

• SE Resources

– 500 GB per node

Page 7: Site Report

Plans for the future - Hardware

• Buying 5 more Dell R710

• Deploying 5 R710 when the disks arrive

– 80 more cores

– 120 TB more storage

– 1244 kHS06 more

Total

• CE - 40 PE 2950 + 10 R710 = 400 Cores || 3.9 kHS06

• SE - 240 + 120 + 45 = 405 TB
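As a quick cross-check, the totals above follow from the node counts and per-node figures quoted on the earlier slides. This is an illustrative script, not part of the site's tooling:

```python
# Sanity check of the slide's totals, using figures from earlier slides.
pe2950_nodes = 40   # Dell PowerEdge 2950 worker nodes
r710_nodes = 10     # Dell R710 worker nodes (5 current + 5 planned)
slots_per_node = 8  # batch slots per node

ce_cores = (pe2950_nodes + r710_nodes) * slots_per_node
se_storage_tb = 240 + 120 + 45  # PE 2950 + R710 + first-phase nodes, as on the slide

print(ce_cores)       # 400
print(se_storage_tb)  # 405
```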

Page 8: Site Report

Software profile – CE

• OS – CentOS 5.3 64 bits

• 2 OSG Gatekeepers

– Both running OSG - 1.2.x

– Maintenance tasks eased by redundancy – fewer downtimes

• GUMS 1.2.15

• Condor 7.0.3 for job scheduling

Page 9: Site Report

Software profile – SE

• OS - CentOS 4.7 32 bits

• dCache 1.8

– 4 GridFTP Servers

• PNFS 1.8

• PhEDEx 3.2.0

Page 10: Site Report

Plans for the future: Software/Network

• SE Migration

– Right now we use dCache/PNFS

– We plan to migrate to BeStman/Hadoop

• Some effort has already produced results

• Adding the new nodes to the Hadoop SE

• Migrate the data

• Test with a real production environment

– Jobs and users accessing it

• Network Improvement

– RNP (our network provider) plans to deliver a 10 Gbps link to us before the next SuperComputing conference.

Page 11: Site Report

T2 Analysis model & associated Physics groups

We have reserved 30 TB for each of the groups:

• Forward Physics

• B-Physics

• Studying the possibility of reserving space for Exotica

The group has several MSc & PhD students who have been working on CMS analysis for a long time – they have good support.

Some Grid users submit jobs, sometimes run into trouble, and give up without asking for support.

Page 12: Site Report

Developments

• Condor mechanism based on suspend, to give priority to a small pool of important users:

– 1 pair of batch slots per core

– When a priority user's job arrives, it pauses the normal job on the paired batch slot

– Once it finishes and vacates the slot, its paired job automatically resumes.

– Documentation can be made available for those interested

– Developed by Diego Gomes
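The slides do not spell out the configuration behind this mechanism. As a rough illustration only, a suspend-based pairing policy can be expressed with Condor startd policy expressions roughly like the sketch below; the user name, slot layout, and expressions are assumptions, not the actual UERJ configuration by Diego Gomes:

```
# Hypothetical condor_config sketch of a suspend-based priority policy.
# NOT the site's actual mechanism; names and values are illustrative.

# Advertise two batch slots per core: a "normal" slot paired with a
# "priority" slot (e.g. slot1/slot2 on a dual-slot machine).
NUM_SLOTS = 2

# Export each slot's state so its paired slot can reference it
# (appears in the paired slot's ad as slot<N>_State).
STARTD_SLOT_EXPRS = State

# The normal slot (slot1) suspends its job while the paired priority
# slot (slot2) is claimed, and resumes once that slot is freed.
SUSPEND  = (SlotID == 1) && (slot2_State =?= "Claimed")
CONTINUE = (SlotID == 1) && (slot2_State =!= "Claimed")
```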

Page 13: Site Report

Developments

• Condor4Web

– Web interface to visualize condor queue

• Shows grid DN’s

– Useful for Grid users who want to know how their job is being scheduled inside the site – http://monitor.hepgrid.uerj.br/condor

– Available on http://condor4web.sourceforge.net

– Still has much room to evolve, but already works

– Developed by Samir

Page 14: Site Report

CMS Center @ UERJ

During LISHEP 2009 (January) we inaugurated a small control room for CMS at UERJ:

Page 15: Site Report

Shifts @ CMS Center

Our computing team has participated in tutorials, and we now have four potential CSP shifters.

Page 16: Site Report

CMS Centre (quick) profile

• Hardware

– 4 Dell workstations with 22” monitors

– 2 x 47” TV’s

– Polycom SoundStation

• Software

– All conferences, including those with the other CMS Centers, are done via EVO

Page 17: Site Report

Cluster & Team

• Alberto Santoro (General supervisor)

• Andre Sznajder (Project coordinator)

• Eduardo Revoredo (Hardware coordinator)

• Jose Afonso (Software coordinator)

• Samir Cury (Site admin)

• Fabiana Fortes (Site admin)

• Douglas Milanez (Trainee)

• Raul Matos (Trainee)

Page 18: Site Report

2009/2010 goals

• In 2009 we worked mostly on

– Getting rid of infrastructure problems

• Electrical insufficiency

• AC – many downtimes due to this

• These are solved now

– Besides those problems

• Running official production on small workflows

• Doing private production & analysis for local and Grid users

• 2010 goal

– Use the new hardware and infra-structure for a more reliable site

– Run heavier workflows and increase participation and presence in official production.

Page 19: Site Report

Thanks!

I want to formally thank Fermilab, USCMS and OSG for their financial help in bringing a UERJ representative here.

I also want to thank USCMS for this very useful meeting.

Page 20: Site Report

Questions? Comments?