
University of Wisconsin-Madison CMS Tier-2 Site Report
D. Bradley, S. Dasu, A. Mohapatra, T. Sarangi, C. Vuosalo
HEPiX, Lincoln, NE



TRANSCRIPT

Slide 1: University of Wisconsin-Madison CMS Tier-2 Site Report
D. Bradley, S. Dasu, A. Mohapatra, T. Sarangi, C. Vuosalo
HEP Computing Group
Outline: Infrastructure, Resources, Management & Operation, Contributions to CMS, Summary

Slide 2: History
- Started out as a Grid3 site more than a decade ago
- Played a key role in the formation of the Grid Laboratory of Wisconsin (GLOW), a HEP/CS (Condor team) collaboration
- Designed a standalone MC production system; adapted CMS software and ran it robustly in non-dedicated environments (the UW grid and beyond)
- Selected as one of the 7 CMS Tier-2 sites in the US
- Became a member of WLCG and subsequently OSG
- Serving all OSG-supported VOs besides CMS

Slide 3: Infrastructure
- 3 machine rooms, 16 racks
- Power supply: 650 kW
- Cooling: chilled-water-based air coolers and POD-based hot aisles

Slide 4: Compute / Storage Resources
- [Usage plot, last 1 year: CMS ~25M hours; Chem; IceCube 39M hours]
- Compute (SL6 OS): T2 HEP pool of 5300 cores (54K HS06); this year's new purchase will add 1400 cores dedicated to CMS; CHTC pool cores used opportunistically
- Storage (Hadoop FS): running the OSG-released hadoop-2.0 since March; … PB of non-replicated storage with replication factor = 2; 800 TB will be added soon

Slide 5: Network Configuration
- [Network diagram: T2 LAN, server/switch, and UW campus connectivity to Internet2, NLR, and ESnet via Chicago, with Purdue, FNAL, and Nebraska; link speeds of 1Gb, 10Gb, 3x10Gb, and 4x10Gb, plus a new 100Gb WAN link]
- perfSONAR (latency & bandwidth) nodes are used to debug LAN and WAN (+ USCMS cloud) issues

Slide 6: 100Gb Upgrade
- Strong support from the UW campus network team
- Upgraded to a 100Gb switch for the WAN this summer
- Room-to-room bandwidth will be 60Gb by the end of the year
- This will push data transfers to more than 20Gb; the current maximum transfer rate to Wisconsin is ~10Gb

Slide 7: IPv4 / IPv6
- A total of 350 machines (compute/storage nodes and elements)
- Connectivity to the outside world uses a dual-stack IPv4/IPv6 network
- IPv6 is currently statically configured; IPv4 was initially configured via DHCP
- OSG services work in IPv6-only and dual-stack modes: IPv6 is enabled for the GridFTP servers and works with IPv4/v6
- Xrootd (non-OSG release) has also been tested to work in IPv6-only and dual-stack modes
- Hadoop and SRM communications haven't been tested with IPv6

Slide 8: Software and Services
- File systems & proxy service: AFS, NFS, CernVM-FS, Frontier/Squid
- Job batch system: HTCondor
- OSG software stack: Globus, GUMS, glexec, GRAM CEs, SEs, HTCondor-CE (new)
- Storage and services: Hadoop (HDFS), BeStMan2 SRM, GridFTP, Xrootd
- Cluster management & monitoring: Puppet, Ganeti, Nagios, Ganglia

Slide 9: Cluster Management & Monitoring
- Puppet: in use for the last 2+ years; controls every aspect of software deployment for the T2; integrated with Foreman for monitoring
- Ganeti (Debian): virtual machine manager with DRBD and KVM as the underlying technologies; hosts SRM, GUMS, HTCondor-CE, RSV, CVMFS, Puppet, and PhEDEx
- Nagios: hardware, disk temperatures, etc.
- Ganglia: services, memory, CPU/disk usage, I/O, network, storage
- OSG and CMS dedicated tools: RSV, SAM, HammerCloud, Dashboard

Slide 10: Any Data, Anytime, Anywhere (AAA)
- Goal: make all CMS data transparently available to any CMS physicist, anywhere
- Transparent and efficient local/remote data access: no need to know where the data is located
- Reliable access, i.e. failures are hidden from the user's view
- Ability to run CMS software from non-CMS-managed worker nodes
- Scheduling excess demand for CPUs to overflow to remote resources
- The technologies that make this possible: Xrootd (read data from anywhere), CVMFS + Parrot (read software from anywhere), and glideinWMS/HTCondor (send jobs anywhere); two illustrative sketches follow below
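To make the "read data from anywhere" piece concrete, here is a minimal Python sketch of opening a file through an Xrootd redirector with the XRootD Python bindings. The redirector hostname and the logical file name below are illustrative placeholders, not values taken from the talk, and the sketch assumes the pyxrootd bindings are installed.

```python
# Minimal sketch: open a remote file via an Xrootd redirector and read a few bytes.
# The redirector host and logical file name (LFN) are illustrative placeholders.
from XRootD import client
from XRootD.client.flags import OpenFlags

REDIRECTOR = "cmsxrootd.fnal.gov"           # example global redirector hostname
LFN = "/store/user/example/test_file.root"  # placeholder LFN

def read_remote(redirector, lfn, nbytes=1024):
    """Open a file through the redirector and return its first nbytes."""
    url = "root://{0}//{1}".format(redirector, lfn.lstrip("/"))
    f = client.File()
    status, _ = f.open(url, OpenFlags.READ)
    if not status.ok:
        raise IOError("open failed: " + status.message)
    status, data = f.read(offset=0, size=nbytes)
    f.close()
    if not status.ok:
        raise IOError("read failed: " + status.message)
    return data

if __name__ == "__main__":
    print(len(read_remote(REDIRECTOR, LFN)), "bytes read")
```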
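For the "send jobs anywhere" piece, the following is a minimal sketch of queuing a test job with the HTCondor Python bindings. The talk's setup layers glideinWMS on top of HTCondor; this sketch only shows plain HTCondor submission, assumes bindings version 9 or later and a reachable local schedd, and the executable and file names are chosen purely for illustration.

```python
# Minimal sketch: submit one vanilla-universe test job with the HTCondor
# Python bindings (assumes htcondor >= 9 and a local schedd).
import htcondor

# Job description; the executable and file names are illustrative placeholders.
job = htcondor.Submit({
    "executable": "/bin/hostname",           # report where the job actually ran
    "universe": "vanilla",
    "output": "aaa_test.$(ClusterId).out",
    "error": "aaa_test.$(ClusterId).err",
    "log": "aaa_test.log",
    "request_cpus": "1",
    "request_memory": "512MB",
})

schedd = htcondor.Schedd()            # local schedd
result = schedd.submit(job, count=1)  # queue one job
print("submitted cluster", result.cluster())
```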
Slide 11: AAA Scale Tests
- AAA scale test using HTCondor: the Tier-2 cluster at Wisconsin provides 10K Condor job slots for the scale test, running in parallel to the main Condor pool
- Underlying technology for data access: Xrootd, which works with heterogeneous storage systems
- [Plot: 10K file read test from/to Wisconsin through the FNAL global Xrootd redirector, 08/08/14]

Slide 12: Summary
- The site is in good health and performing well
- We are making our best effort to maintain high availability and reliability while productively serving CMS and the grid community

Slide 13: Thank You! Questions / Comments?