Oxford PP Computing Site Report
HEPSYSMAN
28th April 2003
Pete Gronbech
General Strategy
• Approx. 200 Windows 2000 desktop PCs, with Exceed used to access the central Linux systems
• Digital Unix and VMS phased out for general use.
• Red Hat Linux 7.3 is becoming the standard
Network Access
[Diagram: campus network access. SuperJanet 4 connection at 2.4Gb/s to the Campus Backbone Router; OUCS Firewall; Backbone Edge Routers serving departments over 100Mb/s and 1Gb/s links; Physics Firewall and Physics Backbone Router on a 1Gb/s link.]
Physics Backbone Upgrade to Gigabit Autumn 2002
[Diagram: Physics backbone after the Gigabit upgrade. The Physics Backbone Router links the Physics Firewall and the Particle Physics, Clarendon Lab, Astro, Theory and Atmos sub-networks at 1Gb/s; Linux and Win2k servers sit on a Gb/s switch, with desktops on 100Mb/s connections.]
[Diagram: Particle Physics Linux systems. General purpose servers pplx1, pplx2, pplx3 (SNO), ppnt117 (HARP), pplxfs1 and pplxgen, plus morpheus (CDF), on 1Gb/s links; DAQ machines ppcresst1 and ppcresst2 (CRESST), ppatlas1 and atlassbc (ATLAS), ppminos1 and ppminos2 (MINOS); Grid development systems grid, pplxbatch, pptb01 and pptb02, with testbed nodes tblcfg, tbse01 and tbce01 (EDG and SAM testing); a PBS batch farm of 4 dual 2.4GHz systems added in Autumn 2002. Operating systems are a mix of RH6.2, RH7.1, RH7.3 and Fermi 7.3.1.]
[Diagram (detail of the above), Autumn 2002: the general purpose systems pplxfs1, pplxgen and pplx2 (RH7.3 and RH6.2) on a 1Gb/s link, and the PBS batch farm of 4 dual 2.4GHz RH7.3 systems.]
Zero-D X-3i SCSI-IDE RAID with 12 * 160GB Maxtor drives
Supplied by Compusys
This proved to be a disaster and was rejected in favour of bare SCSI disks, which we mounted internally in our rack-mounted file server.
The Linux File Server: pplxfs1, with 8 * 146GB SCSI disks
General Purpose Linux Server: pplxgen
pplxgen is a dual 2.2GHz Pentium 4 Xeon based system with 2GB RAM, running Red Hat 7.3. It was brought on line at the end of August 2002 to share the load with pplx2 as users migrated off al1 (the Digital Unix server).
The PP batch farm, running Red Hat 7.3 with OpenPBS, can be seen below pplxgen.
This service became fully operational in Feb 2003.
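As a rough illustration of how users drive such an OpenPBS farm, the sketch below submits a job from Python. The queue name, resource request and analysis executable are hypothetical placeholders, not details from the slides; only the standard qsub client is assumed to be available.

#!/usr/bin/env python
# Minimal sketch (not from the slides): hand a job to an OpenPBS batch farm.
# Queue "pp", the resource request and ./run_analysis are hypothetical.
import subprocess
import tempfile

JOB_SCRIPT = """#!/bin/sh
#PBS -N pp_test
#PBS -q pp
#PBS -l nodes=1
cd $PBS_O_WORKDIR
./run_analysis        # hypothetical user program
"""

def submit_job():
    # Write the job script to a file and pass it to qsub.
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(JOB_SCRIPT)
        script_path = f.name
    result = subprocess.run(["qsub", script_path],
                            capture_output=True, text=True, check=True)
    print("Submitted:", result.stdout.strip())

if __name__ == "__main__":
    submit_job()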
[Diagram, February 2003: Grid development and CDF systems. The CDF cluster matrix, with cdfsam and worker nodes node1 to node9 running Fermi 7.3.1; Grid development machines grid, pplxbatch, pptb01 and pptb02; EDG testbed nodes tblcfg, tbse01, tbce01, tbwn01, tbwn02 and tbgen01 (edg ui/sam testing) on a mix of RH6.1, RH6.2 and RH7.3; an LHCb MC system on RH6.2; pplx1 (new) and morpheus on 1Gb/s links.]
Grid development systems, including the EDG software testbed setup.
New Linux Systems
Morpheus is an IBM x370: an 8-way SMP with 700MHz Xeons, 4GB RAM and 1TB of Fibre Channel disks. Installed August 2001.
Purchased as part of a JIF grant for the CDF group.
Runs Red Hat 7.1.
Will use CDF software, developed at Fermilab and here, to process data from the CDF experiment.
Tape backup is provided by a Qualstar TLS4480 tape robot with 80 slots and dual Sony AIT3 drives. Each tape can hold 100GB of data. Installed January 2002.
NetVault software from BakBone is used, running on morpheus, for backup of both CDF and particle physics systems.
Second round of the CDF JIF tender: Dell cluster - MATRIX. 10 dual 2.4GHz P4 Xeon servers running Fermi Linux 7.3.1 and SCALI cluster software. Installed December 2002.
Approx. 7.5 TB of SCSI RAID 5 disks are attached to the master node.
Each shelf holds 14 * 146GB disks.
These are shared via NFS with the worker nodes.
OpenPBS batch queuing software is used.
Plenty of space in the second rack for expansion of the cluster.
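As a small illustration of the NFS-shared storage described above, the sketch below is the sort of check a worker node could run to confirm that the shared RAID area is mounted and to report its free space. The mount point is a hypothetical example, not a path taken from the slides.

#!/usr/bin/env python
# Rough sketch: verify from a worker node that the NFS-shared RAID area
# exported by the master node is mounted, and report its free space.
# The mount point "/data/cdf" is a hypothetical placeholder.
import os
import sys

MOUNT_POINT = "/data/cdf"   # hypothetical NFS mount of the master's RAID 5 array

def report(path):
    if not os.path.ismount(path):
        sys.exit("%s is not mounted - check the NFS export from the master node" % path)
    st = os.statvfs(path)
    free_gb = st.f_bavail * st.f_frsize / 1e9
    total_gb = st.f_blocks * st.f_frsize / 1e9
    print("%s: %.1f GB free of %.1f GB" % (path, free_gb, total_gb))

if __name__ == "__main__":
    report(MOUNT_POINT)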
LHCb Monte Carlo Setup
[Diagram: a Grid Gateway machine, grid (RH6.2, Globus 1.1.3, OpenAFS, OpenPBS), in front of an 8-way 700MHz Xeon server used as the Compute Node (RH6.2, OpenAFS, OpenPBS).]
The 8-way SMP has now been reloaded as an MS Windows Terminal Server, and LHCb MC jobs will be run on the new PP farm.
Problems
• IDE RAID proved to be unreliable and caused lots of downtime.
• Problems with NAT: using iptables caused NFS problems and hangs. Solved by dropping NAT and using real IP addresses for the PP farm (see the sketch after this list).
• Trouble with ext3 journal errors.
• Hackers…
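For context on the NAT bullet above, this sketch shows the kind of iptables masquerading setup that a private farm network typically relied on before NAT was dropped. The subnet and interface name are hypothetical, and the actual rules used on the farm are not recorded here.

#!/usr/bin/env python
# Sketch only: the sort of iptables NAT (masquerading) setup that was in use
# before it was dropped.  Subnet 10.1.0.0/24 and interface eth0 are
# hypothetical; running this needs root and a standard iptables install.
import subprocess

FARM_SUBNET = "10.1.0.0/24"   # hypothetical private PP farm subnet
OUTSIDE_IF = "eth0"           # hypothetical outward-facing interface

def enable_nat():
    # Allow the kernel to forward packets between interfaces.
    subprocess.run(["sysctl", "-w", "net.ipv4.ip_forward=1"], check=True)
    # Masquerade farm traffic leaving via the outside interface.
    subprocess.run(["iptables", "-t", "nat", "-A", "POSTROUTING",
                    "-s", FARM_SUBNET, "-o", OUTSIDE_IF,
                    "-j", "MASQUERADE"], check=True)

if __name__ == "__main__":
    enable_nat()

Giving the farm nodes real, routable addresses removes this translation layer entirely, which is the fix described above.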
Problems
• Lack of Manpower!
• Number of operating systems slowly reducing; Digital Unix and VMS very nearly gone. NT4 also practically eliminated.
• Getting closer to standardising on RH 7.3, especially as the EDG software is now heading that way.
• Still finding it very hard to support laptops, but we now have a standard clone and recommend IBM laptops.
• Would be good to have more time to concentrate on security…. (See later talk)