25th Oct 2002 HEPiX FNAL ‘02 Alan Silverman
HEPiX Large Cluster SIG Report
Alan Silverman
25th October 2002
HEPiX 2002, FNAL
19th April 2002 Alan Silverman
2HEPiX Catania
Overview of the talk
Large Cluster Workshop
Large Site Surveys
LCCWS Plans
Large Cluster Workshop - 1
A workshop to share practical experiences in building and running large clusters.
Gather the information to write the definitive guide to building and running a cluster: how to choose, select and test the hardware; software installation and upgrade tools; performance management, logging, accounting, alarms, security, etc.
Then document what exists and what might scale to large clusters.
And by implication, what does not scale
Large Cluster Workshop - 2
The first workshop was held from 22nd to 25th May 2001 at Fermilab
60 people attended; summaries prepared and published/presented (see web site: http://conferences.fnal.gov/lccws/)
Must have been successful in some respects because …
Large Cluster Workshop - 3
… a second workshop was held this week
Two themes: practical experiences again, and technology choices to build, configure and run a cluster
90+ participants this time, over 2 days
Overheads from (almost) all talks should be on the web within a week or so and full proceedings will be published within a month or two.
LCCWS2 Highlights - 1
HEP is starting to get practical experience in running large clusters, practically all on Linux running on commodity hardware.
More and more of these share the resources among several or many client groups
Management overhead is increasing, and ways are being sought to automate as much as possible, but there is no silver bullet
Users are starting to expect production services from the Grid
LCCWS2 Highlights - 2
Grid deployment is facing resistance from local fabric managers, who are reluctant to accept masses of incoming software packages and tailoring. It is not yet clear whether this is an unreasonable fear or one that middleware developers will just have to accept and work around.
Developing Grids by large multi-site collaborations raises many social issues within the teams, more management overhead, more committees and working groups.
LCCWS2 Highlights - 3
Tape and network trends match our perceived needs but CPU trends need to be interpreted.
Intel still reigns in terms of number of nodes, but AMD is better for floating point at this time and is appearing in more and more HEP sites. Will Itanium be important?
Disk sizes are growing OK but not much faster, and tape is still cheaper. File systems are more of an issue.
MOSIX not yet ready for large scale use.
The larger the cluster, the more professional you must become at all levels from the ground up (literally).
Site Surveys - 1
Surveyed the major sites (BNL, Caltech, CERN, FNAL, RAL, SLAC, NERSC)
First survey was for computer centre services such as power backup options and operator cover
Later added a review of videoconference support offerings and anti-virus tools in use
Site Surveys - 2
Seems to be of interest and not too disturbing
Recently surveyed user support features and choices of PC hardware; the results will be published soon after I get back to CERN
Proposal to survey speed and type of Ethernet connections to desktop
Worth continuing? More site surveys as requested (but not more than one per ??????)
Plans for LCCWS
The LCCWS workshops appear to fill a niche by addressing practical issues in an HEP environment so the series should probably continue.
Format which seems to be popular is to co-locate it with HEPiX but continue to keep HEP out of the name to encourage participation by other sciences running large clusters.
But keep the HEPiX link by having it driven by the Large Cluster SIG and working in harmony with HEPiX in respect of the scheduling of the meeting itself and of the talks.
Use it as a place to discuss technical issues coming up in grid development? Then it probably needs to be held more often, perhaps at every HEPiX?
Must remain something driven by theme and seeded with invited talks which focus on the theme.