

Page 1

HEPiX Large Cluster SIG Report

Alan Silverman

25th October 2002

HEPiX 2002, FNAL

Page 2

19th April 2002 Alan Silverman

HEPiX Catania

Overview of the talk

Large Cluster Workshop Large Site Surveys LCCWS Plans

Page 3


Large Cluster Workshop - 1

A workshop to share practical experiences in building and running large clusters.

Gather the information to write the definitive guide to building and running a cluster: how to choose, select and test the hardware; software installation and upgrade tools; performance management, logging, accounting, alarms, security, etc.

Then document what exists and what might scale to large clusters.

And, by implication, what does not scale.

Page 4


Large Cluster Workshop - 2

The first workshop was held May 22nd to 25th, 2001, at Fermilab.

60 people attended; summaries were prepared and published/presented. See the web site:

http://conferences.fnal.gov/lccws/

It must have been successful in some respects, because …

Page 5


Large Cluster Workshop - 3

… a second workshop was held this week, with two themes: practical experiences again, and technology choices to build, configure and run a cluster.

90+ participants this time, over 2 days.

Overheads from (almost) all talks should be on the web within a week or so, and full proceedings will be published within a month or two.

Page 6


LCCWS2 Highlights - 1

HEP is starting to get practical experience in running large clusters, practically all on Linux running on commodity hardware.

More and more of these share the resources among several or many client groups

Management overhead is increasing, and ways are being sought to automate as much as possible, but there is no silver bullet.

Users starting to expect production services from the Grid

Page 7


LCCWS2 Highlights - 2

Grid deployment is facing resistance from local fabric managers, who object to having to accept masses of incoming software packages and tailoring. It is not yet clear whether this is an unreasonable fear or one that middleware developers will just have to accept and work around.

Developing Grids by large multi-site collaborations raises many social issues within the teams, more management overhead, more committees and working groups.

Page 8


LCCWS2 Highlights - 3

Tape and network trends match our perceived needs, but CPU trends need careful interpretation.

Intel still reigns in terms of number of nodes but AMD better for floating point at this time and appearing in more and more HEP sites. Will Itanium be important?

Disk sizes are growing OK, but not much faster than our needs, and tape is still cheaper. File systems are more of an issue.

MOSIX is not yet ready for large-scale use.

The larger the cluster, the more professional you must become, at all levels from the ground up (literally).

Page 9


Site Surveys - 1

Surveyed the major sites (BNL, Caltech, CERN, FNAL, RAL, SLAC, NERSC)

First survey was for computer centre services such as power backup options and operator cover

Later added a review of videoconference support offerings and anti-virus tools in use

Page 10


Site Surveys - 2

The surveys seem to be of interest and not too disturbing.

Recently surveyed user support features and choices of PC hardware; the results will be published soon after I get back to CERN.

Proposal to survey speed and type of Ethernet connections to desktop

Worth continuing? More site surveys as requested (but not more than one per ??????)

Page 11


Plans for LCCWS

The LCCWS workshops appear to fill a niche by addressing practical issues in an HEP environment so the series should probably continue.

The format which seems to be popular is to co-locate it with HEPiX, but to keep HEP out of the name so as to encourage participation by other sciences running large clusters.

But keep the HEPiX link by having it driven by the Large Cluster SIG and working in harmony with HEPiX in respect of the scheduling of the meeting itself and of the talks.

Use it as a place to discuss technical issues coming up in grid development? Then it probably needs to be more often, every HEPiX?

Must remain something driven by theme and seeded with invited talks which focus on the theme.