Effective User Services for High Performance Computing A White Paper by the TeraGrid Science Advisory Board May 2009


Effective User Services for High Performance Computing

A White Paper by the TeraGrid Science Advisory Board

May 2009


Contents

1) HPC User Services Defined

2) Characteristics of HPC User Services

3) Essential Elements of an HPC User Services Program

1. Routine User Services

2. Effective Use of Resources

3. Project- and Domain-Specific Support

4. User-Developed Software Support

5. Communications, Education, and Outreach

4) Social Web and HPC User Services

5) Conclusions


HPC User Services (Definition)

• User Services: computer system resource providers helping users make effective use of the resources. Support ranges from:

– Simple: helping users log in or change their passwords

– Complex: sophisticated, high-level support for scientific users

• Benefits to both resource providers and users

– Benefit to digital resource providers: ensure that costly computer resources are used safely, securely, efficiently, and to their full capability, for a wide range of sciences and users

– Benefit to digital resource users: ensure efficient and effective access to the full range of capabilities, gaining full advantage for scientific research, while quickly resolving issues and minimizing adverse impacts on research productivity


HPC User Characteristics

Finding: Users of the largest share of HPC resources are:

• Very technically savvy
• Extreme
  – Invest extensive effort to exploit seldom-used capabilities
  – Use the latest features of software
  – Push the limits of capabilities; routinely attempt leading (bleeding) edge computations
• Likely to discover hardware or system software problems
  – But not responsible for their repair!
• Users of software written by themselves or scientific colleagues
  – Require HPC software development environments
• Unlikely to expect to be charged for support
• Constantly in flux
  – Scientists become new HPC users
  – HPC resources change with time


HPC User Characteristics

Finding: Three other groups of users:

• Those just starting out
• Those with moderate requirements
• Those who prefer science gateways or other mechanisms that hide some of the complexities of HPC systems

Recommendation: User Services must address the needs of all types of users


Essential Elements of an Effective User Services Program

Routine User Services

User account management, allocations, storage quotas, hardware and software documentation, and issue resolution using a trouble ticket system.

Effective Use of the Resources

Help users identify appropriate hardware and software and improve performance.

Project- and Domain-Specific Support

Link to resources in appropriate software engineering or scientific communities that enable this support.

User-Developed Software Support

Support for user-developed software, including a robust development environment, analysis of performance, debugging, and porting.

Commonly-Used Software Support

Support for commonly used software, analysis of performance, debugging, and porting user problems to the software or the system.

Communications, Education, and Outreach

Communication of (1) system and support capabilities to current and potential users; (2) issues and concerns from users to appropriate staff; and (3) results to funding agencies and scientific communities. Educational and training programs for users.


Routine User Services

• Definition: Routine services include user account management, allocations, storage quotas, documentation, and issue resolution (trouble ticket system)

• Finding: Life cycle of issue resolution
  1. Analysis of requirements and readiness
  2. Triage
  3. Transition
  4. Tracking
  5. Resolution

• Recommendations
  – Provide multiple pathways of information
    • Help users help themselves
    • Provide basic support to new and naïve users
    • Provide examples
  – Show status of user-reported problems
  – Optimize the most common tasks first
  – Provide an escalation path
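
As an illustration only, the five-stage issue life cycle and the recommendations to show ticket status and provide an escalation path could be modeled roughly as follows. This is a minimal sketch, not a description of any TeraGrid system: the `Stage` and `Ticket` names, the `support_tier` field, and the status format are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Issue-resolution stages, in the order the slide lists them."""
    ANALYSIS = 1    # analysis of requirements and readiness
    TRIAGE = 2
    TRANSITION = 3
    TRACKING = 4
    RESOLUTION = 5


@dataclass
class Ticket:
    """Hypothetical trouble ticket; field names are illustrative only."""
    summary: str
    stage: Stage = Stage.ANALYSIS
    support_tier: int = 1           # assumed: 1 = front-line help desk
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage; a resolved ticket stays resolved."""
        if self.stage < Stage.RESOLUTION:
            self.history.append(self.stage)
            self.stage = Stage(self.stage + 1)
        return self.stage

    def escalate(self) -> int:
        """Hand the ticket to the next support tier (the escalation path)."""
        self.support_tier += 1
        return self.support_tier

    def status(self) -> str:
        """User-visible status line, so reporters can see progress."""
        return f"[{self.stage.name}] tier {self.support_tier}: {self.summary}"


ticket = Ticket("MPI job hangs at startup")
ticket.advance()          # ANALYSIS -> TRIAGE
ticket.escalate()         # front line hands off to tier 2
print(ticket.status())
```

Keeping the stage and the support tier as separate fields reflects that escalation can happen at any point in the life cycle; a real ticket system would also record timestamps and owners.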


Effective Use of Resources

• Finding: HPC users need help extracting optimal system performance

• Recommendations
  – Help match users to the most appropriate resource
  – Help users make effective use of systems
    • Training classes
    • Performance optimization tools
    • Performance engineering and application tuning
    • Individual program or project meetings


Project- and Domain-Specific Support

• Recommendations
  – Link to resources in appropriate software engineering or scientific communities
    • e.g. Science Gateways
  – Assign a support person …
    • to each new project
    • to each domain or scientific community
  – Facilitate communications between groups that use similar computing techniques
  – Take advantage of obvious opportunities for technology transfers
    • Standardized data storage formats
    • Visualization technologies


User-Developed Software Support

• Finding
  – A high level of software expertise is required within HPC User Services

• Recommendations
  – Provide a robust development environment
    • Editors, compilers, linkers, and performance monitors
    • Multiple programming languages
    • Multiple compilers from multiple vendors
    • Multiple versions of parallel libraries (e.g. MPICH, OpenMP)
    • Tools for source code management, performance analysis, debugging, and porting
  – Plan for software that is dynamic and continually evolving


Communications, Education and Outreach

• Finding: (At least) three types of communications
  – Capabilities of available resources, communicated to potential and current users
  – Benefits of HPC, communicated to the science, research, and education communities
  – Between HPC providers and users (including potential users)

• Recommendations
  – Communication
    • Use examples of user successes and important results
    • Widen the use of HPC systems within and across scientific communities
    • Use multiple media, forums, and formats including print, web, email, etc.
  – Education and training
    • Provide classroom- or web-based training on effective use of HPC systems
    • Leverage education and training capabilities from host institutions
  – Outreach
    • Increase involvement of underrepresented groups in HPC activities
    • Build interest in and establish value of HPC results in social, business, and governmental communities
    • Support a diverse group of scientists at the routine and more in-depth levels
    • Avoid piquing interests that cannot be satisfied (manage expectations)


Social Web and HPC User Services

• Finding: Social web phenomena are changing how people with common interests and goals interact with each other
  – Web 2.0 tools such as blogs, wikis, and web forums: a shift from passive “brochure-ware” to a collaborative environment
  – Email subscription lists, RSS feeds, and news aggregators help people filter an overwhelming volume of information
  – Open-source collaborative software development

• Recommendation: Take advantage of the social web's significant promise for HPC User Services (if applied properly)
  – Help HPC users find each other
  – Identify others with common interests or problems
  – Communicate interests and specialties
  – Collaborate on resolving problems or performing research
  – Ensure that authoritative information eventually becomes prevalent (viz. Wikipedia)