17-April-2007 High Performance Computing Basics April 17, 2007 Dr. David J. Haglin



Outline

What is the HPC?

Where did it come from?

How can you get an account on hpc.mnsu.edu?

How can you use it for your research?

Where do you go from here?

What is the HPC?

Many AMD Opteron computers (nodes) in a rack

Connected by a high-speed network

Located in the IT Services secure area (third floor of the library)

All nodes run Linux

http://www.mnsu.edu/hpc

What is the HPC?

Head node has 8 GB RAM and 7.4 TB of disk

Head node is for doing administrative work and starting long jobs

The 34 worker nodes are for doing long computations

Each worker has 8 GB RAM, an 80 GB hard disk, and 2 dual-core AMD Opteron processors

[Diagram: Head Node, Worker 1 … Worker 34]
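The per-node figures above can be added up to get a rough sense of the aggregate worker capacity. This is only arithmetic on the numbers from the slide; it ignores the head node and assumes all 34 workers are identical.

```python
# Figures taken from the slide; assumes all 34 workers are identical.
WORKER_NODES = 34
CPUS_PER_WORKER = 2        # two Opteron processors per worker
CORES_PER_CPU = 2          # each processor is dual-core
RAM_GB_PER_WORKER = 8
DISK_GB_PER_WORKER = 80

total_cores = WORKER_NODES * CPUS_PER_WORKER * CORES_PER_CPU
total_ram_gb = WORKER_NODES * RAM_GB_PER_WORKER
total_local_disk_gb = WORKER_NODES * DISK_GB_PER_WORKER

print(f"worker cores:      {total_cores}")             # 136
print(f"worker RAM:        {total_ram_gb} GB")         # 272 GB
print(f"worker local disk: {total_local_disk_gb} GB")  # 2720 GB
```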

What is the HPC?

Software installed:
  - GNU languages: C/C++ (gcc/g++), Fortran (gfortran)
  - Message Passing Interface library: OpenMPI

Software soon to be installed:
  - MATLAB
  - Fluent
  - Portland Group Fortran and C/C++
  - IMSL

Email is “local delivery only”

Where did it come from?

National Science Foundation grant
  - MRI Program (Major Research Instrumentation)
  - $140,000
  - Institutional Equipment funds upgraded the machine by adding five nodes

PIs: Patrick Tebbe, Rebecca Bates, David Haglin

Proposal focused on a college-wide need for HPC

Vendor: PSSC Labs, Inc.

How can you get an account?

We must submit a final report to NSF after July 31, 2009

Part of the final report must include how much it was used within CSET (and within MSU).

We need to track usage (research projects).

To get an account, send an email to [email protected] with the information described at:
  - http://www.mnsu.edu/hpc/accounts.html
  - Your students can get accounts too!

We are very interested in knowing about publications you obtain as a result of using hpc.mnsu.edu.

Your Research

Okay, so you got an account.

Now What?

Your Research

Learning to use HPC.

Learning to use the OpenPBS/Torque job queuing software.

Learning to “design” your usage.

Tutorials will be maintained at www.mnsu.edu/hpc

Your Research

Connect to hpc.mnsu.edu (head node) using ssh
  - ssh on Unix
  - PuTTY or SSH Windows Client (IT Services)
  - The firewall is tight; you may need to request a new opening in the firewall for your location

Line-mode (command-line) interface

Basic Unix commands:
  - http://www.mnsu.edu/hpc/tutorials/linux_basics.doc

Your Research

Disks on hpc:

Your Research

Using the OpenPBS/Torque job queuing software:
  - qstat -- Inspect the current job queue
  - qsub -- Add a new job to the queue
  - qdel -- Delete one of your jobs from the queue
  - pbsmon.py -- See the state of the entire machine
  - xpbsmon -- Uses X11 to display machine state
  - firefox localhost/ganglia

Detailed information available at:
  - http://www.clusterresources.com/torquedocs21/usersmanual.shtml
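If you drive the queue from a script rather than by hand, the three basic commands can be wrapped in a few lines of Python. The sketch below is a minimal illustration, not part of the cluster's tooling: the function names are hypothetical, and it assumes that qsub prints the new job's identifier on standard output, as Torque normally does. The `run` parameter exists only so the wrappers can be tried without a cluster.

```python
import subprocess


def submit(script_path, run=subprocess.run):
    """Queue one start script with qsub and return the job id that qsub prints."""
    result = run(["qsub", script_path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def delete(job_id, run=subprocess.run):
    """Remove one of your jobs from the queue with qdel."""
    run(["qdel", job_id], check=True)


def queue_listing(run=subprocess.run):
    """Return the text that qstat prints for the current job queue."""
    result = run(["qstat"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the head node, `job_id = submit("start_p1.sh")` followed by `print(queue_listing())` would queue a script and show the queue; `delete(job_id)` removes the job again. The script name here is only an example.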

Your Research

Designing your usage.
  - Assume you have a program you want to run for different parameter values of 1 through 1000
  - Ex:
      $ myProgram -p1
      $ myProgram -p2
      .
      .
      $ myProgram -p1000

Your Research

Create 1000 “start scripts” to queue 1000 jobs to the master queue.

Start your jobs and monitor their progress

Combine results when they are all done.

Organize experiments/runs in folders

Use scripting languages such as python to generate start scripts.
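As one way to follow the last suggestion, the Python sketch below writes one start script per parameter value into a folder for the experiment. It is an illustration under assumptions, not the site's recommended template: the file names, the job names, the `result_p<N>.txt` output file, and the single-core resource request are all hypothetical, and it assumes `myProgram` sits in the directory from which the jobs are submitted.

```python
import tempfile
from pathlib import Path

# Hypothetical start-script template. "#PBS" lines are Torque directives;
# $PBS_O_WORKDIR is the directory from which qsub was run.
SCRIPT_TEMPLATE = """#!/bin/bash
#PBS -N myProg_p{p}
#PBS -l nodes=1:ppn=1
cd "$PBS_O_WORKDIR"
./myProgram -p{p} > result_p{p}.txt
"""


def write_start_scripts(run_dir, first=1, last=1000):
    """Write one start script per parameter value and return the script paths."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for p in range(first, last + 1):
        path = run_dir / f"start_p{p}.sh"
        path.write_text(SCRIPT_TEMPLATE.format(p=p))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demonstration in a throwaway folder; use a real experiment folder on the cluster.
    with tempfile.TemporaryDirectory() as tmp:
        scripts = write_start_scripts(Path(tmp) / "experiment1", 1, 1000)
        print(len(scripts), "start scripts written; first is", scripts[0].name)
```

Each generated script can then be queued with `qsub start_p1.sh`, `qsub start_p2.sh`, and so on, from inside the experiment folder. Keeping one folder per experiment keeps the scripts and their results together.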

Your Research

Input and output for your jobs:
  - Your script will start on a worker node
  - You can log in to a worker node to see its filesystem: ssh n04 df
  - Standard output and standard error are separate
  - Files are written alongside your script when the job completes
  - No way to monitor the progress of your computation
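Because the output only appears once a job has finished, a small script can gather the files afterwards. The sketch below assumes Torque's default naming, where standard output lands in `<job name>.o<job number>` and standard error in `<job name>.e<job number>` next to the start script. The function names and the example job names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumed default Torque naming: <job name>.o<number> (stdout), <job name>.e<number> (stderr).
OUTPUT_NAME = re.compile(r"^(?P<job>.+)\.(?P<stream>[oe])(?P<seq>\d+)$")


def collect_outputs(run_dir):
    """Return {job name: {"o": stdout text, "e": stderr text}} for finished jobs."""
    results = {}
    for path in sorted(Path(run_dir).iterdir()):
        match = OUTPUT_NAME.match(path.name)
        if match and path.is_file():
            results.setdefault(match["job"], {})[match["stream"]] = path.read_text()
    return results


def failed_jobs(results):
    """Names of jobs that wrote anything to standard error."""
    return sorted(job for job, streams in results.items()
                  if streams.get("e", "").strip())


if __name__ == "__main__":
    # Demonstration with made-up output files in a throwaway folder.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "myProg_p1.o101").write_text("result for p=1\n")
        (folder / "myProg_p1.e101").write_text("")
        outputs = collect_outputs(folder)
        print(sorted(outputs), "failed:", failed_jobs(outputs))
```

Treating any text on standard error as a failure is a deliberately crude check; programs that print warnings there would need a different test.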

Your Research

Sample script to run from 501 to 505:
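The script shown on the slide is not reproduced in this transcript. As a stand-in, the Python sketch below generates one plausible start script that runs parameters 501 through 505 one after another inside a single job. It should not be read as the original: the job name, the resource request, and the `./myProgram` path are assumptions.

```python
def start_script(first, last, program="./myProgram"):
    """Build the text of a start script that runs `program` for first..last in one job."""
    lines = [
        "#!/bin/bash",
        # Job name kept short, since some PBS versions cap its length.
        f"#PBS -N myProg_{first}_{last}",
        "#PBS -l nodes=1:ppn=1",
        # Start in the directory from which qsub was run.
        'cd "$PBS_O_WORKDIR"',
        f"for p in $(seq {first} {last}); do",
        f"    {program} -p$p",
        "done",
    ]
    return "\n".join(lines) + "\n"


print(start_script(501, 505))
```

Saving the printed text as, say, `start_501_505.sh` and submitting it with `qsub` would run the five values sequentially on one worker. Submitting five separate scripts instead would let the queue spread them across workers.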

Where do you go from here?

www.mnsu.edu/hpc is a communication portal

Find colleagues who can help

Learn more about the capabilities:
  - New software
  - Parallel programming (MPI)
  - Parallel libraries, e.g., ScaLAPACK

Keep this machine computing fast

Other ideas?