
• Multiple computers locked for long periods of time
• Often just a handful of students
• All computers running Ansys CFX or Fluent
• Often randomly rebooted by other students and/or staff
• Cannot get a computer when you need it
• Can lose results when you do

Image credit: John Zaitseff, UNSW

“High performance computing is used to solve real-world problems of significant scale or detail across a diverse range of disciplines including physics, biology, chemistry, geosciences, climate sciences, engineering and many others.”

— Intersect Australia http://www.intersect.org.au/content/time-fast-computing

Image credit: IBM Blue Gene P supercomputer, Argonne National Laboratory

Massively Parallel Distributed Computational Clusters

• Many individual servers (“nodes”): dozens to thousands
• Multiple processors per node: between 8 and 64 cores
• Interconnected by fast networks
• Almost always run Linux
  – In our case: Rocks Linux Distribution on top of CentOS 6.x

The Leonardi cluster

Image credit: John Zaitseff, UNSW

Diagram: a typical cluster layout. The Internet connects to the head node; an internal network switch links the head node, a storage node and compute nodes 1 to n.

Diagram: a blade-enclosure layout with chassis 1 to m, each containing its own compute nodes (1-1 to 1-n in the first chassis, m-1 to m-n in the last).

• The Newton cluster
  – For undergraduate students, postgraduates and staff
  – MECH9620, MECH4100, MMAN4010, MMAN4020, MMAN4410, AERO4110 and AERO4120 students already have an account!
• The Trentino cluster
  – For postgraduate students and staff
  – By application
• The Leonardi cluster
  – For postgraduate students and staff
  – By application

UNSW R1 Data Centre

Image credit: John Zaitseff, UNSW

• 10 × Dell R415 server nodes

– Head node: newton

– Compute nodes: newton01 to newton09

• 160 × AMD Opteron 4386 3.1GHz processor cores

– Two physical processors per node

– Eight CPU cores per processor

– Only four floating-point units per processor

• 320 GB of main memory (32 GB per node)

• 12 TB of storage: 6 × 3 TB drives in RAID 6

• 1Gb Ethernet network interconnect

http://cfdlab.unsw.wikispaces.net/

The Newton cluster

Image credit: John Zaitseff, UNSW

• 16 × Dell R815 server nodes

– Head node: trentino

– Compute nodes: trentino01 to trentino15

• 1024 × AMD Opteron 6272 2.1GHz processor cores

– Four physical processors per node

– Sixteen CPU cores per processor

– Only eight floating-point units per processor

• 2048 GB of main memory (128 GB per node)

• 30 TB of storage: 12 × 3 TB drives in RAID 6

• 4×1Gb Ethernet network interconnect

http://cfdlab.unsw.wikispaces.net/

The Trentino cluster

Image credit: John Zaitseff, UNSW

• 7 × HP BladeSystem c7000 blade enclosures

• 1 × HP ProLiant DL385 G7 server: leonardi

• 56 × HP BL685c G7 compute nodes

– Compute nodes: ec01b01-ec07b08

• 2944 × AMD Opteron 6174 2.2GHz processor cores

and Opteron 6276 2.3GHz processor cores

– Four physical processors per node

– Twelve or sixteen CPU cores per processor

• 8448 GB of main memory (96–512 GB per node)

• 93.5 TB of storage: 70 × 2 TB drives in RAID 6+0

• 2×10Gb Ethernet network interconnect

http://leonardi.unsw.wikispaces.net/

Nodes in the Leonardi cluster

Image credit: John Zaitseff, UNSW

• 3592 × Fujitsu blade server nodes
• Multiple login nodes
• Multiple management nodes
• 57,472 Intel Xeon E5-2670 2.60GHz processor cores (3592 nodes × 16 cores per node)
• 160 TB of main memory
• 10 PB of storage using the Lustre distributed file system
• 56Gb InfiniBand FDR network interconnect

http://nci.org.au/nci-systems/national-facility/peak-system/raijin/

The Raijin cluster

Image credit: National Computational Infrastructure

• Use the Secure Shell protocol (SSH)
  – Under Linux or Mac OS X: ssh username@hostname (for example, ssh z9693022@newton.mech.unsw.edu.au)
  – Under Windows: PuTTY (Start » All Programs » PuTTY » PuTTY)
  – Can install Cygwin: “that Linux feeling under Windows”

• To connect to the Newton cluster:

– Hostname: newton.mech.unsw.edu.au

– Check RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce

– User name: your zID

– Password: your zPass

• You will get a command line prompt: something like
  z9693022@newton:~ $
• To exit, type exit and press ENTER.
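For illustration only (the exact wording varies between SSH client versions), a first-time connection from Linux or Mac OS X looks roughly like this; check that the fingerprint displayed matches the one listed above before answering “yes”:

  $ ssh z9693022@newton.mech.unsw.edu.au
  The authenticity of host 'newton.mech.unsw.edu.au' can't be established.
  RSA key fingerprint is 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce.
  Are you sure you want to continue connecting (yes/no)? yes
  z9693022@newton.mech.unsw.edu.au's password:
  z9693022@newton:~ $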

• List files in a directory: ls [options] [pathname ...]

– [ ] indicates optional parameters, ... indicates one or more parameters

– Italic fixed-width font indicates replaceable parameters

– Options include “-l” (letter L) for a long (detailed) listing

• To show the current directory: pwd

• To change directories: cd directory

– ~ is the home directory

– . is the current directory

– .. is the directory above the current one

– ~user is the home directory of user user

– Subdirectories are separated by “/”, e.g., /home/z9693022/src

• To create directories: mkdir directory

• To remove an empty directory: rmdir directory
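Putting these commands together, a short session might look like the following (the src directory name is purely illustrative):

  $ pwd
  /home/z9693022
  $ mkdir src            # create a new subdirectory
  $ cd src               # move into it
  $ pwd
  /home/z9693022/src
  $ cd ..                # go back up one level
  $ rmdir src            # works only because src is still empty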

• To get help for a command: man command

• To output one or more files’ contents: cat filename ...

• To view one or more files page by page: less filename ...

• To copy one file: cp source destination

• To copy one or more files to a directory: cp filename ... dir

• To preserve the “last modified” time-stamp: cp -p

• To copy recursively: cp -pr source destination

• To move one or more files to a different directory: mv filename ... dir

• To rename a file or directory: mv oldname newname

• To remove files: rm filename ...

• Recommendation: use “ls filename ...” before rm or mv: what happens if you accidentally type “rm *”? Or “rm * .c”? (note the space!)
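For example, checking what a wildcard matches before deleting anything (the file names here are made up):

  $ ls *.bak
  old-run.bak  test01.bak
  $ rm *.bak             # removes exactly the files listed above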

• To copy files to a Linux or Mac OS X system: use scp, rsync or insync

• To copy files to and from a Windows machine: use WinSCP (Start » All Programs » WinSCP » WinSCP), or scp or rsync under Cygwin

• To copy files to and from the Newton cluster:

– Host name newton.mech.unsw.edu.au

– Check RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce

– User name: your zID

– Password: your zPass

• Using WinSCP, simply drag and drop files from one pane to the other.
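As a sketch, the equivalent from a Linux or Mac OS X command line (the job1 file and directory names are illustrative):

  $ # copy a definition file into the job1 directory on Newton
  $ scp job1.def z9693022@newton.mech.unsw.edu.au:job1/
  $ # copy the whole remote job1 directory back, preserving time-stamps
  $ rsync -av z9693022@newton.mech.unsw.edu.au:job1/ ./job1/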

• Use an editor to edit text files

• Many choices, leading to “religious wars”!

• Some options: GNU Emacs, Vim, Nano

• Nano is very simple to use: nano filename

– CTRL-X to exit (you will be asked to save any changes)

• GNU Emacs and Vim are highly customisable and programmable
  – For example, see the file ~z9693022/.emacs
  – Debra Cameron et al., Learning GNU Emacs, 3rd Edition, O’Reilly Media, December 2004. ISBN 9780596006488, 9780596104184
  – Arnold Robbins et al., Learning the vi and Vim Editors, 7th Edition, O’Reilly Media, July 2008. ISBN 9780596529833, 9780596159351
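Once you have picked an editor, a common convenience (a suggestion only, assuming the default bash shell) is to make it the default for programs that ask for one, by adding a line to ~/.bashrc:

  export EDITOR=nano     # or vim, or emacs, as you prefer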

1. Set up your job using Ansys CFX as per normal
2. Connect to the Newton cluster using PuTTY
3. Create a directory for this particular job
4. Transfer the .cfx and .def files to that directory using WinSCP
5. Create an appropriate script file
6. Submit the job to the Newton queue
7. Periodically check the status of the job
8. Once finished, transfer the .out and .res files to your desktop computer
9. Check the results using the standard Ansys CFX tools

Image credit: The Ansys Blog at http://www.ansys-blog.com/

1. Set up your job using Ansys CFX as per normal

– May use the laboratory computers to do this

2. Connect to the Newton cluster using PuTTY

– Connect to newton.mech.unsw.edu.au

3. Create a directory for this particular job

– Use the mkdir directory command

– Come up with a consistent naming scheme

– Structure your directories; use subdirectories as required

4. Transfer the .cfx and .def files to that directory using WinSCP
   – Connect to newton.mech.unsw.edu.au as before
5. Create an appropriate script file:
   a. Change to the newly-created directory: cd directory
   b. Invoke the text editor to create a script file: nano filename.sh
   c. Add the following text, replacing parameters as required:

#!/bin/bash
#SBATCH --time=0-12:00:00        # for 0 days 12 hours
#SBATCH --mem=30720              # 30GB memory
#SBATCH --ntasks=1               # A single job
#SBATCH --cpus-per-task=16       # 16 processor cores
#SBATCH --mail-user=z9693022@unsw.edu.au   # your zID; or @student.unsw.edu.au
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR
module load cfx/16.2             # or cfx/17.0 as appropriate
cfx5solve -batch -def filename.def -part 16 \
    -start-method "Platform MPI Local Parallel"

d. Save the file by pressing CTRL-X and following the prompts

6. Once you have created the filename.sh script file, submit it into the Newton queue:

– Make sure you are in the correct directory

– Submit the job: sbatch filename.sh

– Take note of the job number: “Submitted batch job jobid”

– Once submitted, you do not need to be connected to the cluster

7. Periodically check on the job status

– The job will start as soon as resources are available for it to run

– Emails will be sent to you on job start and completion

– Show queue status: squeue or squeue -l (letter L)

– Show node status: sinfo

– Cancel a running or queued job: scancel jobid
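By way of example, a complete submit-and-monitor sequence might look like this (the job ID 1234 is made up):

  $ cd job1              # the directory holding filename.sh and filename.def
  $ sbatch filename.sh
  Submitted batch job 1234
  $ squeue               # is job 1234 queued or running?
  $ scancel 1234         # only if you need to abort the job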

• Similar to running CFX jobs on the cluster

• Different files need to be transferred to and from the cluster

• Script file is also slightly different:

#!/bin/bash
#SBATCH --time=0-12:00:00        # for 0 days 12 hours
#SBATCH --mem=30720              # 30GB memory
#SBATCH --ntasks=1               # A single job
#SBATCH --cpus-per-task=16       # 16 processor cores
#SBATCH --mail-user=z9693022@unsw.edu.au   # your zID; or @student.unsw.edu.au
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR
module load fluent/16.2          # or fluent/17.0 as appropriate
fluent 3d -g -t16 -ssh <inputfilename.txt >outputfilename.txt
# may replace "3d" with "2d" for two-dimensional meshes
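The inputfilename.txt redirected into Fluent above is a journal file: a plain-text list of solver commands (reading the case, iterating, writing the data and exiting); consult the Fluent documentation for the exact journal commands your version requires. While the job runs, the solver’s messages accumulate in outputfilename.txt, which you can watch with a standard Linux command:

  tail -f outputfilename.txt     # press CTRL-C to stop watching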

Whom to ask for help?

1. Your colleagues

2. Your supervisor/lecturer

3. The HPC representative

John Zaitseff

J.Zaitseff@unsw.edu.au

Available for consultations on Tuesdays 9:30am–4pm, by appointment only.

Image credit: John Zaitseff, UNSW