
Page 1:

Running CFX on the UB CCR Cluster

Introduction to the UB CCR Cluster: Getting Help, Hardware Resources, Software Resources, Computing Environment, Data Storage

Login and File Transfer: UBVPN, Login and Logout, More about X-11 Display, File Transfer

Page 2:

Running CFX on the UB CCR Cluster

Unix Commands: Short List of Basic Unix Commands, Reference Card, Paths and Using Modules

Starting the CFX Solver: Launching CFX, Monitoring

Running CFX on the Cluster: SLURM Scheduler, Interactive Jobs, Batch Jobs

Page 3:

Information and Getting Help

Getting help: CCR uses an email problem ticket system.

Users send their questions and descriptions of problems to [email protected]

The technical staff receives the email and responds to the user.
• Usually within one business day.

This system allows staff to monitor and contribute their expertise to the problem.

CCR website: http://www.buffalo.edu/ccr.html

Page 4:

Cluster Computing

The general-compute partition is the major computational platform of the Center for Computational Research. (~8,000 compute cores)

Login (front-end) and cluster machines run the Linux operating system.

Requires a CCR account. Accessible from the UB domain. The login machine is rush.ccr.buffalo.edu. Compute nodes are not accessible from outside the cluster.

Traditional UNIX-style command-line interface. A few basic commands are necessary.

Page 5:

Data Storage

Home directory: /user/UBITusername. The default user quota for a home directory is 2 GB.
• Users requiring more space should contact the CCR staff.
• Data in home directories is backed up; CCR retains data backups for one month.

Project directories: /projects/research-group-name. The default quota for a project directory is 200 GB. Data in project directories is NOT backed up by default.

Scratch spaces are available for TEMPORARY use by jobs running on the cluster. /panasas/scratch provides > 100 TB of space.
• Accessible from the front-end and all compute nodes.
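As a hedged sketch of checking usage and staging data in scratch (the per-user directory name under /panasas/scratch is an assumption; follow whatever convention CCR recommends):

$ du -sh ~                                               # how much of the 2 GB home quota is in use
$ mkdir -p /panasas/scratch/$USER                        # hypothetical per-user staging area in scratch
$ cp ~/bluntbody/BluntBody.def /panasas/scratch/$USER/   # stage a job input for temporary use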

Page 6:

Accessing the Cluster

The cluster front-end is accessible from the UB domain (.buffalo.edu). Use VPN for access from outside the University.

The UBIT website provides a VPN client for Linux, Mac, and Windows machines.
• http://www.buffalo.edu/ubit.html

The VPN client connects the machine to the UB domain, from which the front-end can be accessed.

Telnet access is not permitted.

Page 7:

Login and X-Display

Linux/UNIX/Mac workstation: ssh rush.ccr.buffalo.edu
• ssh [email protected]
• The -X or -Y flag will enable an X-Display from rush to the workstation: ssh -X rush.ccr.buffalo.edu

Windows workstation: download and install the X-Win32 client from
www.buffalo.edu/ubit/service-guides/software/by-title.html
Use the configuration to set up ssh to rush. Set the command to xterm -ls.

Logout: type logout or exit in the login window.

Furnas 1019 Lab: X-Win32 is already installed, but you must add the rush connection.

Page 8:

File Transfer

FileZilla is available for Windows, Linux, and Mac machines. Check the UBIT software pages. This is a drag-and-drop graphical interface. Please use port 22 for secure file transfer.

Command-line file transfer for Unix: sftp rush.ccr.buffalo.edu
• put, get, mput, and mget are used to upload and download data files.
• The wildcard “*” can be used with mput and mget.
• scp filename rush.ccr.buffalo.edu:filename
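As a brief sketch of a typical command-line session (the file names and the bluntbody target directory are illustrative, not prescribed):

$ sftp rush.ccr.buffalo.edu                            # connect to the front-end with your UB credentials
sftp> put BluntBody.def                                # upload one file to the current remote directory
sftp> mget *.res                                       # download all files matching the wildcard
sftp> exit
$ scp BluntBody.def rush.ccr.buffalo.edu:bluntbody/    # one-shot secure copy into a remote subdirectory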

Furnas 1019 Lab – Use WinSCP

Page 9:

Basic Unix Commands

Using the cluster requires knowledge of some basic UNIX commands.

The CCR Reference Card provides a list of the basic commands. The Reference Card is a PDF file, available from:

www.buffalo.edu/ccr/support/UserGuide/BasicUNIX.html

These will get you started; you can learn more commands as you go.

List files:
• ls
• ls -la (long listing that shows all files)

Page 10:

Basic Unix Commands

View files:
• cat filename (displays file to screen)
• more filename (displays file with page breaks)

Change directory:
• cd directory-pathname
• cd (go to home directory)
• cd .. (go back one level)

Show directory pathname:
• pwd (shows current directory pathname)

Copy files and directories:
• cp old-file new-file
• cp -R old-directory new-directory

Page 11:

Basic Unix Commands

Move files and directories:
• mv old-file new-file
• mv old-directory new-directory
• NOTE: a move is a copy and remove.

Create a directory:
• mkdir new-directory

Remove files and directories:
• rm filename
• rm -R directory (removes the directory and its contents)
• rmdir directory (the directory must be empty)
• Note: be careful when using the wildcard “*”

Manual pages for a command: man command

Page 12:

Basic Unix Commands

View file and directory permissions using the ls command:
• ls -l

Permissions have the following format:
• -rwxrwxrwx … filename
  (the three rwx groups apply to user, group, and other)

Change permissions of files and directories using the chmod command.
• The arguments for chmod combine u, g, o (user, group, other), + or - (add or remove), and r, w, x (read, write, execute).
• chmod g+r filename
  adds read privilege for the group.
• chmod -R o-rwx directory-name
  removes read, write, and execute privileges for others from the directory and its contents.
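A brief worked example, using hypothetical file and directory names (the ls -l output shown is illustrative):

$ ls -l report.txt
-rw-r----- 1 UBITusername mygroup 4096 Sep 10 09:54 report.txt   # user: rw-, group: r--, other: ---
$ chmod g+w report.txt        # add write permission for the group
$ chmod o+r report.txt        # let others read the file
$ chmod -R o-rwx results      # remove all access for others from results and its contents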

Page 13:

Basic Unix Commands

There are a number of editors available: emacs, vi, nano, pico
• Emacs will default to a GUI if you are logged in with X-Display enabled.

Files edited on Windows PCs may have embedded characters that can create runtime problems.

Check the type of the file:
• file filename

Convert a DOS file to Unix. This removes the Windows/DOS line-ending characters:
• dos2unix -n old-file new-file
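A short sketch of the check-and-convert workflow (my-script.sh is a hypothetical file; the exact wording of the file output can vary between systems):

$ file my-script.sh
my-script.sh: ASCII text, with CRLF line terminators    # CRLF indicates Windows/DOS line endings
$ dos2unix -n my-script.sh my-script-unix.sh            # -n writes the converted copy to a new file
$ file my-script-unix.sh
my-script-unix.sh: ASCII text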

Page 14:

Modules

Modules are available to set variables and paths for application software, communication protocols, compilers, and numerical libraries.

module avail (lists all available modules)

module load module-name (loads a module)
• Updates the PATH variable with the path of the application.

module unload module-name (unloads a module)
• Removes the path of the application from the PATH variable.

module list (lists loaded modules)

module show module-name
• Shows what the module sets.

Modules can be loaded in the user’s .bashrc file.
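For example, a typical sequence for the CFX module used later in this tutorial (cfx/ub-150 is the module name shown on Page 18):

$ module avail cfx          # list the available CFX modules
$ module load cfx/ub-150    # load ANSYS CFX 15.0 and update PATH
$ module list               # confirm it is loaded
$ module show cfx/ub-150    # see exactly which variables it sets
$ module unload cfx/ub-150  # remove it when no longer needed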

Page 15:

Setup a CFX test case

Create a subdirectory:
mkdir bluntbody

Change directory to bluntbody:
cd bluntbody

Copy the BluntBody.def file to the bluntbody directory:
cp /util/cfx/ansys-15.0/example/BluntBody.def BluntBody.def
ls -l

Or use WinSCP to transfer the .def file from a Windows-based system to a CCR Linux account.

Page 16:

Start an interactive job

fisbatch --nodes=1 --tasks-per-node=8 --time=01:00:00 --partition=debug

Requests 8 cores on 1 node for 1 hour. --partition=debug requests the debug partition, which is used for testing purposes. The maximum wall time for this partition is 1 hour. (Note: use --partition=cfx for today’s tutorial.)

When we subsequently launch the solver from the CFX GUI, we can instruct the solver to use the requested nodes to run the solution in parallel.

Partition details can be found here: www.buffalo.edu/ccr/support/research_facilities/general_compute/cluster-partitions.html

Page 17:

Start an interactive job

Commands 1-3: set up the project directory and retrieve the data file.
Command 4: request an interactive job.
Command 5: once logged into the compute node, we are already in the project directory.
Command 6: directory listing (to verify we’re in the right place).

$ mkdir bluntbody
$ cd bluntbody
$ cp /util/cfx/ansys-15.0/example/BluntBody.def ./BluntBody.def
$ fisbatch --nodes=1 --tasks-per-node=8 --time=01:00:00 --partition=debug
FISBATCH -- waiting for JOBID 2749831 to start on cluster=ub-hpc and partition=debug ...!
FISBATCH -- Connecting to head node (d16n02)
(the screen will clear)
[lsmatott@d16n02 bluntbody]$
[lsmatott@d16n02 bluntbody]$ ls -l
total 2290
-rw-r--r-- 1 lsmatott ccrstaff 2052099 Sep 10 09:54 BluntBody.def

Page 18:

Load CFX module

Command 7: loads the CFX module

Command 8: launches CFX (the trailing & detaches it from the command line).

After Command 8, the CFX Solver GUI will be displayed. Note: the GUI could also be launched from the remote visualization nodes. See the link below for instructions:
www.buffalo.edu/ccr/support/research_facilities/remote-visualization.html

$ module load cfx/ub-150
'cfx/ub-150' load complete.
cfx5 launches cfx
runwb2 launches workbench
$ cfx5 &
[1] 6535

Page 19:

CFX initialization (MAE 505 only)

The first time you start CFX on the cluster:
• In the CFX Launcher, click: Tools > ANSYS Client Licensing Utility
• Click “Set License Preferences”
• Select 15.0 and click OK

Page 20:

CFX initialization (MAE 505 only)
• Select “Share a single license…”
• Click “Apply” then “OK”
• Finally, exit the “Admin Utility”

Page 21:

CFX initialization

In the CFX Launcher, click on “CFX-Solver Manager 15.0”. After a splash screen is displayed, the CFX-Solver Manager window opens.

Page 22:

Running CFX: parallel

In the Solver Manager, click: File > Define Run

Click the Browse icon and navigate to the location of the BluntBody.def file. Select the file and click “Open”.

Page 23:

Running CFX: parallel

Additional solver settings using the drop-down lists:
• Use 8 partitions for the parallel environment (since we requested 8 cores in the fisbatch command).
• Match the working directory with the location of BluntBody.def.

Almost ready to start the run! But first, go back to the terminal window.

Page 24:

Running CFX: parallel

Command 9: start “top” to monitor the memory and CPU of the head node (d16n02) for the job.

[lsmatott@d16n02 bluntbody]$ cfx5 &
[1] 6535
[lsmatott@d16n02 bluntbody]$ top

Page 25:

Running CFX: parallel

Open a new terminal connection to the front-end.

Command 1: retrieve the job ID (2749834 in this case).
Command 2: start slurmjobvis, a tool that monitors the activity of each processor assigned to a given job.

$ squeue --user=lsmatott
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2749834 debug FISBATCH lsmatott R 23:19 1 d16n02
$ /util/ccrjobvis/slurmjobvis 2749834 &
[1] 9529
d16n02 has been allocated CPUs 0-7
User: lsmatott Job: 2749834
Adding node: d16n02
Warning: detected OpenGL error 'invalid enumerant' at After Renderer::compile
Init PCP, cpus: 8

Page 26:

Running CFX: parallel

In the CFX Solver Manager, click “Start Run” in the “Define Run” window. After the solver is done, click NO for post-processing.

Platform MPI Local Parallel is used when running on one multiprocessor machine. To use just one core, you could have chosen “Serial”.

Page 27:

Running CFX: parallel

8 cores are being used, as expected (visible in the slurmjobvis, top, and CFX Solver Manager windows).

Page 28:

Running CFX: parallel

Once the Solver completes, copy the results (.res) and output (.out) files from the Linux project directory to a Windows PC with ANSYS Workbench installed.

Use the Windows version of CFX to post-process the results file.


Page 29:

Running CFX: distributed parallel

Start from a fresh login to the cluster, request an interactive job on 2 nodes with 8 cores each, load the CFX module, and launch CFX:

fisbatch --nodes=2 --ntasks-per-node=8 --time=01:00:00 --partition=debug
module load cfx/ub-150
cfx5 &

The interactive job will log in on the first compute node in the node list; this is referred to as the “head node”.

Open another window and log into the cluster. Type squeue -u username to see information on your job, including the job id. Then type:

/util/ccrjobvis/slurmjobvis <job id> &

Click on CFX-Solver Manager 15.0. In “Define Run”, select the .def file. Type of Run: Full. Run mode: Platform MPI Distributed Parallel. MPI Distributed Parallel is used with more than one compute node.

Page 30:

Running CFX: distributed parallel

To get the list of compute nodes, use slist <jobid> from a terminal window. In the CFX “Define Run” dialog, add each compute node and match the number of partitions for that node to the number of CPUs.

Page 31:

Running CFX: distributed parallel

Start the run and monitor it with slurmjobvis. Notice that it now runs on 16 cores split between the two nodes. This job uses the InfiniBand (IB) network for MPI communication; Ethernet is used for the file-system I/O and the scheduler.

Page 32:

Running on the Cluster

Compute machines are assigned to user jobs by the SLURM (Simple Linux Utility for Resource Management) scheduler.

The sbatch command submits unattended jobs to the scheduler.

Interactive jobs are submitted using the fisbatch command and depend on the connection from the workstation to the front-end. If the workstation is shut down or disconnected from the network, then the fisbatch job will terminate.

Page 33:

SLURM Execution Model

SLURM executes a login as the user on the master host, and then proceeds according to one of two modes, depending on how the user requested that the job be run.

Script/batch mode: the user executes the command: sbatch [options] job-script
• where job-script is a standard UNIX shell script containing some #SBATCH directives along with the commands that the user wishes to run (examples later).

Interactive/fisbatch: the user executes the command: fisbatch [options]
• The job is run “interactively,” in the sense that standard output and standard error are connected to the terminal session of the initiating fisbatch command. Note that the job is still scheduled and run like any other batch job (so you can end up waiting a while for your prompt to come back “inside” your batch job).

Page 34:

Execution Model Schematic

[Diagram: "sbatch myscript" or "fisbatch" goes to the SLURM controller, whose scheduler (like a game of Tetris) decides whether the job can run. When it runs, $SLURM_NODELIST names the assigned nodes (node1, node2, ..., nodeN), and each node executes the prologue, a $USER login, myscript, and the epilogue.]

Page 35:

SLURM Partitions

The relevant SLURM partitions for most UB CCR users are general-compute and debug.

The general-compute partition is the default. The debug partition is smaller and limited to 1 hour; it is used to test applications.

spinfo
• Shows the partitions defined for the scheduler.
• Shows the maximum job time limit for each partition.
• Shows the overall number and status of the nodes in each partition.

Page 36:

SLURM Features/Constraints

How can I submit a job to a specific type of node?

Use the “--constraint=FEATURE” option of the sbatch command.

A list of features (i.e., SLURM tags) is given here:
www.buffalo.edu/ccr/support/research_facilities/general_compute.html

Most features are related to the type of CPU on a given node:
• CPU-E5-2660 (2.20 GHz, 16 cores per node)
• CPU-E5645 (2.40 GHz, 12 cores per node)
• CPU-L5630 (2.13 GHz, 8 cores per node, Dell)
• CPU-L5520 (2.27 GHz, 8 cores per node, IBM)
• etc.
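For example, a sketch of pinning a job to the 16-core E5-2660 nodes, either inside a batch script or on the command line (slurmCFX is the sample script discussed on Page 40):

#SBATCH --constraint=CPU-E5-2660                # inside the batch script
$ sbatch --constraint=CPU-E5-2660 slurmCFX      # or as a command-line option at submission time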

Page 37:

Batch Scripts - Resources

The #SBATCH directives are used to request resources for a job. The same options are used in batch scripts (with the #SBATCH prefix) and in interactive fisbatch jobs (without the prefix).

--time=01:00:00
• Requests a 1-hour wall-clock time limit. If the job does not complete before this time limit, it will be terminated by the scheduler and all tasks will be removed from the nodes.

--nodes=8 --tasks-per-node=2
• Requests 8 nodes with 2 tasks (i.e., processors) per node.
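As a sketch, the top of a batch script using the directives above (the partition and job name are illustrative choices, not requirements):

#!/bin/bash
#SBATCH --partition=general-compute
#SBATCH --time=01:00:00          # 1 hour wall-clock limit
#SBATCH --nodes=8
#SBATCH --tasks-per-node=2       # 16 tasks in total
#SBATCH --job-name=my-test-job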

Page 38:

Environmental Variables

$SLURM_SUBMIT_DIR - directory from which the job was submitted.

By default, a SLURM job starts from the submission directory and preserves previously loaded environment variables and modules. Preserving modules previously loaded on the front-end can cause problems in some cases. For example, intel-mpi modules set the MPI environment based on node communications hardware (i.e., Infiniband), which can be different on the front-end than it is on the compute nodes.

To avoid these problems, you may wish to unload all modules at the start of your SLURM scripts using the “module purge” command.

Alternatively, you can prevent SLURM from preserving all but a few key environment variables using the --export directive. For example (in a script this would be entered all on one line):

#SBATCH --export=SLURM_CPUS_PER_TASK,SLURM_JOB_NAME,SLURM_NTASKS_PER_NODE,SLURM_PRIO_PROCESS,SLURM_SUBMIT_DIR,SLURM_SUBMIT_HOST
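A minimal sketch of the module purge approach inside a script (cfx/ub-150 is the module used elsewhere in this tutorial; the resource requests are illustrative):

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=8

module purge                 # drop whatever was loaded on the front-end
module load cfx/ub-150       # load only what this job actually needs
cd $SLURM_SUBMIT_DIR         # the default starting directory, shown here for clarity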

Page 39:

Environmental Variables

$SLURMTMPDIR - reserved scratch space, local to each host (this is a CCR definition, not part of the SLURM package).
• This scratch directory is created in /scratch and is unique to the job.
• $SLURMTMPDIR is created on every compute node running a particular job.
• Files can be transferred to $SLURMTMPDIR using the sbcast command.
• You should perform a dummy srun command at the top of your SLURM script to ensure that the SLURM prolog is run; the prolog script is responsible for creating $SLURMTMPDIR.
  srun hostname > /dev/null

$SLURM_NODELIST - a list of nodes assigned to the current batch job. The list is in a compact notation that can be expanded using the “nodeset -e“ command. Used to allocate parallel tasks in a cluster environment.
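A short sketch tying these together inside a SLURM script (BluntBody.def is the tutorial input file; using it here is illustrative):

srun hostname > /dev/null                           # dummy srun so the prolog creates $SLURMTMPDIR
sbcast BluntBody.def $SLURMTMPDIR/BluntBody.def     # stage the input into node-local scratch on every node
NODES=$(nodeset -e $SLURM_NODELIST)                 # expand the compact node list, e.g. "d16n02 d16n03"
echo "Running on: $NODES"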

Page 40:

Sample Script – parallel 1x8

Example of a SLURM script: /util/slurm-scripts/slurmCFX

The script:
• Requests resources (1 node, 8 cores, for 1 hour)
• Creates a node file (required by CFX)
• Loads software and sets limits
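The script itself appears on the slide only as a screenshot, so here is a minimal sketch of a 1x8 CFX batch script along those lines. It is an assumption-laden illustration, not the actual /util/slurm-scripts/slurmCFX: the cfx5solve options (-def, -par-local, -partition, -start-method) follow the ANSYS 15.0 command-line interface, and the host-list line stands in for the node file the CCR script builds. Compare against the CCR-provided script before relying on it.

#!/bin/bash
#SBATCH --partition=general-compute
#SBATCH --time=01:00:00            # request 1 hour
#SBATCH --nodes=1                  # 1 node ...
#SBATCH --tasks-per-node=8         # ... with 8 cores
#SBATCH --job-name=bluntbody

# The CCR script builds a node file for CFX; for a single-node run the host list is just this node.
HOST=$(nodeset -e $SLURM_NODELIST)
echo "Running on host: $HOST"

# Load software and set limits
module purge
module load cfx/ub-150
ulimit -s unlimited

# Run the solver on 8 partitions using local (single-node) parallelism
cd $SLURM_SUBMIT_DIR
cfx5solve -def BluntBody.def -par-local -partition 8 \
          -start-method "Platform MPI Local Parallel"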

Page 41:

Submitting a Batch Job

Navigate to a directory where the SLURM script and your .def file reside; this is $SLURM_SUBMIT_DIR

sbatch slurmCFX (submit the batch job)
squeue -u username (check job status)
/util/ccrjobvis/slurmjobvis [jobid] (view job performance; the job must be in the “running” (R) state)

When finished, the output files will be in $SLURM_SUBMIT_DIR.