M.S MABAKANE
HPC USERS AND ADMINISTRATORS WORKSHOP
INDUCTION COURSE
APPLY FOR SYSTEM ACCOUNT
The account application process is summarised below:
Apply online at: http://www.chpc.ac.za/index.php/contact-us/apply-for-resources-form
The application is recorded in the helpdesk system.
The committee approves or rejects the application.
The user signs the CHPC Use Policy.
The user account is created on the system.
LOGIN INTO THE SYSTEMS
To log in to the systems, ssh into the following hostnames:
Sun cluster: sun.chpc.ac.za
GPU cluster: gpu.chpc.ac.za
Users connect to the systems over a network bandwidth of 10 GB/s from anywhere in South Africa and 100 MB/s from outside the country.
The available network speed determines how quickly data can be accessed on or copied to the systems. For more information about logging in and using the HPC systems, please visit: wiki.chpc.ac.za.
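As a rough illustration of the login step, the sketch below prints the ssh command for each cluster instead of opening a connection. The account name `username` and the helper `login_command` are placeholders introduced here, not part of the CHPC documentation.

```shell
# Print the ssh login command for a given cluster.
# "username" is a placeholder; use the account created for you by CHPC.
login_command() {
    user="$1"
    cluster="$2"
    case "$cluster" in
        sun) host="sun.chpc.ac.za" ;;
        gpu) host="gpu.chpc.ac.za" ;;
        *)   echo "unknown cluster: $cluster" >&2; return 1 ;;
    esac
    echo "ssh ${user}@${host}"
}

login_command username sun
login_command username gpu
```

Running the printed command from a terminal starts the actual session.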
COPYING DATA
Different methods can be used to copy data into or out of the HPC systems:
Linux machine: scp -r username@hostname:/directory-to-copy . (the trailing dot is the local destination directory)
Windows users can use the WinSCP application to copy data from their computer to the HPC clusters.
WinSCP can also be used to copy data from the clusters to the user's computer.
Basic information on installing the Ubuntu Linux operating system can be found at: https://help.ubuntu.com/community/Installation. For more information on installing and using WinSCP, please visit: http://winscp.net/eng/docs/guide_install.
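The scp usage on this slide can be sketched in both directions. The helpers below only print the commands; the local paths `./input` and `results` are hypothetical, and `/export/home/username` is the home directory layout described later in the workshop.

```shell
# Build scp commands for copying a directory to and from a cluster.
# User, host and paths are placeholders to be replaced with real values.
copy_to_cluster() {
    # $1 = local directory, $2 = user@host, $3 = remote destination
    echo "scp -r $1 $2:$3"
}

copy_from_cluster() {
    # $1 = user@host, $2 = remote directory, $3 = local destination
    echo "scp -r $1:$2 $3"
}

copy_to_cluster ./input username@sun.chpc.ac.za /export/home/username/
copy_from_cluster username@sun.chpc.ac.za /export/home/username/results .
```

In both cases the source comes first and the destination last; `-r` copies directories recursively.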
HPC SYSTEMS
Sun cluster
Blue Gene/P
GPU cluster
SA Grid system
Sun Cluster
OVERVIEW OF SUN CLUSTER
DIFFERENT COMPONENTS OF SUN SYSTEM
[Figure: components of the Sun system: Nehalem, Harpertown, Westmere, Dell, Sparc and visualization node]
DIFFERENT COMPONENTS OF SUN SYSTEM (cont…)

| CLUSTER | NEHALEM | HARPERTOWN | WESTMERE | DELL | SPARC | VIZ |
| --- | --- | --- | --- | --- | --- | --- |
| Operating system | CentOS 5.8 | CentOS 5.8 | CentOS 5.8 | CentOS 5.8 | Solaris 10 | Red Hat 5.1 |
| File system | /opt/gridware, /export/home, /lustre/SCRATCH1, 2, 3, 4, 5 | /opt/gridware, /export/home, /lustre/SCRATCH1, 2, 3, 4, 5 | /opt/gridware, /export/home, /lustre/SCRATCH1, 2, 3, 4, 5 | /opt/gridware, /export/home, /lustre/SCRATCH1, 2, 3, 4, 5 | /scratch/home/scratch/work | /scratch/home/scratch/work |
| CPU* | Intel Xeon 2.93 GHz | Intel Xeon 3.0 GHz | Intel Xeon 2.93 GHz | Intel Xeon 2.93 GHz | Sparcv9 | AMD Opteron 2.93 GHz |
| Memory | 12 GB | 16 GB | 24 GB | 36 GB | 2 TB | 64 GB |

For more information visit: http://wiki.chpc.ac.za/quick:start#what_you_have
FILESYSTEM STORAGE
The Sun system has five main directories used to store data in the Lustre filesystem:
/opt/gridware/compilers
/opt/gridware/libraries
/opt/gridware/applications
/export/home/username
/export/home/username/scratch
STORAGE & BACK-UP POLICIES
CHPC has implemented quota policies to govern the storage capacity of the supercomputing system.
Users are allowed to store a maximum of 10 GB of data in their home directories.
The system provides a grace period of seven days for users who exceed 10 GB in their home directories.
Users' home directories are backed up. There is no back-up for scratch.
Data older than 90 days is deleted from the user's scratch directory.
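Users may want to check their own usage against these policies before the system does. The sketch below is one possible way to do so with standard tools; it assumes 10 GB means 10 * 1024 * 1024 KB and that file age is judged by modification time, neither of which is stated in the slides. The function names are made up for this example.

```shell
# Home directory limit in kilobytes (assumption: 10 GB = 10 * 1024 * 1024 KB).
HOME_LIMIT_KB=$((10 * 1024 * 1024))

usage_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number
    du -sk "$1" | awk '{print $1}'
}

over_limit() {
    # Succeeds (exit status 0) when the directory uses more than the limit
    # $1 = directory, $2 = limit in kilobytes
    [ "$(usage_kb "$1")" -gt "$2" ]
}

old_files() {
    # List regular files last modified more than 90 days ago
    find "$1" -type f -mtime +90
}

# Example calls, using the directory layout from the slides:
#   over_limit /export/home/username "$HOME_LIMIT_KB" && echo "over quota"
#   old_files /export/home/username/scratch
```

Files listed by `old_files` are candidates for the 90-day clean-up and should be copied elsewhere if they are still needed, since scratch is not backed up.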
COMPILERS
Different compilers are available for building parallel programs on the Sun system. Modules are used to load the following compilers into the user environment:
GNU & Intel compilers:
GCC 4.1.2 (gfortran, gcc and g++) -> /usr/bin
GCC 4.7.2 (gfortran, gcc and g++) -> module add gcc/4.7.2
Intel 12.1.0 with MKL and IntelMPI (ifort, icc, icpc) -> module add intel2012
Intel 13.0.1 with MKL and IntelMPI -> module add intel-XE/13.0
Sun Studio (suncc, sunf95, c++filt) -> module add sunstudio
MESSAGE PASSING INTERFACE (MPI)
Various message passing interface (MPI) implementations are used to parallelise applications on the Sun system, including:
OpenMPI 1.6.1 compiled with GNU (mpicc, mpiCC, mpif90 and mpif77) -> module add openmpi/openmpi-1.6.1-gnu
OpenMPI 1.6.1 compiled with Intel (mpicc, mpiCC, mpif90 and mpif77) -> module add openmpi/openmpi-1.6.1-intel
OpenMPI 1.6.5 compiled with GNU (mpicc, mpiCC, mpif90 and mpif77) -> module add openmpi/openmpi-1.6.5-gnu & module add openmpi/openmpi-1.6.5_gcc-4.7.2
OpenMPI 1.6.5 compiled with Intel (mpicc, mpiCC, mpif90 and mpif77) -> module add openmpi/openmpi-1.6.5-intel & module add openmpi/openmpi-1.6.5-intel-XE-13.0
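To show how a module and its compiler wrapper fit together, the sketch below prints the two steps needed to build an MPI program. The file names `hello_mpi.c` and `hello_mpi`, and the helper `build_steps`, are hypothetical; the module name is one of those listed on the slide.

```shell
# Print the commands to load an OpenMPI module and build an MPI program.
# hello_mpi.c and hello_mpi are hypothetical file names.
build_steps() {
    module_name="$1"   # e.g. openmpi/openmpi-1.6.5-gnu
    source_file="$2"
    binary="$3"
    echo "module add ${module_name}"            # puts mpicc and friends on the PATH
    echo "mpicc -o ${binary} ${source_file}"    # compile with the MPI C wrapper
}

build_steps openmpi/openmpi-1.6.5-gnu hello_mpi.c hello_mpi
```

Fortran sources would use `mpif90` or `mpif77` in place of `mpicc`.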
APPLICATIONS
Different scientific and commercial groups use various parallel applications to perform computational calculations on the Sun cluster. The most popular applications on the cluster are:
Weather Research and Forecasting (WRF)
WRF-Chem
DL_POLY
ROMS (Regional Ocean Modeling System)
Gaussian
VASP
Gadget
Materials Studio and Discovery Studio
CAM
Quantum Espresso
MOAB AND TORQUE
Moab Cluster Suite is a scheduling tool used to control jobs on the Sun cluster.
Torque is used to monitor the computational resources available in the clusters.
Basic Moab commands:
msub - submit job
showq - check status of the job
canceljob - cancel job in the cluster
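A typical job goes through these three commands in order. The sketch below prints that cycle; the script name `job.msub` and the job ID `12345` are hypothetical placeholders, and in practice the job ID is whatever `msub` reports on submission.

```shell
# Print the Moab commands for a typical submit / check / cancel cycle.
# "job.msub" and the job ID are hypothetical placeholders.
job_cycle() {
    script="$1"
    job_id="$2"
    echo "msub ${script}"        # submit the job script
    echo "showq"                 # check the status of queued and running jobs
    echo "canceljob ${job_id}"   # cancel the job if it is no longer needed
}

job_cycle job.msub 12345
```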
EXAMPLE OF MOAB SCRIPT

    #!/bin/bash
    #MSUB -l nodes=3:ppn=12
    #MSUB -l walltime=2:00:00
    #MSUB -l feature=dell|westmere
    #MSUB -m be
    #MSUB -V
    #MSUB -o /lustre/SCRATCH2/users/username/file.out
    #MSUB -e /lustre/SCRATCH2/users/username/file.err
    #MSUB -d /lustre/SCRATCH2/users/username/
    #MSUB -mb
    ##### Running commands
    nproc=`cat $PBS_NODEFILE | wc -l`
    mpirun -np $nproc <executable> <output>

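The `nproc` line in the script counts the lines of the node file, which Torque fills with one line per allocated core. The sketch below reproduces that count on a mock node file for the `nodes=3:ppn=12` request; the host names `node01` to `node03` are made up, and the real file is supplied by the scheduler.

```shell
# Build a mock node file shaped like the one Torque provides in $PBS_NODEFILE:
# one line per allocated core, so nodes=3:ppn=12 gives 3 * 12 = 36 lines.
# The host names are invented for illustration.
mock_nodefile=$(mktemp)
for node in node01 node02 node03; do
    i=0
    while [ "$i" -lt 12 ]; do
        echo "$node"
        i=$((i + 1))
    done
done > "$mock_nodefile"

count_procs() {
    # Same idea as: cat $PBS_NODEFILE | wc -l
    wc -l < "$1" | tr -d ' '
}

nproc=$(count_procs "$mock_nodefile")
echo "mpirun would be started with -np $nproc"
rm -f "$mock_nodefile"
```

Counting the node file rather than hard-coding the number keeps the `mpirun` line correct when the `nodes` or `ppn` request changes.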
GPU Cluster
GPU CLUSTER
COMPUTE NODES
24 GB of memory
16 Intel Xeon processors (2.4 GHz)
4 x NVIDIA Tesla GPU cards
96 GB local hard drive capacity
FILE SYSTEM
The GPU cluster is attached to 14 terabytes of storage, managed using GPFS across the entire supercomputer. The cluster has the following file system structure:
/GPU/home
/GPU/opt
All the libraries and compilers are located in /GPU/opt. /GPU/home is used to store users' applications and output data.
There are no back-up or storage policies on the GPU system.
COMPILERS and MPI
Different compilers and MPI libraries are used to compile and execute parallel applications on the GPU system:
Intel compiler 12.1.0 with MKL and MPI (ifort, icc, icpc) -> module load intel/12.0
Intel compiler 13.0.1 with MKL and IntelMPI -> module load intel/13.0
GCC 4.6.3 (gfortran, gcc and g++) -> module load gcc/4.6.3
MPICH2 1.5 compiled with GNU (mpirun, mpicc, mpif90) -> module load mpich2/1.5
MVAPICH2 compiled with Intel (mpirun, mpicc, mpif90) -> module load mvapich2/intel/1.9
APPLICATIONS
The following applications are available on the GPU cluster:
Emboss
NAMD
AMBER
Gromacs
QUESTIONS
OPERATIONAL SERVICES
SLA BETWEEN CHPC & USERS
The CHPC has developed a service level agreement with its users to ensure smooth operation and utilisation of the supercomputing system.
To this end, CHPC is responsible for ensuring that users' queries are responded to and resolved on time. The resolution time for each type of query is listed below:
Create user accounts, login and ssh - 1 day
Network - 1 day
Storage - 2 days
Installing software and libraries - 3 days
Compiling & porting applications - 3 days
All queries and incidents are recorded in the helpdesk system.
HELPDESK STATISTICS
L1 - all calls resolved within 24 hrs, e.g. ssh, login and user creation
L2 - all calls such as storage, third-party software and hardware resolved within 48 hrs
L3 - all calls resolved within 72 hrs, e.g. compilers, software and applications

| Service level | Resolved | Total received | Percentage | Target |
| --- | --- | --- | --- | --- |
| Level 1 (L1) | 229 | 230 | 99% | 97% |
| Level 2 (L2) | 19 | 19 | 100% | 95% |
| Level 3 (L3) | 165 | 179 | 92% | 80% |

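The percentage column can be reproduced from the resolved and received counts. The sketch below uses integer arithmetic, which drops the fractional part; that this is how the slide's figures were rounded is an assumption, although it does match all three rows. The function names are introduced here for illustration.

```shell
# Resolved calls as a whole-number percentage of calls received (fraction dropped).
resolved_pct() {
    # $1 = resolved, $2 = total received
    echo $(( $1 * 100 / $2 ))
}

meets_target() {
    # $1 = resolved, $2 = total received, $3 = target percentage
    [ "$(resolved_pct "$1" "$2")" -ge "$3" ]
}

# Rows from the helpdesk table: level, resolved, received, target
for row in "L1 229 230 97" "L2 19 19 95" "L3 165 179 80"; do
    set -- $row
    if meets_target "$2" "$3" "$4"; then status="met"; else status="missed"; fi
    echo "$1: $(resolved_pct "$2" "$3")% (target $4%, $status)"
done
```

For example, Level 3 gives 165 * 100 / 179 = 92 after dropping the fraction, which is above the 80% target.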
CUSTOMER SATISFACTION
In the 2nd quarter of 2013/14, a customer satisfaction survey was conducted to understand users' satisfaction with the services provided by the centre.
The survey also aimed to collect suggestions, complaints and compliments that could help improve the overall operational services within the CHPC. The survey was divided into the following pillars:
Helpdesk
System performance
Scheduler
Training and workshop
CHPC website and wiki
CUSTOMER SURVEY
[Seven figure-only slides of survey results]
QUESTIONS & DISCUSSION!!
ADMINISTRATING SUPERCOMPUTING SYSTEMS
DIFFERENT TYPES OF SUPERCOMPUTERS
Supercomputers are regarded as the fastest computers, able to perform millions to trillions of calculations within a short period of time. They can be classified into categories such as:
Distributed-memory systems
Shared-memory machines
Vector systems
Most scientists use these computers to run parallel applications and generate scientific output as quickly as possible.
PERFORMANCE OF PROGRAMS ON SUPERCOMPUTERS
The performance of parallel applications is mainly affected by the interplay of factors such as (Adve and Vernon, 2004; Ebnenasir and Beik, 2009):
Limited network bandwidth
Uneven distribution of message-passing
Slow read/write requests within the storage
Logic of the parallel code
High memory latency in the processing nodes
High processor utilisation in the execution nodes
Adve, V.S., and Vernon, M.K. (2004). Parallel program performance prediction using deterministic task graph analysis. ACM Transactions on Computer Systems. 22(1): 94-136.
Ebnenasir, A., and Beik, R. (2009). Developing parallel programs: a design-oriented perspective. Proceedings of the 2009 IEEE 31st international conference on software engineering. Vancouver: IEEE Computer Society, pp. 1-8.
THE PERFORMANCE OF THE SUPERCOMPUTERS
Different components (e.g. processor, memory, network) play an important role in determining the performance of supercomputers such as clusters, massively parallel processing systems and shared-memory machines.
In the CHPC Sun cluster, execution nodes are equipped with different processors and memory to run applications.
The performance of the execution nodes is therefore important for running parallel programs and generating output as quickly as possible.
In this case, we look at the statistics of the computational resources used to process applications in the Sun cluster.
Machine Usage On Sun Cluster
TOP USERS V/S JOBS
TOP USERS V/S CPU
APPLICATIONS V/S CPU
WAITING TIME IN THE QUEUE
JOBS V/S WALLTIME
Sun Storage
OVERVIEW OF THE STORAGE
LUSTRE FILESYSTEM
Lustre is a distributed parallel file system used to manage, share and monitor data in the storage. In the Sun cluster, it is configured to administer the following storage capacity:
1 petabyte (SCRATCH 5)
480 terabytes (SCRATCH 1, 2, 3 and 4)
Both sub-storage systems (480 terabytes and 1 petabyte) are shared across the entire cluster. ClusterStor Manager is used to monitor and report the status of the 1 petabyte sub-storage.
Different scripts are used to monitor and control the shared 480 terabytes.
ClusterStor Manager
[Seven figure-only slides of ClusterStor Manager]
SCRIPTS TO MONITOR LUSTRE
Sun - Compute Nodes
GANGLIA MONITORING TOOL
Ganglia is a graphical user interface tool used to monitor system components such as memory, processor and network. In the Sun cluster, it is used to monitor the following computational architectures:
Nehalem
Harpertown
Dell
Westmere
It can also report the status of individual execution nodes within the supercomputing system.
GANGLIA MONITORING TOOL (cont..)
[Three figure-only slides of Ganglia]
xCAT (Extreme Cloud Administration Toolkit)
xCAT is a software tool that eases the administration and deployment of Linux machines in a distributed parallel environment. The CHPC administrators use xCAT to perform administration tasks such as:
Provisioning of the node(s)
Powering the node(s) on or off
Querying the status of the node(s)
Adding post-scripts for the nodes
Distributing user account details to the nodes
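Each task in the list has a corresponding xCAT command. The sketch below prints one plausible command per task; it is not taken from the CHPC configuration, and the node name `node01`, image name `centos5.8-compute` and postscript name `setup_lustre` are hypothetical. Distributing accounts in practice may involve more files than `/etc/passwd`.

```shell
# Print one xCAT command per administration task, for a given node.
# All three arguments are hypothetical names chosen for illustration.
xcat_tasks() {
    node="$1"
    image="$2"
    script="$3"
    echo "nodeset ${node} osimage=${image}"                    # provisioning: choose the image to install
    echo "rpower ${node} boot"                                 # power control (also: on, off, stat)
    echo "nodestat ${node}"                                    # query the status of the node
    echo "chdef -t node -o ${node} -p postscripts=${script}"   # add a post-script to the node definition
    echo "xdcp ${node} /etc/passwd /etc/passwd"                # copy user account details to the node
}

xcat_tasks node01 centos5.8-compute setup_lustre
```

The commands are printed rather than executed so the sketch can be read and adapted without access to a management node.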
Sun - Security
CLUSTER SECURITY
Security is one of the most critical and challenging aspects of administering the supercomputing system. The CHPC Sun system protects itself from viruses and intruders by applying the following security measures:
A normal user is allowed only 10 login attempts
The root user is allowed only 2 login attempts
Users are not allowed to ssh into the compute nodes
The full-time availability, reliability and accessibility of the system depend on the security measures taken to ensure that no viruses, intruders or hackers gain unlawful access to the systems.
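The slides do not show how the login limits are enforced or monitored. As one possible illustration, the sketch below counts failed ssh password attempts per user from log lines in the usual OpenSSH message format and lists users at or above a threshold. The function name, sample log lines and user names are all invented for this example.

```shell
# Count failed ssh password attempts per user from log lines on stdin and
# print "<user> <count>" for users at or above the limit given as $1.
# Assumes the usual OpenSSH message "Failed password for <user> from <ip> ...".
failed_logins_over() {
    awk -v limit="$1" '
        /Failed password for/ {
            # The user name follows "for", or "for invalid user"
            for (i = 1; i <= NF; i++) {
                if ($i == "for") {
                    user = ($(i + 1) == "invalid") ? $(i + 3) : $(i + 1)
                    count[user]++
                    break
                }
            }
        }
        END {
            for (u in count) if (count[u] >= limit) print u, count[u]
        }
    ' | sort
}

# Invented sample log lines
sample_log='Jan 10 10:00:01 login sshd[101]: Failed password for alice from 10.0.0.5 port 4000 ssh2
Jan 10 10:00:05 login sshd[101]: Failed password for alice from 10.0.0.5 port 4001 ssh2
Jan 10 10:00:09 login sshd[102]: Failed password for root from 10.0.0.9 port 4002 ssh2
Jan 10 10:00:12 login sshd[103]: Accepted password for bob from 10.0.0.7 port 4003 ssh2'

printf '%s\n' "$sample_log" | failed_logins_over 2
```

With the limits from the slide, the same function could be called with 10 for normal users and 2 for root, reading from the system's authentication log instead of the sample text.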
SCRIPTS TO MONITOR SECURITY
SUN - NETWORK
STRUCTURE OF THE NETWORK
QUESTIONS
ACKNOWLEDGEMENTS
I would like to acknowledge the following for their contributions to this presentation:
CHPC Technical team & Langton Nkiwane (Eclipse Holdings) for their advice about the supercomputing infrastructure.
Brenda Skosana (CSIR) & Zoleka Nkukuma (CSIR) for their assistance with the logistics for all the attendees.
Wilson Maruping (CHPC) for his tireless effort in providing helpdesk statistics.