TRANSCRIPT
Parallel Processing Experience on Low-Cost Pentium Machines
By Syed Misbahuddin
Computer Engineering Department, Sir Syed University of Engineering and Technology, Karachi
[email protected]
Presented in Open Source Series Workshop 2010
22-24 December, 2010 © ICOSST 2010
Presentation Outline
• Introduction to parallel processing concepts
• Hardware and software architecture of a PC cluster
• Demonstration of some simple applications on a PC cluster
• Remote access to the PC cluster via the Internet
Parallel Computing
Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
Parallel programming is used to improve computational speed for a given problem.
In parallel computing, a problem is broken into discrete parts that can be solved concurrently.
Some Areas of Parallel Computing
• Weather forecasting
• Numerical simulation of engineering and scientific problems
• Economic scenario analysis
• Data mining applications
• Modeling large DNA structures
• Speech recognition
• Image/Video Processing
Application Examples for Parallel Processing
• An oil exploration project can divide the sea floor into areas; the nodes in a cluster can process the seismic data from these areas in parallel.
• An image can be broken into parts that are distributed to cluster nodes for image recognition.
• DNA sequence search algorithms can be mapped onto a cluster.
5
Presented in Open Source Series Workshop 2010
22-24 December, 2010 © ICOSST 2010
Problem Decomposition for Parallel Processing
Computing Platforms for Parallel Computing
• A single computer with multiple internal processors
– Shared-memory multiprocessor model
• Multiple interconnected computers
– Message-passing multicomputer model
Shared Memory Multiprocessor Model
• The shared-memory multiprocessor model is attractive because of its convenient data access
• However, hardware support for fast access to the shared memory is difficult to implement
Shared Memory Model
Message-Passing Distributed-Memory Multicomputer Model (Cluster)
Programming Models for Parallel Computers
• OpenMP for shared-memory programming
• MPI (Message Passing Interface) for distributed-memory programming
Advantages of the Message-Passing Cluster Model
• Off-the-shelf Intel PCs can be configured to form a message-passing multicomputer
• Unlike the shared-memory multiprocessor model, no special mechanism is needed to control simultaneous data access
• Clusters are very popular in universities and research labs for teaching and experimenting with parallel processing concepts
PC Cluster as a Distributed-Memory Model
[Figure: layered cluster architecture. Sequential and parallel applications run on top of a parallel programming environment and cluster middleware; beneath the middleware, each PC/workstation runs its own communications software over its network interface hardware, and the machines are joined by a cluster interconnection network/switch.]
Hardware Architecture for a PC Cluster
A collection of homogeneous or heterogeneous Pentium machines with network connectivity
Steps to Build PC Cluster
• Linux operating system installation on all nodes to be included in the cluster
• NFS/NIS server configuration
• NFS/NIS client configuration
• Installation of a Parallel Programming System (PPS)
• Commonly used PPSs are:
– PVM from the University of Tennessee
– MPI from the MPI Forum
– BSP from Oxford University
Linux Cluster Practical Example
[Figure: the cluster nodes sit on an intranet and are reached by remote web clients.]
Message Passing Interface (MPI)
MPI is a library of routines introduced by the Message-Passing Interface Forum in 1994.
MPI is used for parallel processing on a cluster.
MPI allows data to be transmitted from a process running on one machine to a process running on a different machine.
The programmer is responsible for correctly identifying the parallelism and implementing the resulting algorithm using MPI constructs.
MPI provides subprograms that can be invoked from C/C++ or Fortran 77 programs.
Six Main MPI routines
1. MPI_INIT: initializes the MPI environment
2. MPI_COMM_SIZE: returns the number of processes
3. MPI_COMM_RANK: returns this process's number (rank)
4. MPI_SEND: sends a message
5. MPI_RECV: receives a message
6. MPI_FINALIZE: terminates the MPI environment
MPI_COMM_WORLD
• An MPI communicator specifies a group of processes within which communication occurs
• Within a communicator, each process gets a rank ID
• The communicator MPI_COMM_WORLD is created over all machines in the system
Process Control over the Cluster
[Figure: a head node with Rank_ID=0 coordinating worker nodes with Rank_ID=1 and Rank_ID=2.]
A Parallel Computation Example
Problem: F = (a-b)(a+b)(a*b)
Suboperation 1: (a-b), computed by Machine 1
Suboperation 2: (a+b), computed by Machine 2
Suboperation 3: (a*b), computed by Machine 3
Each machine sends its partial result to the head node (server)
The server computes F = (a-b)*(a+b)*(a*b)
[Figure: the server node reads a and b; Client Node 1 computes Tmp1=(a-b), Client Node 2 computes Tmp2=(a+b), and Client Node 3 computes Tmp3=(a*b); the server then computes (a-b)(a+b)(a*b).]
Popular MPI Implementations
• MPICH by Argonne National Lab and Mississippi State University
• Local Area Multicomputer MPI (LAM/MPI) by the Ohio Supercomputer Center
• CHIMP by the Edinburgh Parallel Computing Centre
LAM/MPI
• LAM/MPI is a high-quality open-source implementation of the Message Passing Interface specification
• LAM allows interconnected computers to act as one parallel computer for solving a single compute-intensive problem
• LAM features a full implementation of the MPI communication standard
Launching LAM on The Linux cluster
The lamboot command is used to launch the LAM daemons on a Linux cluster.
The LAM daemon running on each node provides process management, including signal handling and I/O management.
A text file (the machinefile) lists the machines on which LAM is to be launched:
$ lamboot -v machinefile
The LAM daemons are owned by the user who launched them.
Several parallel machines can coexist on the same cluster.
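A machinefile is simply a list of hostnames, one per line (the hostnames below are hypothetical placeholders for the cluster's actual node names):

```
node01.cluster.local
node02.cluster.local
node03.cluster.local
```

After lamboot succeeds, the lamnodes command lists the booted nodes, and lamhalt shuts the daemons down when the session is finished.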
Running Programs on the Cluster
• LAM/MPI's wrapper compiler hcc is used to compile C code for the cluster; it invokes the underlying C compiler with the MPI headers and libraries added
• $ hcc -o exec_file mpi_prog.c
• $ mpirun -np 3 exec_file
Sample MPI Programs
• Example 1
Objective:
1. Two client nodes send greeting messages to the server node
2. The server node receives the greeting messages and prints them on the display
Code for Program 1

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int my_rank, p, source, dest, tag = 0;
    char message[800], hname[64];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    gethostname(hname, sizeof(hname));
    if (my_rank != 0) {
        /* Client code */
        sprintf(message, "Greetings from process %d on %s!", my_rank, hname);
        dest = 0;
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        /* Server code */
        printf("Messages received by process %d on %s.\n\n", my_rank, hname);
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 800, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    MPI_Finalize();
    return 0;
}
Output of Program1 on Cluster
Example 2
• Objective: to find the sum of an integer array
• The server node reads the integer array and sends it to the clients
• Client 1 computes the partial sum of the first half of the array elements and sends it to the server
• Client 2 computes the partial sum of the second half of the array elements and sends it to the server
• The server node adds the partial sums to obtain the total sum of the input array
Output of Program2 on Cluster
Example 3: Matrix Multiplication on the Cluster
Example 4: Numerical Integration on the Cluster
• Each node computes the area of one slice under the curve and sends its individual area to the server
• The server computes the area under the curve by adding the individual areas