Parallel Algorithms - Design, Performance & Analysis Issues. Dr. Seema Bawa, Professor & Head, Dept. of Computer Sc. & Engineering; Head, Centre for Information Super Highway; Thapar University, Patiala. [email protected]


Page 1:

Parallel Algorithms - Design, Performance & Analysis Issues

Dr. Seema Bawa
Professor & Head, Dept. of Computer Sc. & Engineering
Head, Centre for Information Super Highway
Thapar University, Patiala
[email protected]

Page 2:

Parallel Processing

• Goal:
  – speed up computationally intensive tasks
• Method:
  – multiple processes execute sub-tasks
• Requirement:
  – sharing information among the processes

Page 3:

Why Parallel Processing?

Computation requirements are ever increasing: visualization, distributed databases, simulations, scientific prediction (earthquakes), etc.

Sequential architectures are reaching physical limitations (e.g., the speed of light, thermodynamics).

Page 4:

Architecture

• Single processor:

– single instruction stream

– single data stream

– von Neumann model

• Multiple processors:

– Flynn’s taxonomy

Page 5:

Flynn's Taxonomy

                           Data Streams
                          1           Many
  Instruction     1      SISD         SIMD
  Streams        Many    MISD         MIMD

Page 6:

Single Instruction, Single Data (SISD)

• Typical sequential machine
• Can execute one instruction at a time on one piece of data
• Disregards pipelining and vector processing

Page 7:

SISD: A Conventional Computer

Speed is limited by the rate at which the computer can transfer information internally.

[Figure: a single processor with a data input, a data output, and an instruction stream]

Ex.: PC, Macintosh, workstations

Page 8:

The MISD Architecture

More of an intellectual exercise than a practical configuration. A few were built, but none are commercially available.

[Figure: one data input stream passing through Processors A, B, and C, each driven by its own instruction stream (A, B, C), producing one data output stream]

Page 9:

SIMD Architecture

Ex.: CRAY vector machines, Thinking Machines CM*, Intel MMX (multimedia support)

Ci <= Ai * Bi

[Figure: one instruction stream broadcast to Processors A, B, and C, each with its own data input stream and data output stream]
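The slide's Ci <= Ai * Bi is the archetypal SIMD pattern: one multiply instruction applied to every pair of elements. A minimal Python sketch of the semantics (illustrative only; real SIMD applies the operation to all elements simultaneously in hardware):

```python
# SIMD-style elementwise multiply: ONE operation, MANY data elements.
# The loop merely models the semantics C[i] = A[i] * B[i]; SIMD hardware
# would perform all multiplications in lockstep.
def elementwise_multiply(A, B):
    if len(A) != len(B):
        raise ValueError("A and B must have the same length")
    return [a * b for a, b in zip(A, B)]

A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
C = elementwise_multiply(A, B)
print(C)  # [5, 12, 21, 32]
```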

Page 10:

Single Instruction, Multiple Data (SIMD)

• Array Processor
  – Multiple memory units
  – Interconnection network
  – One control unit broadcasts instructions to all processors, which execute on their own memory

• Vector Processor
  – One memory
  – One processor
  – A specialized functional unit executes one instruction on a pipeline of data

Page 11:

Array Processor

[Figure: a control unit driving processors P, each paired with a memory M, linked by an interconnection network]

Page 12:

Vector Processor

[Figure: memories M connected through an interconnection network to a processor P with a functional unit]

Page 13:

MIMD Architecture

Unlike SISD and MISD machines, an MIMD computer works asynchronously.

• Shared memory (tightly coupled) MIMD
• Distributed memory (loosely coupled) MIMD

[Figure: Processors A, B, and C, each with its own instruction stream, data input stream, and data output stream]

Page 14:

Multiple Instruction, Multiple Data (MIMD)

• Complete & independent processors

• Shared memory
  – One memory accessible to all processors
  – Instructions can execute on any memory location

• Distributed memory
  – Memory distributed with the processors
  – Processors execute instructions on their own data
  – Data is passed among processors with messages

Page 15:

Shared Memory MIMD Machine

Communication: the source PE writes data to global memory and the destination PE retrieves it.

Easy to build; conventional OSes for SISD machines can easily be ported.

Limitations: reliability and expandability. A failure of a memory component or of any processor affects the whole system, and increasing the number of processors leads to memory contention.

Ex.: Silicon Graphics supercomputers

[Figure: Processors A, B, and C connected over memory buses to a global memory system]

Page 16:

Distributed Memory MIMD

Communication: IPC (Inter-Process Communication) over a high-speed network.

The network can be configured as a tree, mesh, cube, etc.

Unlike shared-memory MIMD, it is easily and readily expandable, and highly reliable (a CPU failure does not affect the whole system).

[Figure: Processors A, B, and C, each attached via a memory bus to its own memory system (A, B, C), linked by IPC channels]

Page 17:

Parallel vs. Distributed Computing

• Parallel:
  – several processing elements concurrently solving a single problem

• Distributed:
  – processing elements do not share memory or a system clock

Page 18:

Parallelization

• Functional & data parallelism:
  – functional: different operations on the same data elements
  – data: the same operation on different data elements
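The distinction can be sketched with Python's standard concurrent.futures module (a hypothetical illustration, not from the slides): data parallelism maps one function over different elements, while functional parallelism submits different functions over the same data.

```python
from concurrent.futures import ThreadPoolExecutor

data = [1, 2, 3, 4]

def square(x):
    return x * x

with ThreadPoolExecutor() as pool:
    # Data parallelism: the SAME operation (square) on DIFFERENT elements.
    squares = list(pool.map(square, data))

    # Functional parallelism: DIFFERENT operations (sum, max) on the SAME data.
    f_sum = pool.submit(sum, data)
    f_max = pool.submit(max, data)
    results = (f_sum.result(), f_max.result())

print(squares)  # [1, 4, 9, 16]
print(results)  # (10, 4)
```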

Page 19:

Principles of Parallel Algorithm Design

• A sequential algorithm is essentially a recipe: a sequence of basic steps for solving a given problem.

• A parallel algorithm is a recipe that tells us how to solve a given problem using multiple processors.

• A parallel algorithm has the added dimension of concurrency: the algorithm designer must specify sets of steps that can be executed in parallel.

Page 20:

Specifying a Parallel Algorithm includes:

• Identifying portions of the work that can be performed concurrently.

• Mapping the concurrent pieces of work onto multiple processes running in parallel.

• Distributing the input, output, and intermediate data associated with the program.

• Managing accesses to data shared by multiple processors.

Page 21:

Specifying a Parallel Algorithm includes (Contd..):

• Synchronizing the processors at various stages of the parallel program's execution.

Page 22:

Parallel Algorithm Design

Two key steps in the design of parallel algorithms:

• Dividing a computation into smaller computations
• Assigning them to different processors

Page 23:

Decomposition & Tasks

• Decomposition: the process of dividing a computation into smaller parts, some or all of which may potentially be executed in parallel.

• Tasks: programmer-defined units of computation into which the main computation is subdivided by means of decomposition.

Page 24:

Decomposition & Tasks (Contd..)

Example: Matrix multiplication

Page 25:

Decomposition & Tasks (Contd..)

[Figure: decomposition of matrix multiplication into tasks]

Page 26:

Task Dependency Graphs

Example: Matrix multiplication

Page 27:

Example: Database Query Processing

QUERY:
MODEL="Accord" AND YEAR="1999" AND (COLOR="Green" OR COLOR="Black")
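One way to decompose this query (sample records and variable names below are hypothetical) is to evaluate each simple predicate as an independent leaf task and combine the partial results, which is exactly the shape of a task dependency graph:

```python
# Each simple predicate is an independent task that could run in parallel;
# the AND/OR combinations depend on those tasks' results, forming a
# task dependency graph.
cars = [
    {"MODEL": "Accord", "YEAR": "1999", "COLOR": "Green"},
    {"MODEL": "Accord", "YEAR": "1999", "COLOR": "White"},
    {"MODEL": "Civic",  "YEAR": "1999", "COLOR": "Black"},
    {"MODEL": "Accord", "YEAR": "2000", "COLOR": "Black"},
]

# Leaf tasks: one index set per simple predicate.
accord = {i for i, c in enumerate(cars) if c["MODEL"] == "Accord"}
y1999  = {i for i, c in enumerate(cars) if c["YEAR"] == "1999"}
green  = {i for i, c in enumerate(cars) if c["COLOR"] == "Green"}
black  = {i for i, c in enumerate(cars) if c["COLOR"] == "Black"}

# Interior tasks: combine child results once they are available.
green_or_black = green | black
answer = accord & y1999 & green_or_black
print(sorted(answer))  # [0]
```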

Page 28:

Example: Database Query Processing (Contd..)

[Figure: task dependency graph for evaluating the query]

Page 29:

Granularity & Concurrency

• The number and size of the tasks into which a problem is decomposed determine the granularity of the decomposition.

TYPES:

o FINE GRAINED
o COARSE GRAINED

Page 30:

Processes and Mapping

• A process is an abstract entity that uses the code and data corresponding to a task to produce the output of that task within a finite amount of time after the task is activated by the parallel program.

• The mechanism by which tasks are assigned to processes for execution is called mapping.

Page 31:

Decomposition Techniques

Recursive decomposition

Data decomposition

Exploratory decomposition

Speculative decomposition

Page 32:

Characteristics of Tasks

Task Generation

Task Sizes

Size of Data Associated with Tasks

Page 33:

Characteristics of Inter-task Interactions

• STATIC

• DYNAMIC

Page 34:

Mapping Techniques for Load Balancing

• STATIC

• DYNAMIC

Page 35:

Performance Metrics for Parallel Systems

Execution Time

The parallel run time Tp is the time that elapses from the moment a parallel computation starts to the moment the last processor finishes execution. In the ideal case, Tp = Ts / p, where Ts is the sequential run time and p the number of processors.

Page 36:

Total Parallel Overhead

The total overhead (or overhead function) of a parallel system is the total time collectively spent by all the processors over and above that required by the fastest known sequential algorithm for solving the same problem on a single processor: To = p Tp - Ts.

Page 37:

Speedup

Speedup is defined as the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processors: S = Ts / Tp.

Page 38:

Efficiency

Efficiency is defined as the ratio of speedup to the number of processors: E = S / p. In an ideal parallel system, speedup equals p and efficiency equals one.

Page 39:

Efficiency (Contd..)

• A parallel algorithm is efficient iff
  – it is fast (e.g., polynomial time), and
  – the product of the parallel time and the number of processors is close to the time of the best known sequential algorithm:

    T_sequential ≈ T_parallel × N_processors

• A parallel algorithm is optimal iff this product is of the same order as the best known sequential time.

Page 40:

COST

The COST of solving a problem on a parallel system is defined as the product of the parallel run time and the number of processors used.

A parallel system is said to be cost-optimal if the cost of solving a problem on the parallel computer is proportional to the execution time of the fastest known sequential algorithm on a single processor.
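The metrics above can be computed directly from Ts, Tp, and p. A toy calculation (the timing numbers are invented for illustration):

```python
def metrics(Ts, Tp, p):
    """Compute speedup, efficiency, cost, and total overhead from the
    sequential time Ts, parallel time Tp, and processor count p."""
    S = Ts / Tp       # speedup
    E = S / p         # efficiency (1.0 is ideal)
    cost = p * Tp     # cost: processors x parallel run time
    To = p * Tp - Ts  # total parallel overhead
    return S, E, cost, To

S, E, cost, To = metrics(Ts=100.0, Tp=30.0, p=4)
print(round(S, 2), round(E, 2), cost, To)  # 3.33 0.83 120.0 20.0
```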

Page 41:

EXAMPLE ILLUSTRATING THE PERFORMANCE METRICS


Page 44:

Scalability of Parallel Systems

The scalability of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processors. It reflects a parallel system's ability to utilize increasing processing resources effectively.

Page 45:

Parallel Algorithm Analysis

• Computation complexity: Tcomp(n, p)
• Communication complexity: Tcomm(n, p)
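A sketch of how the two components combine: the model below simply adds them, T(n, p) = Tcomp(n, p) + Tcomm(n, p), with hypothetical per-element compute time and message startup/per-word costs (all parameters are invented for illustration, not from the slides):

```python
def parallel_time(n, p, t_calc=1.0, t_startup=10.0, t_word=0.1):
    """Toy cost model: T(n, p) = Tcomp(n, p) + Tcomm(n, p).
    Tcomp: each processor handles n/p elements at t_calc per element.
    Tcomm: one message of n/p words (startup cost plus per-word cost)."""
    t_comp = (n / p) * t_calc
    t_comm = t_startup + (n / p) * t_word
    return t_comp + t_comm

# Adding processors shrinks Tcomp, but the message startup cost does not shrink:
print(parallel_time(1000, 2))  # 560.0
print(parallel_time(1000, 8))  # 147.5
```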

Page 46:

ALGORITHM ANALYSIS TECHNIQUES

The essence of parallel programming lies in being able to partition the task gracefully.

DIVIDE AND CONQUER

Steps:

1. Divide an instance of a problem into one or more smaller instances.

2. Conquer each of the smaller instances. Unless a smaller instance is sufficiently small, use recursion to do this.

3. If necessary, combine the solutions to the smaller instances to obtain the solution to the original instance.
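The three steps map directly onto merge sort, a standard illustration (not from the slides): divide the array in half, conquer each half recursively, combine by merging. The halves are exactly the pieces a parallel version would hand to different processors:

```python
def merge_sort(xs):
    # Step 2 base case: a sufficiently small instance is already solved.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    # Step 1 (divide) and step 2 (conquer each half recursively).
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    # Step 3 (combine): merge the two sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```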

Page 47:

DYNAMIC PROGRAMMING

Steps:

1. Establish a recursive property that gives the solution to an instance of the problem.

2. Solve an instance of the problem in a bottom-up fashion by solving smaller instances first.
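The two steps in miniature, using Fibonacci numbers as the instance (an illustration I've chosen; the slides use the binomial coefficient, which follows): the recursive property is F(i) = F(i-1) + F(i-2), solved bottom-up by filling in the smallest instances first.

```python
def fib(n):
    # Step 1: recursive property F(i) = F(i-1) + F(i-2).
    # Step 2: solve bottom-up, keeping only the two most recent values.
    if n < 2:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

print([fib(i) for i in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```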

Page 48:

EXAMPLE ILLUSTRATING THE DIFFERENCE: BINOMIAL COEFFICIENT

The binomial coefficient is given by the recurrence:

C(n, k) = C(n-1, k-1) + C(n-1, k),   0 < k < n
C(n, 0) = C(n, n) = 1

Page 49:

Algorithm: Binomial Coefficient using Divide and Conquer

function bin(n, k: integer): integer;
begin
  if (k = 0) or (n = k) then
    bin := 1
  else
    bin := bin(n-1, k-1) + bin(n-1, k)
end;

Page 50:

Algorithm: Binomial Coefficient using Dynamic Programming

function bin2(n, k: integer): integer;
var
  i, j: index;
  B: array[0..n, 0..k] of integer;
begin
  for i := 0 to n do
    for j := 0 to minimum(i, k) do
      if (j = 0) or (j = i) then
        B[i, j] := 1
      else
        B[i, j] := B[i-1, j-1] + B[i-1, j];
  bin2 := B[n, k]
end;
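The two Pascal routines translate directly into Python (function names below are mine); the check at the end confirms both compute C(n, k), while only the second does so in polynomial time:

```python
def bin_dc(n, k):
    # Divide and conquer: recomputes overlapping subproblems (exponential time).
    if k == 0 or n == k:
        return 1
    return bin_dc(n - 1, k - 1) + bin_dc(n - 1, k)

def bin_dp(n, k):
    # Dynamic programming: fill Pascal's triangle bottom-up (O(n*k) time).
    B = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, k) + 1):
            B[i][j] = 1 if j == 0 or j == i else B[i-1][j-1] + B[i-1][j]
    return B[n][k]

print(bin_dc(6, 2), bin_dp(6, 2))  # 15 15
```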

Page 51:

THE GREEDY APPROACH

STEPS:

• A selection procedure chooses the next item to add to the set. The selection is performed according to a greedy criterion that satisfies some locally optimal consideration at the time.

• A feasibility check determines if the new set is feasible by checking whether it is possible to complete this set in such a way as to give a solution to the instance.

Page 52:

THE GREEDY APPROACH (Contd..)

• A solution check determines whether the new set constitutes a solution to the instance.

• A greedy algorithm works in phases. At each phase:

  – You take the best you can get right now, without regard for future consequences.

  – You hope that by choosing a local optimum at each step, you will end up at a global optimum.
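A standard illustration of these three components (my example, not from the slides) is greedy coin change: selection picks the largest coin, the feasibility check ensures a coin does not overshoot the remaining amount, and the solution check asks whether the amount reached zero.

```python
def greedy_change(amount, coins=(25, 10, 5, 1)):
    """Greedy coin change. Optimal for canonical coin systems such as
    (25, 10, 5, 1); it can fail to be optimal on arbitrary coin sets."""
    picked = []
    for coin in sorted(coins, reverse=True):  # selection: best coin first
        while coin <= amount:                 # feasibility: does not overshoot
            picked.append(coin)
            amount -= coin
    return picked if amount == 0 else None    # solution check

print(greedy_change(63))  # [25, 25, 10, 1, 1, 1]
```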

Page 53:

BACKTRACKING

• Backtracking is used to solve problems in which a sequence of objects is chosen from a specified set so that the sequence satisfies some criteria.

• EXAMPLE: the n-Queens Problem

Page 54:

BACKTRACKING (Contd..)

STEPS:

– Test to see if a solution has been found, and if so, return it; otherwise:

– For each choice that can be made at this point:
  • Make that choice
  • Recur
  • If the recursion returns a solution, return it

– If no choices remain, return failure.
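The steps above, applied to the n-Queens problem named earlier, can be sketched as follows (a minimal illustration; function and variable names are mine):

```python
def solve_queens(n, cols=()):
    """Place one queen per row; cols[r] is the column of the queen in row r.
    Returns the first solution found, or None if no placement exists."""
    row = len(cols)
    if row == n:                  # solution check: every row has a queen
        return cols
    for c in range(n):            # each choice that can be made at this point
        # A choice is legal if it shares no column or diagonal with any
        # previously placed queen.
        if all(c != pc and abs(c - pc) != row - pr
               for pr, pc in enumerate(cols)):
            result = solve_queens(n, cols + (c,))  # make the choice and recur
            if result is not None:                 # recursion found a solution
                return result
    return None                   # no choices remain: backtrack

print(solve_queens(4))  # (1, 3, 0, 2)
```

With this search order the first 4-Queens solution places the queens in columns 1, 3, 0, 2 of rows 0 through 3; for n = 2 and n = 3 the search exhausts every choice and correctly reports failure.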