Post on 15-Mar-2020

SOLUTIONS

King’s College London

This paper is part of an examination of the College counting towards the award of a degree. Examinations are governed by the College Regulations under the authority of the Academic Board.

Degree Programmes: BSc, MSci

Module Code: 6CCS3PAL: IYA

Module Title: Parallel Algorithms

Examination Period: Example exam 2017

Time Allowed: Two hours

Rubric: Answer THREE of the FIVE questions. All questions carry equal marks. If more than three questions are answered, clearly indicate which answers you would like to be marked. Write this clearly in the dedicated section on the front page of the answer booklet. Answer each question on a new page of your answer book and write its number in the space provided.

Calculators: Calculators are not permitted.

Notes: Books, notes or other written material may not be brought into this examination.

DO NOT REMOVE THIS PAPER FROM THE EXAMINATION ROOM

2017 King's College London

Example exam 2017 6CCS3PAL: IYA

1. a. Compare and contrast the notions of parallel and distributed algorithms. To what extent do they differ, to what extent are they the same?

[10 marks]

Answer

[marks 5+5]

Open ended. A chance to see what students think and understand.

Parallel (PRAM): totally authoritarian, everybody does the same thing; refers to shared memory under central control.

Parallel (MESH): still under central control, but with local memory and local connections.

Distributed: anarchic, everybody does their own thing; no central knowledge of the network structure etc.

b. Define speedup, cost and efficiency measures for parallel algorithms. To what extent are these measures meaningful?

[8 marks]

Answer

Define the speedup, cost and efficiency measures for a parallel algorithm using p processors on a problem instance of size n. Let T*(n) be the worst-case run time of the optimal sequential algorithm and T_p(n) the run time of the parallel algorithm on p processors.

Speedup: T*(n) / T_p(n)

Cost: p T_p(n)

Efficiency: T*(n) / (p T_p(n))

To what extent are these measures meaningful? The problem is that we often don't know T*(n).
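As a small numerical illustration of the three measures, the following Python sketch evaluates them for made-up timings. The function name `parallel_metrics` and the sample values (1024 time units sequentially, 160 on 8 processors) are assumptions chosen for the example, not figures from the course.

```python
def parallel_metrics(t_seq: float, t_par: float, p: int) -> dict:
    """Speedup, cost and efficiency from T*(n), T_p(n) and p."""
    speedup = t_seq / t_par        # T*(n) / T_p(n)
    cost = p * t_par               # p * T_p(n)
    efficiency = t_seq / cost      # T*(n) / (p * T_p(n))
    return {"speedup": speedup, "cost": cost, "efficiency": efficiency}


# Hypothetical timings: sequential 1024 units, parallel 160 units on 8 processors.
m = parallel_metrics(t_seq=1024.0, t_par=160.0, p=8)
print(m)
```

An efficiency below 1 means the parallel algorithm does more total work than the sequential one.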


c. Give a description of the following topics. Your answer should include an outline of any appropriate algorithms.

i. Parallel and distributed approaches to broadcasting and connectivity testing in networks. [18 marks]

ii. Parallel computation of a minimum spanning tree and the connected components of an undirected graph. [14 marks]

Answer

The topics are standard examples which were covered in detail in the course.

Part i) [marks 9+9]

Part ii) [marks 7+7]


2. a. Explain the Parallel Random Access Machine (PRAM) model with the help of a diagram. What are the different assumptions that can be made about reading and writing in this model?

[6 marks]

Answer

The diagram is a sketch of a PRAM network (not given here).

PRAM: synchronous shared-memory model.

PRAM algorithms are of single instruction multiple data (SIMD) type. There are 4 types of PRAM:

EREW - Exclusive Read Exclusive Write

CREW - Concurrent Read Exclusive Write

CRCW - Concurrent Read Concurrent Write

ERCW - Exclusive Read Concurrent Write (also possible)


b. The following sub-questions assume the EREW-PRAM model.

i. State in pseudocode a parallel algorithm to initialise an array A[1...n] with a value Z on an EREW-PRAM by broadcasting. [6 marks]

ii. Use this algorithm to initialise an array of length n = 8 with the item X, showing the steps in the execution of the algorithm. [6 marks]

iii. Explain how to use this algorithm to test if an item X is contained in an array B[1...n] and to return the lowest index, if any, of the location of X in the array. Give an analysis of the running time of the algorithm. [6 marks]

iv. Assuming that the array A[1...8] is initialised with the item X as in part (ii) above, show the steps in returning the lowest index containing this item in the array B = [H, A, X, E, L, A, X, X]. [6 marks]

Answer

(1) By broadcasting on an EREW PRAM we mean copying an item X that is stored in a specified location of the global memory of an EREW PRAM to all local memories.

The running time is O(log n) due to the first and third stages.

Let p = 2^k. The final array index is p-1.

    BroadcastPRAM(A[0..p-1], X) {
        A[0] = X;
        for (int i = 1; i <= k; i++) {
            // each round doubles the number of cells that hold X
            for 2**(i-1) <= j <= 2**i - 1 pardo {
                A[j] = A[j - 2**(i-1)];
            }
        }
    }
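The BroadcastPRAM pseudocode above can be simulated sequentially. The sketch below is one possible Python rendering; the function name `broadcast_erew` and the per-round snapshot (which stands in for the synchronous parallel step) are assumptions of this illustration.

```python
def broadcast_erew(x, p):
    """Simulate BroadcastPRAM on an array of length p = 2**k.

    Returns the filled array and the number of parallel rounds used.
    """
    k = p.bit_length() - 1
    assert 1 << k == p, "this sketch assumes p is a power of two"
    a = [None] * p
    a[0] = x
    rounds = 0
    for i in range(1, k + 1):
        half = 2 ** (i - 1)
        snapshot = list(a)  # all processors read before anyone writes
        # Processors j = half .. 2*half - 1 act "in parallel"; each reads a
        # distinct cell j - half and writes a distinct cell j (exclusive R/W).
        for j in range(half, 2 * half):
            a[j] = snapshot[j - half]
        rounds += 1
    return a, rounds


filled, rounds = broadcast_erew("X", 8)
print(filled, rounds)
```

For p = 8 the copy takes three rounds, matching the O(log n) bound.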


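(2) Use broadcasting: start with X in Temp[1] and double the index range at each step. In this way X is broadcast to all entries in Temp. For n = 8 this takes log_2 8 = 3 doubling steps: X occupies Temp[1], then Temp[1..2], then Temp[1..4] and finally Temp[1..8]. A diagram will do.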

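(3) Test membership

    EREW-IS-IN(X, L) begin
        For i = 1,...,n do in parallel
        begin
            If L[i] = Temp[i] then
                Temp[i] := i
            else
                Temp[i] := INFINITY
        end
    end

To find the lowest index, use binary fan-in on the values in Temp: processor i writes min(L[2i], L[2i+1]) to L[i], where L here denotes the array being reduced, so the number of candidates halves in each round. A diagram will do.

Running time: broadcasting X to Temp takes O(log n), the parallel comparison takes O(1) and the binary fan-in takes O(log n), so the whole search takes O(log n).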

(4) Let M denote infinity.

    [ H, A, X, E, L, A, X, X ]
    [ M, M, 3, M, M, M, 7, 8 ]   Initialise membership
    [ M, 3, M, 7 ]               Pairwise min
    [ 3, 7 ]                     Pairwise min
    [ 3 ]                        Pairwise min

Ans: the minimum array index containing X is 3.
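The membership test and pairwise-minimum reduction traced above can be checked with a short sequential simulation. This is a sketch only: the name `lowest_index` is hypothetical, the array length is assumed to be a power of two, and indices are 1-based as in the worked example.

```python
INF = float("inf")


def lowest_index(b, x):
    """Return (lowest 1-based index of x in b, list of reduction levels)."""
    n = len(b)
    assert n & (n - 1) == 0, "this sketch assumes n is a power of two"
    # Membership stage: each processor compares its own entry with x.
    temp = [i + 1 if b[i] == x else INF for i in range(n)]
    levels = [temp]
    # Binary fan-in: each round takes the minimum of adjacent pairs.
    while len(temp) > 1:
        temp = [min(temp[2 * i], temp[2 * i + 1]) for i in range(len(temp) // 2)]
        levels.append(temp)
    return temp[0], levels


answer, levels = lowest_index(list("HAXELAXX"), "X")
for level in levels:
    print(level)
print("lowest index:", answer)
```

If x is absent, the function returns infinity, which plays the role of "no index".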


c. Give an algorithm for the distributed construction of a breadth first search spanning tree T in a connected graph G.

[10 marks]

Illustrate the construction of the tree T starting from the root vertex 1 in the graph G with vertex set {1, 2, 3, 4, 5, 6, 7, 8} and edges

{1, 2}, {1, 6}, {2, 3}, {2, 5}, {5, 8}, {6, 5}, {3, 4}, {3, 5}, {4, 8}, {8, 7}, {7, 6}.

[10 marks]

Answer

The algorithm used in the lectures:

Initially all vertices have status 'not in tree'.

In round 1 the root changes its status to 'in tree' and sends the message 'join tree' to all its neighbours.

If an unmarked node receives the message 'join tree', it changes its status to 'in tree' and sends the message 'your child' to the lowest-labelled of the sending nodes, and 'not your child' to the others.

In the next round it sends 'join tree' to its unmarked neighbours.

If a node has no children (i.e. it didn't receive 'your child' from any vertex in the current round) then it sends 'terminated' to its parent.

When a node receives 'terminated' from all its children, it sends 'terminated' to its parent.

The process finishes when the root receives 'terminated' from all its neighbours.

The diagram is shown in file IYA-2018-figures.pdf
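Since the figure is not reproduced here, the tree for the example graph can be recovered with a round-by-round simulation of the 'join tree' phase. The sketch below is a centralised stand-in for the distributed protocol: the function name `distributed_bfs_tree` is hypothetical, marked nodes are simply assumed to ignore further 'join tree' messages, and the termination messages are omitted.

```python
def distributed_bfs_tree(edges, root):
    """Simulate the synchronous 'join tree' rounds; return child -> parent."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    parent = {root: None}   # nodes with status 'in tree'
    frontier = [root]       # nodes that joined in the previous round
    while frontier:
        inbox = {}          # unmarked node -> senders of 'join tree'
        for u in frontier:
            for v in nbrs[u]:
                if v not in parent:
                    inbox.setdefault(v, []).append(u)
        # Each newly reached node answers 'your child' to its lowest-labelled sender.
        for v, senders in inbox.items():
            parent[v] = min(senders)
        frontier = sorted(inbox)
    return parent


edges = [(1, 2), (1, 6), (2, 3), (2, 5), (5, 8), (6, 5),
         (3, 4), (3, 5), (4, 8), (8, 7), (7, 6)]
tree = distributed_bfs_tree(edges, root=1)
print(tree)
```

On this graph, vertex 5 hears from both 2 and 6 and picks 2, while vertex 8 hears from 5 and 7 and picks 5.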


3. a. i. Explain the list ranking problem. Give a CREW-PRAM parallel algorithm for the list ranking problem. Establish the runtime of your algorithm on an n element list L, using n processors. [12 marks]

ii. Demonstrate how your algorithm works on a 4 element list. [8 marks]


Answer

[marks for part i) 6+6]

The list-ranking problem is to determine the distance of each node i from the end of the list.

Algorithm: LIST-RANK

Uses the pointer jumping technique.

Input: A linked list L of size n, or an array S simulating the order of L.

Output: Distance dist(i), for 1 ≤ i ≤ n, of node i from the end.

    LIST-RANK(L) BEGIN
        % n: list length
        % next(k): list pointer to next item (the last item points to itself)
        % initialize distance
        For all k \in L in parallel do
        begin
            P(k) := next(k)
            If P(k) =/= k then dist(k) := 1
            else dist(k) := 0
        end
        Repeat log_2 n times
            For all k \in L in parallel do
            begin
                If P(k) =/= P(P(k)) then
                begin
                    dist(k) := dist(k) + dist(P(k))
                    P(k) := P(P(k))
                end
            end
        For all k \in L in parallel do
            Rank(k) := dist(k)
    END

Runtime: the initialisation and the final copy take O(1) parallel time and each of the log_2 n pointer-jumping rounds takes O(1), so the algorithm runs in O(log n) time with n processors.

Demonstration of the algorithm: log_2 4 = 2, so the algorithm has 2 rounds.

    % initialize
    k                  1  2  3  4
    P(k) := next(k)    2  3  4  4
    dist               1  1  1  0
    P(P(k))            3  4  4  4
    % round 1
    P(k) = P(P(k))?    N  N  Y  Y
    dist               2  2  1  0
    P(k) update        3  4  4  4
    P(P(k))            4  4  4  4
    % round 2
    P(k) = P(P(k))?    N  Y  Y  Y
    dist               3  2  1  0
    % final rank
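The demonstration can be reproduced with a sequential simulation of pointer jumping. In this sketch the list is given as an array of successor indices, 0-based rather than 1-based, with the last node pointing to itself; the name `list_rank` and the double-buffering used to mimic synchronous updates are assumptions of the illustration.

```python
def list_rank(nxt):
    """Rank each node by its distance from the end of the list."""
    n = len(nxt)
    p = list(nxt)
    dist = [0 if p[k] == k else 1 for k in range(n)]
    rounds = (n - 1).bit_length()          # ceil(log2 n)
    for _ in range(rounds):
        new_p, new_dist = list(p), list(dist)
        for k in range(n):                 # "for all k in parallel"
            if p[k] != p[p[k]]:
                new_dist[k] = dist[k] + dist[p[k]]
                new_p[k] = p[p[k]]
        p, dist = new_p, new_dist          # synchronous update
    return dist


# The 4 element list 1 -> 2 -> 3 -> 4, written with 0-based indices.
ranks = list_rank([1, 2, 3, 3])
print(ranks)
```

The successor pointers are read by several processors at once (two nodes may share the same P(k)), which is why the model is CREW rather than EREW.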


b. i. Explain what is meant by an associative binary operator, giving examples. Is the operator max(a, b), which calculates the maximum of two numbers a, b, an associative binary operator? If so explain why, giving an example. [5 marks]

ii. An array A contains data entries [x1, x2, . . . , xn]. Define the term prefix sums of A with respect to an associative binary operator ◦. Let A = [5, 3, 8, 4, 0]. State the prefix sums of A using the operator max. [5 marks]

Answer

1. An associative binary operator ◦ is binary (C := A ◦ B) and satisfies

(A ◦ B) ◦ C = A ◦ (B ◦ C)

Max is associative: the max of 3 numbers is the same whatever the order of evaluation. For example, max(max(5, 3), 8) = max(5, max(3, 8)) = 8.

2. Let ⊕ be an associative binary operator.

The prefix sums of a sequence {x1, x2, · · · , xn} are the n partial sums defined by

si = x1 ⊕ x2 ⊕ · · · ⊕ xi, 1 ≤ i ≤ n.

A = [5, 3, 8, 4, 0]

max: 5, 5, 8, 8, 8
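The prefix sums with respect to max can be checked in a few lines of Python. This sketch uses the standard library function `itertools.accumulate`, which applies a binary function cumulatively; the variable names are illustrative only.

```python
from itertools import accumulate

A = [5, 3, 8, 4, 0]

# Prefix "sums" under max: s_i = max(x_1, ..., x_i).
prefix_max = list(accumulate(A, max))
print(prefix_max)

# Spot check of associativity on three of the entries.
a, b, c = 5, 3, 8
assoc_ok = max(max(a, b), c) == max(a, max(b, c))
print(assoc_ok)
```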


c. i. Suppose the binary operator is addition of integers (+). Describe a CREW-PRAM parallel algorithm to compute the additive prefix sums of n array values held at locations A(n), ..., A(2n − 1) of an array A[1 : 2n − 1]. Justify the running time complexity of your answer. [10 marks]

ii. Use a diagram to demonstrate the execution of this algorithm when computing the prefix sums of the following array: [6, 3, 2, 4, 5, 4, 3, 2]. Give all steps and justify intermediate results of the computation. [10 marks]


Answer

    Algorithm Prefix Computation
    Input: an array A[1:2n-1] with data at A(n),...,A(2n-1)
    Output: an array B[1:2n-1] with prefix sums at B(n),...,B(2n-1)
    Assumption n=2^m (m=log_2 n)

    % go up the tree: each internal node stores the sum of its two children
    For k = (m-1) step (-1) to 0 do begin
        For all j, 2^k \le j \le 2^{k+1}-1 in parallel do
            A(j) := A(2j) + A(2j+1)
    end
    B(1) := A(1)
    % insert total value of array at root of binary tree
    % now come back down the tree, as far as the leaves at level m
    For k = 1 step 1 to m do begin
        For all j, 2^k \le j \le 2^{k+1}-1 in parallel do
            If j is odd then
                B(j) := B((j-1)/2)
            If j is even then
                B(j) := B(j/2) - A(j+1)
    end
    end % of algorithm

Runs in O(log n) time and performs O(n) operations. The algorithm is an upward pass from the leaves to the root of a balanced binary tree of height log_2 n, followed by a downward pass to the leaves. Each level takes O(1) parallel time, so we have an O(log n) parallel running time.

The diagram is shown in file IYA-2018-figures.pdf
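As the diagram is not included here, the intermediate values for the array in part (ii) can be regenerated with a sequential simulation of the two passes. The sketch below assumes n is a power of two and runs the downward pass as far as the leaf level m so that B(n), ..., B(2n-1) are filled; the function name `tree_prefix_sums` is hypothetical.

```python
def tree_prefix_sums(values):
    """Additive prefix sums via an upward and a downward pass on a binary tree."""
    n = len(values)
    m = n.bit_length() - 1
    assert 1 << m == n, "this sketch assumes n is a power of two"
    a = [0] * (2 * n)            # 1-based heap layout, index 0 unused
    b = [0] * (2 * n)
    a[n:2 * n] = values          # data at A(n), ..., A(2n-1)

    # Upward pass: every internal node gets the sum of its two children.
    for k in range(m - 1, -1, -1):
        for j in range(1 << k, 1 << (k + 1)):      # one parallel step per level
            a[j] = a[2 * j] + a[2 * j + 1]

    b[1] = a[1]                  # total at the root
    # Downward pass: a right child inherits its parent's prefix sum,
    # a left child subtracts the sum stored at its right sibling.
    for k in range(1, m + 1):
        for j in range(1 << k, 1 << (k + 1)):
            if j % 2 == 1:
                b[j] = b[(j - 1) // 2]
            else:
                b[j] = b[j // 2] - a[j + 1]
    return b[n:2 * n]


prefix = tree_prefix_sums([6, 3, 2, 4, 5, 4, 3, 2])
print(prefix)
```

Note that the downward pass relies on subtraction, so this particular formulation is specific to addition rather than to a general associative operator.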


4. a. Explain the Distributed Memory Interconnection Network model with

the help of a diagram. Outline typical network structures that might

arise.

[10 marks]

Answer

Standard model (see below) + scope for creativity here with the networks.


b. Describe three basic goodness measures for an interconnection network, and explain why they are considered important.

[10 marks]

Illustrate your answer for the following networks:

i. Complete graph K(p) of p processors. [5 marks]

ii. Two-dimensional mesh M(p, p) of p × p processors. [5 marks]

Answer

Diameter, maximum degree, bisection width (smallest edge cut between two sets of the same size).

Importance:

Diameter: the largest shortest-path distance between two processors; it bounds the number of hops needed for information exchange.

Maximum degree: increasing the max degree lets more processors communicate directly.

Bisection width: a measure of the communications bottleneck (unfortunately not so easy to calculate).

Complete graph K(p): diameter 1, max degree p − 1, bisection width ∼ p^2/4, since each of the p/2 processors on one side is joined to all p/2 on the other (excellent).

Two-dimensional mesh M(p, p): diameter Θ(p), max degree 4, bisection width Θ(p) (not too bad, but not brilliant, especially the diameter).
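For small instances the three measures can be computed by brute force, which gives a quick sanity check on the asymptotic statements. The sketch below is illustrative only: all function names are hypothetical, the bisection width is found by trying every balanced split (exponential time, so only suitable for tiny graphs), and the sizes K(6) and M(4, 4) are arbitrary choices.

```python
from collections import deque
from itertools import combinations


def complete_graph(p):
    return {u: {v for v in range(p) if v != u} for u in range(p)}


def mesh(p):
    adj = {(i, j): set() for i in range(p) for j in range(p)}
    for (i, j) in adj:
        for di, dj in ((1, 0), (0, 1)):
            w = (i + di, j + dj)
            if w in adj:
                adj[(i, j)].add(w)
                adj[w].add((i, j))
    return adj


def diameter(adj):
    best = 0
    for s in adj:                       # BFS from every node
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def max_degree(adj):
    return max(len(nb) for nb in adj.values())


def bisection_width(adj):
    nodes = list(adj)
    best = None
    for side in combinations(nodes, len(nodes) // 2):
        s = set(side)
        cut = sum(1 for u in s for v in adj[u] if v not in s)
        best = cut if best is None else min(best, cut)
    return best


for name, g in (("K(6)", complete_graph(6)), ("M(4,4)", mesh(4))):
    print(name, diameter(g), max_degree(g), bisection_width(g))
```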


c. Suppose the values of an array L of length n^2 are distributed over a square two-dimensional mesh interconnection network consisting of p = n^2 processors, in row major order. It is required to determine the minimum processor index i (if any) for which L(i) is a given item X. Describe a parallel algorithm to search for X on the mesh, illustrating your answer using the 3 × 3 mesh given below, with X = f. You may assume all processors already know the value of X.

    [c]---[g]---[f]
     |     |     |
    [a]---[t]---[f]
     |     |     |
    [c]---[f]---[c]

[20 marks]

Answer

We assume L is held in row major order on the mesh. In this answer p denotes the side length of the mesh, and L(k) is held at location (i, j), where k = ip + j + 1 for i, j = 0, ..., p − 1, so that indices run from 1 to p^2 as in the worked example below. Initially each processor initialises its variable Index as follows:

    Index := INFINITY
    If L(i,j) = X then
        Index := i*p + j + 1

The processors then execute MIN-2D-MESH(Index, p) to pass the minimum index location to processor (0, 0).

Notation on variable passing:

P(i, j).Temp means the value of Temp at processor P(i, j).

    MIN-2D-MESH(Y, p)
    % Input Y=[Y(1),...,Y(n)], n=p*p
    % List of items, one held in variable Y at each processor
    % Output Min(Y)

    % compute column minimums
    % compare rows (Row, Row+1) pairwise starting from bottom
    % at end first row contains column minimum
    For Row = p-2 down to 0 do
        For i = Row and 0 \le j \le p-1 in parallel do
        begin
            P(i,j).Temp := P(i+1,j).Y
            Y := Min(Y, Temp)   % at processor (i,j)
        end

    % compute minimum of first row sequentially
    % working backwards from last column pairwise
    For Column = p-2 down to 0 do
    begin
        j := Column
        P(0,j).Temp := P(0,j+1).Y
        Y := Min(Y, Temp)   % at processor (0,j)
    end
    Return P(0,0).Y

The algorithm uses p − 1 steps for the columns and p − 1 steps for the first row, so it runs in O(p) time.

Let M = INFINITY.

    [c]---[g]---[f]
     |     |     |
    [a]---[t]---[f]
     |     |     |
    [c]---[f]---[c]

Initialise Index at each processor:

    [M]---[M]---[3]
     |     |     |
    [M]---[M]---[6]
     |     |     |
    [M]---[8]---[M]

Column minimums, middle row compared with bottom row:

    [M]---[M]---[3]
     |     |     |
    [M]---[8]---[6]
     |     |     |
    [M]---[8]---[M]

Column minimums, top row compared with middle row:

    [M]---[8]---[3]
     |     |     |
    [M]---[8]---[6]
     |     |     |
    [M]---[8]---[M]

First row only:

    [M]---[8]---[3]
    [M]---[3]---[3]
    [3]---[3]---[3]

Ans: Min index = 3
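The worked example can be replayed with a sequential simulation of the mesh search. This is a sketch under stated assumptions: the name `mesh_search` is hypothetical, the mesh is given as a list of row strings, indices are 1-based in row major order as in the example, and the loops start at row and column p-2 so that every processor has a neighbour to read from.

```python
INF = float("inf")


def mesh_search(grid, x):
    """Return the minimum 1-based row-major index holding x, or INF if absent."""
    p = len(grid)
    # Each processor (i, j) initialises its own Index value.
    y = [[i * p + j + 1 if grid[i][j] == x else INF for j in range(p)]
         for i in range(p)]
    # Column minimums: row i takes the min with row i+1, bottom to top.
    for row in range(p - 2, -1, -1):
        for j in range(p):                 # all columns in parallel
            y[row][j] = min(y[row][j], y[row + 1][j])
    # Minimum of the first row, working backwards from the last column.
    for col in range(p - 2, -1, -1):
        y[0][col] = min(y[0][col], y[0][col + 1])
    return y[0][0]


result = mesh_search(["cgf", "atf", "cfc"], "f")
print(result)
```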


5. a. i. Explain the divide and conquer approach to the design of algorithms. [4 marks]

ii. State an algorithm for sequential MergeSort of an array A[1...n] which acts by divide and conquer. [5 marks]

iii. Use the array A = [6, 8, 7, 1, 4, 3] to illustrate the operation of this algorithm. [3 marks]

iv. Explain how to produce a parallel version of MergeSort, to sort an array A[1...n] in the multi-threading environment. What is the span of this algorithm? [8 marks]


Answer

The purpose of this question is to examine the following topic(s): Parallel sorting algorithms.

This question addresses parallel divide and conquer strategies, and parallel sorting.

(i) Divide and conquer strategy

1. Divide the input into partitions of almost equal size.

2. Recursively solve the subproblems defined by the partition.

3. Combine the solutions to the subproblems into a single answer and pass it back as the answer to the recursion call.

It must be possible to perform steps 1 and 3 efficiently for the divide and conquer approach to work well.

(ii) Sequential MergeSort

    Seql-MergeSort(L, p, r)
    begin
        If p < r then
            q = ⌊(p + r)/2⌋
            Seql-MergeSort(L, p, q)
            Seql-MergeSort(L, q + 1, r)
            Seql-Merge(L, p, q, r)
    end

(iii)

Produce the recursion tree:

    [6 8 7 1 4 3]
    [6 8 7] [1 4 3]
    [6 8] [7] [1 4] [3]
    [6] [8] [7] [1] [4] [3]

Start merging:

    [6 8] [7] [1 4] [3]
    [6 7 8] [1 3 4]
    [1 3 4 6 7 8]

Sorted.
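A runnable version of the sequential algorithm is sketched below. It returns a new list instead of sorting in place, which is a simplification of the Seql-MergeSort(L, p, r) interface; the names `merge` and `merge_sort` are illustrative. The split point gives the left half the extra element, matching the split [6 8 7] [1 4 3] and [6 8] [7] in the recursion tree.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Divide, recursively sort both halves, then combine by merging."""
    if len(a) <= 1:
        return list(a)
    q = (len(a) + 1) // 2          # left half gets ceil(n/2) items
    return merge(merge_sort(a[:q]), merge_sort(a[q:]))


sorted_a = merge_sort([6, 8, 7, 1, 4, 3])
print(sorted_a)
```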

(iv)

    Par-MergeSort(L, p, r)
    begin
        If p < r then
            q = ⌊(p + r)/2⌋
            Spawn Par-MergeSort(L, p, q)
            Par-MergeSort(L, q + 1, r)
            Sync
            Seql-Merge(L, p, q, r)
    end

No appreciable speed-up, as the cost is in the sequential merge. The span satisfies T(n) = T(n/2) + Θ(n), so the span is Θ(n).
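The Spawn/Sync structure can be mimicked with ordinary threads. The sketch below is only meant to show the shape of the computation: `par_merge_sort` and its `depth` cut-off are assumptions of this illustration, `heapq.merge` stands in for Seql-Merge, and in CPython the global interpreter lock means these threads will not give a real speed-up.

```python
import heapq
import threading


def par_merge_sort(a, depth=2):
    """Merge sort where the left half is 'spawned' in a separate thread."""
    if len(a) <= 1:
        return list(a)
    q = (len(a) + 1) // 2
    if depth <= 0:
        # Below the cut-off, recurse without creating further threads.
        left = par_merge_sort(a[:q], 0)
        right = par_merge_sort(a[q:], 0)
    else:
        result = {}

        def run_left():
            result["left"] = par_merge_sort(a[:q], depth - 1)

        t = threading.Thread(target=run_left)   # Spawn
        t.start()
        right = par_merge_sort(a[q:], depth - 1)
        t.join()                                 # Sync
        left = result["left"]
    # Sequential merge: this Θ(n) step is what keeps the span at Θ(n).
    return list(heapq.merge(left, right))


par_sorted = par_merge_sort([6, 8, 7, 1, 4, 3])
print(par_sorted)
```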


b. i. State a sequential binary search algorithm Binary-Search(x, A, p, r) to return the index of the first occurrence of x in the sorted array A[p..r]. Assume that, if x < A[p], the index p is returned; similarly, if x > A[r], the index r is returned. [10 marks]

ii. Illustrate the execution of algorithm Binary-Search(x, A, 1, 13) when used to locate the first occurrence of x = 6 in the sorted array

A = [1, 3, 5, 6, 6, 7, 9, 14, 21, 30, 74, 75, 87]

[12 marks]

iii. Explain how to use sequential binary search to improve the running time of parallel Merge-Sort, illustrating your explanation with a suitable diagram. What is the span of the improved algorithm in the multi-threading model? [8 marks]


Answer

(i) The following Binary Search returns the first occurrence of x ∈ T.

    Binary-Search(x, T, p, r)
    if p = r return p   (end of search)
    else   (carry on searching)
    begin
        Low L = p
        High H = r
        while L < H
            Middle value q = ⌊(L + H)/2⌋
            if x ≤ T[q]
                (set High to Mid to search bottom half of T)
                H = q
            else L = q + 1   (search top half)
        return High H
    end

(ii) Binary-Search(6, A, 1, 13)

A = [1, 3, 5, 6, 6, 7, 9, 14, 21, 30, 74, 75, 87]

Marks: 2.5 per step to give 12 (last step 2 marks).

    BinSrc(6,A,1,13)
    Low = 1
    High = 13
    q = ⌊(1+13)/2⌋ = 7
    A[7] = 9
    x = 6
    yes, 6 < 9
    set High to Mid
    High = 7

    BinSrc(6,A,1,7)
    Low = 1
    High = 7
    q = ⌊(1+7)/2⌋ = 4
    A[4] = 6
    6 <= A[4]
    set High to Mid
    High = 4

    BinSrc(6,A,1,4)
    Low = 1
    High = 4
    q = ⌊(1+4)/2⌋ = 2
    A[2] = 3
    6 > 3, search top half
    Low = q+1 = 3

    BinSrc(6,A,3,4)
    Low = 3
    High = 4
    q = ⌊(3+4)/2⌋ = 3
    A[3] = 5
    6 > 5, search top half
    Low = q+1 = 4

    BinSrc(6,A,4,4)
    4 = 4, end of search
    no value of x is lower
    return index 4
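The trace can be confirmed with a direct transcription of the search into Python. In this sketch the function keeps the 1-based, inclusive bounds p and r of the pseudocode and converts to Python's 0-based indexing only when reading the list; the name `binary_search` is illustrative.

```python
def binary_search(x, t, p, r):
    """First 1-based index q in p..r with x <= t[q]; r if x exceeds every entry."""
    low, high = p, r
    while low < high:
        q = (low + high) // 2
        if x <= t[q - 1]:      # t[q - 1] is element q in 1-based terms
            high = q           # search the bottom half, keeping q
        else:
            low = q + 1        # search the top half
    return high


A = [1, 3, 5, 6, 6, 7, 9, 14, 21, 30, 74, 75, 87]
index = binary_search(6, A, 1, 13)
print(index)
```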

(iii) To produce a Parallel Merge of two sorted arrays A, B:

Let q be the median index of A.

Let x = A(q), so that A = A(1...q − 1), x, A(q + 1...n).

Split B into B(< x) and B(≥ x) using Binary-Search(x, B).

The merged array is Parallel-Merge(A(1...q − 1), B(< x)), x, Parallel-Merge(A(q + 1...n), B(≥ x)).

The left half has entries at most x.

The right half has entries at least x.

The two recursive merges can be done in parallel, thus removing the bottleneck in the merge phase of sorting.

The span is poly-log n (Θ(log^3 n), proved in the Cormen textbook, Chapter 27).
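The recursive structure of the merge can be sketched as follows. The two recursive calls are independent and could be spawned in parallel; here they simply run one after the other. The name `parallel_merge` is hypothetical, and the standard library function `bisect.bisect_left` plays the role of Binary-Search(x, B).

```python
from bisect import bisect_left


def parallel_merge(a, b):
    """Merge sorted lists a and b by splitting around the median of a."""
    if not a:
        return list(b)
    q = len(a) // 2
    x = a[q]
    s = bisect_left(b, x)       # b[:s] < x and b[s:] >= x
    # The two sub-merges touch disjoint data, so they could run in parallel.
    left = parallel_merge(a[:q], b[:s])
    right = parallel_merge(a[q + 1:], b[s:])
    return left + [x] + right


merged = parallel_merge([6, 7, 8], [1, 3, 4])
print(merged)
```

This version always splits the first array, as in the outline above, so it does not by itself guarantee balanced subproblems.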
