Part I: A Taste of Parallel Algorithms



• We examine five simple building-block parallel operations and look at the corresponding algorithms on four simple parallel architectures: linear array, binary tree, 2D mesh, and a simple shared-variable computer.


Semigroup Computation

• Given p values x0, x1, …, x(p−1) and an associative binary operator ⊗, compute x0 ⊗ x1 ⊗ ⋯ ⊗ x(p−1). Maximum finding, summation, and logical AND/OR are common special cases.
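As a point of reference, the semigroup computation itself is a simple sequential fold; this Python sketch (the function name `semigroup` is ours, not from the slides) shows what the p processors must collectively produce:

```python
from functools import reduce

def semigroup(values, op):
    """Combine all p values with an associative binary operator op,
    yielding x0 (op) x1 (op) ... (op) x(p-1)."""
    return reduce(op, values)

# Maximum finding is the special case op = max.
semigroup([3, 8, 1, 5], max)  # -> 8
```

Any associative operator works in place of `max`: addition, minimum, logical AND, and so on.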


Parallel Prefix Computation

• With the same assumptions as in semigroup computation, compute all p prefix results x0, x0 ⊗ x1, …, x0 ⊗ x1 ⊗ ⋯ ⊗ x(p−1), so that the ith processor ends up holding the ith prefix.
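Sequentially, a prefix computation is a running fold; this Python sketch (names are ours, for illustration) shows the p outputs a parallel prefix algorithm must deliver:

```python
from itertools import accumulate

def prefixes(values, op):
    """All p prefix results: x0, x0 (op) x1, ..., x0 (op) ... (op) x(p-1)."""
    return list(accumulate(values, op))

prefixes([3, 1, 4, 1], max)  # -> [3, 3, 4, 4]
```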


Packet Routing

• A packet of information resides at Processor i and must be sent to Processor j. The problem is to route the packet through intermediate processors, if needed, such that it gets to the destination as quickly as possible.

• The problem becomes more challenging when multiple packets reside at different processors, each with its own destination.

• When each processor has at most one packet to send and one packet to receive, the packet routing problem is called one-to-one communication or 1-1 routing.


Broadcasting

• Given a value a known at a certain processor i, disseminate it to all p processors as quickly as possible, so that at the end, every processor has access to, or "knows," the value. This is sometimes referred to as one-to-all communication.

• The more general case of one-to-many communication, in which the value is sent to a subset of the processors, is known as multicasting.


Sorting

• Rather than sorting a set of records, each with a key and data elements, we focus on sorting a set of keys for simplicity.


Linear Array

• Diameter D = p − 1

• Maximum node degree d = 2

• Ring? (adding a wraparound link halves the diameter to D = ⌊p/2⌋)


Binary Tree

• If all leaf levels are identical and every nonleaf processor has two children, the binary tree is said to be complete.

• D = 2 log2((p + 1)/2), i.e., twice the height of the tree

• d = 3


2D Mesh

• D = 2(√p − 1) for a √p × √p mesh

• d = 4

• Torus? (wraparound links in both dimensions reduce the diameter to 2⌊√p/2⌋)


Shared Memory

• A shared-memory multiprocessor can be modeled as a complete graph, in which every node is connected to every other node.

• D=1

• d=p-1


Algorithms for a Linear Array (1)

• Semigroup Computation – Let us first consider a special case of semigroup computation, namely maximum finding. Each of the p processors initially holds a value, and our goal is for every processor to learn the largest of these values.
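A step-by-step simulation of the linear-array algorithm (our own sketch, not the book's code): in each synchronous step every processor exchanges values with its neighbors and keeps the largest seen so far; after p − 1 steps, every processor holds the global maximum.

```python
def linear_array_max(values):
    """Simulate max finding on a p-processor linear array.
    Each step, every processor looks at its own value and its
    neighbors' values and keeps the maximum; p - 1 steps suffice
    because information travels one link per step."""
    p = len(values)
    current = list(values)
    for _ in range(p - 1):
        current = [
            max(current[max(i - 1, 0):min(i + 2, p)])  # self + live neighbors
            for i in range(p)
        ]
    return current

linear_array_max([4, 9, 2, 7])  # -> [9, 9, 9, 9]
```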


Algorithms for a Linear Array (2)

• Parallel Prefix Computation (Case 1: one value per processor)


Algorithms for a Linear Array (3)

• Parallel Prefix Computation (Case 2: more than one value per processor)


Algorithms for a Linear Array (4)

• Packet Routing


Algorithms for a Linear Array (5)

• Broadcasting – If Processor i wants to broadcast a value a to all processors, it sends an rbcast(a) (read r-broadcast) message to its right neighbor and an lbcast(a) message to its left neighbor.


Algorithms for a Linear Array (6)

• Sorting (Case 1)


Algorithms for a Linear Array (7)

• Sorting (Case 2, odd-even transposition) (efficiency?)
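Odd-even transposition sort can be simulated directly (a sketch with our own naming): alternate compare-exchange phases on even pairs (0,1), (2,3), … and odd pairs (1,2), (3,4), …; p phases always suffice. This also answers the efficiency question unfavorably: p processors spend O(p) steps on a problem a single processor solves in O(p log p) time.

```python
def odd_even_transposition_sort(a):
    """p alternating phases of neighbor compare-exchange on a linear array."""
    a = list(a)
    p = len(a)
    for phase in range(p):
        start = phase % 2           # 0: even pairs, 1: odd pairs
        for i in range(start, p - 1, 2):
            if a[i] > a[i + 1]:     # compare-exchange with right neighbor
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

odd_even_transposition_sort([5, 3, 8, 1])  # -> [1, 3, 5, 8]
```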


Algorithms for a Binary Tree (1)

• In algorithms for a binary tree of processors, we will assume that the data elements are initially held by the leaf processors only.

• The nonleaf (inner) processors participate in the computation, but do not hold data elements of their own.


Algorithms for a Binary Tree (2)

• Semigroup Computation – Each inner node receives two values from its children, applies the operator to them, and passes the result upward to its parent.
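The upward sweep can be sketched level by level (our simulation; it assumes the number of leaves is a power of 2):

```python
def tree_semigroup(leaves, op):
    """One upward sweep of a complete binary tree: at each level, every
    inner node combines the two values received from its children."""
    level = list(leaves)
    while len(level) > 1:
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]  # value held at the root

tree_semigroup([3, 1, 4, 1, 5, 9, 2, 6], max)  # -> 9
```

If every processor needs the result, the root then broadcasts it back down the tree.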


Algorithms for a Binary Tree (3)

• Parallel Prefix Computation


Algorithms for a Binary Tree (4)

• Packet Routing – depends on the processor numbering scheme used.

– Preorder


Algorithms for a Binary Tree (5)

• Broadcasting – Processor i sends the desired data upward to the root processor, which then broadcasts the data downward to all processors.


Algorithms for a Binary Tree (6)

• Sorting


Algorithms for 2D Mesh (1)

• In all of the 2D mesh algorithms presented in this section, we use the linear-array algorithms of Section 2.3 as building blocks.

• This leads to simple algorithms, but not necessarily the most efficient ones. Mesh-based architectures and their algorithms will be discussed in great detail in Part III.


Algorithms for 2D Mesh (2)

• Semigroup Computation – For example, in finding the maximum of a set of p values, stored one per processor, the row maximums are computed first and made available to every processor in its row. Then the column maximums are identified.
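The two phases are easy to express in Python (our sketch; each list operation stands in for a linear-array semigroup computation running in parallel across all rows or columns):

```python
def mesh_semigroup_max(grid):
    """Row phase: every processor in a row learns its row maximum.
    Column phase: the same algorithm over the row maxima then yields
    the global maximum, known to every processor."""
    row_max = [max(row) for row in grid]  # row phase
    return max(row_max)                   # column phase

mesh_semigroup_max([[4, 9, 2], [7, 1, 8], [0, 6, 5]])  # -> 9
```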


Algorithms for 2D Mesh (3)

• Parallel Prefix Computation– (1) do a parallel prefix computation on each row,

– (2) do a diminished parallel prefix computation in the rightmost column, and

– (3) broadcast the results in the rightmost column to all of the elements in the respective rows and combine with the initially computed row prefix value.
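The three phases can be checked against a small example (our sketch, in row-major processor order; `None` marks the empty carry of the top row, since a diminished prefix has no value for the first element):

```python
from itertools import accumulate

def mesh_prefix(grid, op):
    """Parallel prefix on a 2D mesh, row-major order."""
    # (1) parallel prefix along each row
    rows = [list(accumulate(row, op)) for row in grid]
    # (2) diminished prefix in the rightmost column: row r's carry is
    #     the combination of the totals of all rows above it
    totals = [row[-1] for row in rows]
    carries = [None] + list(accumulate(totals[:-1], op))
    # (3) broadcast each carry across its row and combine
    return [row if c is None else [op(c, v) for v in row]
            for c, row in zip(carries, rows)]

mesh_prefix([[1, 2], [3, 4]], lambda a, b: a + b)  # -> [[1, 3], [6, 10]]
```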


Algorithms for 2D Mesh (4)

• Packet Routing – To route a data packet from the processor in Row r, Column c, to the processor in Row r', Column c', we first route it within Row r to Column c'. Then we route it in Column c' from Row r to Row r' (row-first routing).
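Row-first routing traces out an L-shaped path; this sketch (function name ours) returns the sequence of processors visited:

```python
def row_first_route(src, dst):
    """Route from (r, c) to (r2, c2) on a 2D mesh: first move along the
    source row to the destination column, then along that column."""
    (r, c), (r2, c2) = src, dst
    path = [(r, c)]
    while c != c2:                     # phase 1: correct the column
        c += 1 if c2 > c else -1
        path.append((r, c))
    while r != r2:                     # phase 2: correct the row
        r += 1 if r2 > r else -1
        path.append((r, c))
    return path

row_first_route((0, 0), (2, 2))
# -> [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
```

The path length is |r − r'| + |c − c'| hops, at most 2(√p − 1) on a √p × √p mesh.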


Algorithms for 2D Mesh (5)

• Broadcasting – (1) broadcast the packet to every processor in the source node's row, and

– (2) broadcast it in all columns.


Algorithms for 2D Mesh (6)

• Sorting


Algorithms for Shared Variables

• Semigroup Computation

• Parallel Prefix Computation

• Packet Routing (Trivial in view of the direct communication path between any pair of processors)

• Broadcasting (Trivial, as each processor can send a data item to all processors directly)

• Sorting