112/04/21 Part I 1
A Taste of Parallel Algorithms
• We examine five simple building-block parallel operations (semigroup computation, parallel prefix computation, packet routing, broadcasting, and sorting) and look at the corresponding algorithms on four simple parallel architectures: linear array, binary tree, 2D mesh, and a simple shared-variable computer.
Semigroup Computation
Parallel Prefix Computation
Packet Routing
• A packet of information resides at Processor i and must be sent to Processor j. The problem is to route the packet through intermediate processors, if needed, such that it gets to the destination as quickly as possible.
• The problem becomes more challenging when multiple packets reside at different processors, each with its own destination.
• When each processor has at most one packet to send and one packet to receive, the packet routing problem is called one-to-one communication or 1-1 routing.
Broadcasting
• Given a value a known at a certain processor i, disseminate it to all p processors as quickly as possible, so that at the end, every processor has access to, or "knows," the value. This is sometimes referred to as one-to-all communication.
• When the value must reach only a subset of the processors (one-to-many communication), the operation is known as multicasting.
Sorting
• Rather than sorting a set of records, each with a key and data elements, we focus on sorting a set of keys for simplicity.
Linear Array
• Diameter: D = p - 1
• Maximum node degree: d = 2
• Ring?
Binary Tree
• If all leaf levels are identical and every nonleaf processor has two children, the binary tree is said to be complete.
• Diameter: D = 2 log2(p + 1) - 2 for a complete binary tree of p processors
• Maximum node degree: d = 3
2D Mesh
• Diameter: D = 2√p - 2 for a square √p × √p mesh
• Maximum node degree: d = 4
• Torus?
Shared memory
• A shared-memory multiprocessor can be modeled as a complete graph, in which every node is connected to every other node.
• Diameter: D = 1
• Maximum node degree: d = p - 1
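The D (diameter) and d (maximum node degree) figures for the four architectures can be checked by brute force on small instances. The sketch below is only an illustration: the graph builders and the names `linear_array`, `complete_binary_tree`, `mesh_2d`, `complete_graph`, `diameter`, and `max_degree` are hypothetical helpers, not part of the original slides. The `wrap=True` option adds the end-around links that turn a linear array into a ring and a mesh into a torus.

```python
from collections import deque

def linear_array(p, wrap=False):
    """p processors in a line; wrap=True adds the ring link."""
    adj = {i: set() for i in range(p)}
    for i in range(p - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    if wrap and p > 2:
        adj[0].add(p - 1)
        adj[p - 1].add(0)
    return adj

def complete_binary_tree(levels):
    """Complete binary tree with 2**levels - 1 processors (heap numbering)."""
    p = 2 ** levels - 1
    adj = {i: set() for i in range(p)}
    for i in range(1, p):
        parent = (i - 1) // 2
        adj[i].add(parent)
        adj[parent].add(i)
    return adj

def mesh_2d(side, wrap=False):
    """side x side mesh; wrap=True adds the torus links."""
    adj = {(i, j): set() for i in range(side) for j in range(side)}
    for i in range(side):
        for j in range(side):
            for ni, nj in ((i, j + 1), (i + 1, j)):
                if wrap:
                    ni, nj = ni % side, nj % side
                if ni < side and nj < side and (ni, nj) != (i, j):
                    adj[(i, j)].add((ni, nj))
                    adj[(ni, nj)].add((i, j))
    return adj

def complete_graph(p):
    """Model of the shared-variable computer: every node linked to every other."""
    return {i: set(range(p)) - {i} for i in range(p)}

def diameter(adj):
    """Longest shortest path, found by a breadth-first search from every node."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def max_degree(adj):
    return max(len(neighbors) for neighbors in adj.values())
```

For example, `diameter(linear_array(8))` gives 7 (that is, p - 1), while `diameter(linear_array(8, wrap=True))` gives 4, which suggests an answer to the "Ring?" question; the same comparison can be made between `mesh_2d(4)` and `mesh_2d(4, wrap=True)`.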
Algorithms for a Linear Array (1)
• Semigroup Computation
– Let us consider first a special case of semigroup computation, namely, that of maximum finding. Each of the p processors holds a value initially and our goal is for every processor to know the largest of these values.
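The figure for this slide did not survive, so the following is only one plausible way to meet the stated goal, not necessarily the scheme shown in the original. It assumes synchronous steps in which every processor exchanges its current maximum with its immediate neighbors; the function name `linear_array_max` is hypothetical.

```python
def linear_array_max(values):
    """Simulate maximum finding on a linear array of len(values) processors.

    In each synchronous step a processor sees only its own value and those
    of its left and right neighbors. After p - 1 steps (the diameter of the
    array) every processor holds the global maximum.
    """
    p = len(values)
    known = list(values)
    for _ in range(p - 1):
        prev = known[:]  # values as they were at the start of the step
        for i in range(p):
            left = prev[i - 1] if i > 0 else prev[i]
            right = prev[i + 1] if i < p - 1 else prev[i]
            known[i] = max(prev[i], left, right)
    return known
```

The p - 1 step count matches the worst case, where the largest value starts at one end of the array and must travel to the other end.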
Algorithms for a Linear Array (2)
• Parallel Prefix Computation (Case1)
Algorithms for a Linear Array (3)
• Parallel Prefix Computation (Case2, more than one value)
Algorithms for a Linear Array (4)
• Packet Routing
Algorithms for a Linear Array (5)
• Broadcasting
– If Processor i wants to broadcast a value a to all processors, it sends an rbcast(a) (read r-broadcast) message to its right neighbor and an lbcast(a) message to its left neighbor.
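A small simulation can make the step count concrete. It assumes, beyond what the slide states, that each processor receiving rbcast(a) or lbcast(a) copies the value and forwards the same message one hop further in the same direction. The function name `linear_array_broadcast` is hypothetical.

```python
def linear_array_broadcast(p, source, value):
    """Return (values held by each processor, number of steps taken)."""
    have = [None] * p
    have[source] = value
    right = left = source  # how far rbcast and lbcast have travelled
    steps = 0
    while left > 0 or right < p - 1:
        if right < p - 1:
            right += 1
            have[right] = value  # rbcast forwarded one hop to the right
        if left > 0:
            left -= 1
            have[left] = value   # lbcast forwarded one hop to the left
        steps += 1
    return have, steps
```

Under these assumptions the broadcast finishes in max(i, p - 1 - i) steps, so a source in the middle of the array is the best case and a source at either end is the worst (p - 1 steps).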
Algorithms for a Linear Array (6)
• Sorting (Case 1)
Algorithms for a Linear Array (7)
• Sorting (Case 2, odd-even transposition) (efficiency?)
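Odd-even transposition sort can be sketched sequentially as below. The inner loop stands in for compare-exchange operations that the processors of a linear array would carry out simultaneously, one key per processor. The function name is hypothetical.

```python
def odd_even_transposition_sort(keys):
    """Sort keys with p alternating phases of neighbor compare-exchange."""
    a = list(keys)
    p = len(a)
    for step in range(p):
        # Even steps pair processors (0,1), (2,3), ...; odd steps pair (1,2), (3,4), ...
        start = step % 2
        for i in range(start, p - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]  # smaller key moves left
    return a
```

On the efficiency question: the parallel version takes p steps on p processors, while a good sequential sort needs on the order of p log2 p comparisons. That puts the speedup at roughly log2 p and the efficiency at roughly (log2 p)/p, which shrinks as p grows.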
Algorithms for a Binary Tree (1)
• In algorithms for a binary tree of processors, we will assume that the data elements are initially held by the leaf processors only.
• The nonleaf (inner) processors participate in the computation, but do not hold data elements of their own.
Algorithms for a Binary Tree (2)
• Semigroup Computation
– Each inner node receives two values from its children, applies the operator to them, and passes the result upward to its parent.
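The upward pass can be sketched level by level. Here each list element at one level plays the role of a value arriving at an inner node from a child. The function name `tree_semigroup` is hypothetical, and the input is assumed to be non-empty. An unpaired value at the end of a level is passed up unchanged, which is an assumption to handle leaf counts that are not powers of two.

```python
from operator import add

def tree_semigroup(leaf_values, op):
    """Combine leaf values pairwise, level by level, up to the root.

    Returns (result at the root, number of levels climbed). The left-to-right
    order of operands is preserved, so op only needs to be associative.
    """
    level = list(leaf_values)
    steps = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired value passes through unchanged
        level = nxt
        steps += 1
    return level[0], steps
```

With 8 leaves, `tree_semigroup([3, 1, 4, 1, 5, 9, 2, 6], max)` reaches the root in 3 levels. If every processor must know the result, the root would then broadcast it back down the tree.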
Algorithms for a Binary Tree (3)
• Parallel Prefix Computation
Algorithms for a Binary Tree (4)
• Packet Routing
– Depends on the processor numbering scheme used.
– Preorder
Algorithms for a Binary Tree (5)
• Broadcasting
– Processor i sends the desired data upwards to the root processor, which then broadcasts the data downwards to all processors.
Algorithms for a Binary Tree (6)
• Sorting
Algorithms for 2D Mesh (1)
• In all of the 2D mesh algorithms presented in this section, we use the linear-array algorithms of Section 2.3 as building blocks.
• This leads to simple algorithms, but not necessarily the most efficient ones. Mesh-based architectures and their algorithms will be discussed in great detail in Part III.
Algorithms for 2D Mesh (2)
• Semigroup Computation
– For example, in finding the maximum of a set of p values, stored one per processor, the row maximums are computed first and made available to every processor in the row. Then column maximums are identified.
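The two phases can be written out directly. This sketch ignores the step-by-step neighbor communication and only shows what each processor holds after each phase. The function name `mesh_max` is hypothetical, and the grid is assumed to be rectangular and non-empty.

```python
def mesh_max(grid):
    """Two-phase maximum finding on a 2D mesh, one value per processor."""
    rows, cols = len(grid), len(grid[0])
    # Phase 1: every processor in a row learns that row's maximum.
    after_rows = [[max(row)] * cols for row in grid]
    # Phase 2: every column computes the maximum of the row maximums it holds.
    col_max = [max(after_rows[r][c] for r in range(rows)) for c in range(cols)]
    # Every processor now knows the overall maximum.
    return [[col_max[c] for c in range(cols)] for _ in range(rows)]
```

If each phase uses a linear-array algorithm that needs one step per hop, a √p × √p mesh would spend about √p - 1 steps per phase, or about 2√p - 2 steps in total.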
Algorithms for 2D Mesh (3)
• Parallel Prefix Computation
– (1) do a parallel prefix computation on each row,
– (2) do a diminished parallel prefix computation in the rightmost column, and
– (3) broadcast the results in the rightmost column to all of the elements in the respective rows and combine with the initially computed row prefix value.
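The three steps can be sketched as follows, assuming the processors are ordered row by row (row-major order) and that "diminished" means each row receives the combination of all earlier rows' totals, excluding its own. The function name `mesh_prefix` is hypothetical, and `None` is used to mark "no earlier rows" for the first row.

```python
from itertools import accumulate
from operator import add

def mesh_prefix(grid, op=add):
    """Parallel prefix on a 2D mesh in row-major order, in three steps."""
    # Step 1: prefix computation within each row.
    row_prefix = [list(accumulate(row, op)) for row in grid]

    # Step 2: diminished prefix over the rightmost column (the row totals).
    offsets = [None]  # the first row has nothing before it
    running = None
    for rp in row_prefix[:-1]:
        running = rp[-1] if running is None else op(running, rp[-1])
        offsets.append(running)

    # Step 3: broadcast each offset along its row and combine it with the
    # row prefix values from step 1 (offset on the left to preserve order).
    return [
        rp if off is None else [op(off, x) for x in rp]
        for off, rp in zip(offsets, row_prefix)
    ]
```

For the grid `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` with addition, the row totals are 6, 15, and 24, the diminished prefix gives offsets of (none), 6, and 21, and the final result is `[[1, 3, 6], [10, 15, 21], [28, 36, 45]]`.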
Algorithms for 2D Mesh (4)
• Packet Routing
– To route a data packet from the processor in Row r, Column c, to the processor in Row r', Column c', we first route it within Row r to Column c'. Then, we route it in Column c' from Row r to Row r'. (row-first routing)
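Row-first routing for a single packet can be traced hop by hop. The function name `row_first_route` is hypothetical, and processors are identified by (row, column) pairs.

```python
def row_first_route(src, dst):
    """Return the list of processors visited from src to dst, inclusive."""
    (r, c), (r2, c2) = src, dst
    path = [(r, c)]
    # First move within Row r until the destination column is reached.
    step = 1 if c2 > c else -1
    while c != c2:
        c += step
        path.append((r, c))
    # Then move within Column c' until the destination row is reached.
    step = 1 if r2 > r else -1
    while r != r2:
        r += step
        path.append((r, c))
    return path
```

The number of hops is `len(path) - 1`, which equals |c' - c| + |r' - r|. This sketch handles one packet only and does not model contention when several packets need the same link.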
Algorithms for 2D Mesh (5)
• Broadcasting
– (1) broadcast the packet to every processor in the source node's row and
– (2) broadcast in all columns.
Algorithms for 2D Mesh (6)
• Sorting
Algorithms for Shared Variables
• Semigroup Computation
• Parallel Prefix computation
• Packet Routing (Trivial in view of the direct communication path between any pair of processors)
• Broadcasting (Trivial, as each processor can send a data item to all processors directly)
• Sorting