parallel and distributed computingparallelcomp.github.io/lecture4_2(betterversion).pdf · parallel...

69
Foster’s Methodology: Application Examples Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior T´ ecnico October 19, 2011 CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 1 / 25

Upload: others

Post on 12-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Foster’s Methodology: Application Examples

Parallel and Distributed Computing

Department of Computer Science and Engineering (DEI)Instituto Superior Tecnico

October 19, 2011

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 1 / 25

Page 2: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Outline

Foster’s design methodology

partitioningcommunicationagglomerationmapping

Application Examples

Boundary value problemFinding the maximumn-body problem

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 2 / 25

Page 3: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Distributed Memory Systems

Parallel programming of distributed-memory systems is significantlydifferent from shared-memory systems essentially due to the very largeoverhead in terms of:

communication

task initialization / termination

⇒ efficient parallelization requires that these effects be taken into accountfrom the start!

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 3 / 25

Page 4: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Foster’s Design Methodology

Problem

Primitive Tasks

Partitioning

Communication

Agglomeration

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25

Page 5: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Foster’s Design Methodology

Problem

Primitive Tasks

Partitioning

Communication

Agglomeration

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25

Page 6: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Foster’s Design Methodology

Problem

Primitive Tasks

Partitioning

Communication

Agglomeration

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25

Page 7: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Foster’s Design Methodology

Problem

Primitive Tasks

Partitioning

Communication

Agglomeration

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25

Page 8: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Foster’s Design Methodology

Problem

Primitive Tasks

Partitioning

Communication

Agglomeration

Mapping

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25

Page 9: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Problem

Determine the evolution of the temperature of a rod in the followingconditions:

rod of length 1

both ends of the rod at 0oC

uniform material

insulated except at the ends

initial temperature of a point at distance x is 100 sin(π x)

A partial differential equation (PDE) governs the temperature of the rodin time. Typically too complicated to solve analytically.

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 5 / 25

Page 10: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Problem

Determine the evolution of the temperature of a rod in the followingconditions:

rod of length 1

both ends of the rod at 0oC

uniform material

insulated except at the ends

initial temperature of a point at distance x is 100 sin(π x)

A partial differential equation (PDE) governs the temperature of the rodin time. Typically too complicated to solve analytically.

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 5 / 25

Page 11: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Finite difference methods can be used to obtain an approximate solutionto these complex problems⇒ discretize space and time:

unit-length rod divided into n sections of length h

time from 0 to P divided into m periods of length k

We obtain a grid n ×m of temperatures T (x , t).

Temperature over time is given by:

T (x , t) = r · T (x − 1, t − 1) + (1− 2r) · T (x , t − 1) + r · T (x + 1, t − 1)

where r = k/h2.

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 6 / 25

Page 12: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Finite difference methods can be used to obtain an approximate solutionto these complex problems⇒ discretize space and time:

unit-length rod divided into n sections of length h

time from 0 to P divided into m periods of length k

We obtain a grid n ×m of temperatures T (x , t).

Temperature over time is given by:

T (x , t) = r · T (x − 1, t − 1) + (1− 2r) · T (x , t − 1) + r · T (x + 1, t − 1)

where r = k/h2.

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 6 / 25

Page 13: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 7 / 25

Page 14: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Partitioning:

Make each T (x , t) computation a primitive task.⇒ 2-dimensional domain decomposition

x

t

n

m

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 8 / 25

Page 15: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Partitioning:

Make each T (x , t) computation a primitive task.⇒ 2-dimensional domain decomposition

x

t

n

m

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 8 / 25

Page 16: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Communication:

x

t

n

m

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 9 / 25

Page 17: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Communication:

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 9 / 25

Page 18: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Agglomeration:

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 10 / 25

Page 19: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Agglomeration:

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 10 / 25

Page 20: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Agglomeration:

Mapping:

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 11 / 25

Page 21: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Agglomeration:

Mapping:

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 11 / 25

Page 22: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Agglomeration:

Mapping:

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 11 / 25

Page 23: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Analysis of execution time:

C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)

D: time to send/receive one T (x , t) from another processor

p: number of processors

Sequential algorithm:

mnC

Parallel algorithm:

computation per time instant:⌈

np

⌉C

communication per time instant: 2D(receiving is synchronous, sending asynchronous)

Total: m(⌈

np

⌉C + 2D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25

Page 24: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Analysis of execution time:

C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)

D: time to send/receive one T (x , t) from another processor

p: number of processors

Sequential algorithm: mnC

Parallel algorithm:

computation per time instant:

⌈np

⌉C

communication per time instant: 2D(receiving is synchronous, sending asynchronous)

Total: m(⌈

np

⌉C + 2D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25

Page 25: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Analysis of execution time:

C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)

D: time to send/receive one T (x , t) from another processor

p: number of processors

Sequential algorithm: mnC

Parallel algorithm:

computation per time instant:⌈

np

⌉C

communication per time instant:

2D(receiving is synchronous, sending asynchronous)

Total: m(⌈

np

⌉C + 2D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25

Page 26: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Boundary Value Problem

Analysis of execution time:

C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)

D: time to send/receive one T (x , t) from another processor

p: number of processors

Sequential algorithm: mnC

Parallel algorithm:

computation per time instant:⌈

np

⌉C

communication per time instant: 2D(receiving is synchronous, sending asynchronous)

Total: m(⌈

np

⌉C + 2D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25

Page 27: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Problem

Determine the maximum over a set of n values.

This is a particular case of a reduction:

a0 ⊕ a1 ⊕ a2 ⊕ · · · ⊕ an−1

where ⊕ can be any associative binary operator.

⇒ a reduction always takes Θ(n) time on a sequential computer

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 13 / 25

Page 28: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Partitioning:

Make each value a primitive task.⇒ 1-dimensional domain decomposition

One task will compute final solution: root task

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 14 / 25

Page 29: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Partitioning:

Make each value a primitive task.⇒ 1-dimensional domain decomposition

One task will compute final solution: root task

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 14 / 25

Page 30: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Communication:

n/4-1 tasks n/4-1 tasks

n-1 tasks n/2-1 tasks

Continue recursively: Binomial Tree

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25

Page 31: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Communication:

n/4-1 tasks n/4-1 tasks

n-1 tasks n/2-1 tasks

Continue recursively: Binomial Tree

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25

Page 32: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Communication:

n/4-1 tasks n/4-1 tasks

n/2-1 tasks n/2-1 tasks

Continue recursively: Binomial Tree

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25

Page 33: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Communication:

n/4-1 tasks n/4-1 tasks

n/4-1 tasks n/4-1 tasks

Continue recursively: Binomial Tree

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25

Page 34: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Communication:

n/4-1 tasks n/4-1 tasks

n/4-1 tasks n/4-1 tasks

Continue recursively: Binomial Tree

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25

Page 35: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Binomial Trees

Recursive definition of Binomial Tree, with n = 2k nodes:

B

B

B

B0 k

k-1

k-1

B0 B1 B2 B3 B4

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 16 / 25

Page 36: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Binomial Trees

Recursive definition of Binomial Tree, with n = 2k nodes:

B

B

B

B0 k

k-1

k-1

B0 B1 B2 B3 B4

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 16 / 25

Page 37: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 38: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 39: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

5

-4

-4

1

-3

3

3

7

7 0

0 8

8

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 40: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

5

-4

-4

1

-3

3

3

7

7 0

0 8

8

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 41: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

7

-4

-4

1

-3

3

3

7

7 0

0 8

8

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 42: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

7

-4

-4

1

-3

3

3

7

7 0

0 8

8

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 43: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

7

-4

-4

7

-3

3

3

7

7 0

0 8

8

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 44: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Illustrative example:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

7

7 3

7

-4

-4

7

-3

3

3

7

7 0

0 8

8

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25

Page 45: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Agglomeration:

Group n leafs of the tree:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

Mapping:

The same (actually, in the agglomeration phase, use n such that you endup with p tasks).

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25

Page 46: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Agglomeration:

Group n leafs of the tree:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

Mapping:

The same (actually, in the agglomeration phase, use n such that you endup with p tasks).

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25

Page 47: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Agglomeration:

Group n leafs of the tree:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

Mapping:

The same (actually, in the agglomeration phase, use n such that you endup with p tasks).

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25

Page 48: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Agglomeration:

Group n leafs of the tree:

7

7 3

5

-4

-9

1

-3

3

2

5

7 0

-1 8

1

Mapping:

The same (actually, in the agglomeration phase, use n such that you endup with p tasks).

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25

Page 49: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Analysis of execution time:

C : time to perform the binary operation (maximum)

D: time to send/receive one value from another processor

p: number of processors

Sequential algorithm:

(n − 1)C

Parallel algorithm:

computation in the leafs: (⌈

np

⌉− 1)C

computation up the tree: dlog pe (C + D)

Total: (⌈

np

⌉− 1)C + dlog pe (C + D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25

Page 50: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Analysis of execution time:

C : time to perform the binary operation (maximum)

D: time to send/receive one value from another processor

p: number of processors

Sequential algorithm: (n − 1)C

Parallel algorithm:

computation in the leafs:

(⌈

np

⌉− 1)C

computation up the tree: dlog pe (C + D)

Total: (⌈

np

⌉− 1)C + dlog pe (C + D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25

Page 51: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Analysis of execution time:

C : time to perform the binary operation (maximum)

D: time to send/receive one value from another processor

p: number of processors

Sequential algorithm: (n − 1)C

Parallel algorithm:

computation in the leafs: (⌈

np

⌉− 1)C

computation up the tree:

dlog pe (C + D)

Total: (⌈

np

⌉− 1)C + dlog pe (C + D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25

Page 52: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Finding the Maximum

Analysis of execution time:

C : time to perform the binary operation (maximum)

D: time to send/receive one value from another processor

p: number of processors

Sequential algorithm: (n − 1)C

Parallel algorithm:

computation in the leafs: (⌈

np

⌉− 1)C

computation up the tree: dlog pe (C + D)

Total: (⌈

np

⌉− 1)C + dlog pe (C + D)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25

Page 53: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Problem

Simulate the motion of n particles of varying masses in two dimensions.

compute new positions and velocities

consider gravitational interactions only

Straightforward sequential algorithms solve this problem in Θ(n2) time(however, better time complexity algorithms exist)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 20 / 25

Page 54: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Partitioning:

Make each particle a primitive task.⇒ 1-dimensional domain decomposition

Communication:

All tasks need to communicate with all tasks!

Gather operation: one task receive a dataset from all tasks.

All-gather operation: all tasks receive a dataset from all tasks.

0 1

2 3

0 1

2 3

Implement communication using a hypercube topology!

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25

Page 55: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Partitioning:

Make each particle a primitive task.⇒ 1-dimensional domain decomposition

Communication:

All tasks need to communicate with all tasks!

Gather operation: one task receive a dataset from all tasks.

All-gather operation: all tasks receive a dataset from all tasks.

0 1

2 3

0 1

2 3

Implement communication using a hypercube topology!

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25

Page 56: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Partitioning:

Make each particle a primitive task.⇒ 1-dimensional domain decomposition

Communication:

All tasks need to communicate with all tasks!

Gather operation: one task receive a dataset from all tasks.

All-gather operation: all tasks receive a dataset from all tasks.

0 1

2 3

0 1

2 3

Implement communication using a hypercube topology!

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25

Page 57: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Partitioning:

Make each particle a primitive task.⇒ 1-dimensional domain decomposition

Communication:

All tasks need to communicate with all tasks!

Gather operation: one task receive a dataset from all tasks.

All-gather operation: all tasks receive a dataset from all tasks.

0 1

2 3

0 1

2 3

Implement communication using a hypercube topology!

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25

Page 58: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Partitioning:

Make each particle a primitive task.⇒ 1-dimensional domain decomposition

Communication:

All tasks need to communicate with all tasks!

Gather operation: one task receive a dataset from all tasks.

All-gather operation: all tasks receive a dataset from all tasks.

0 1

2 3

0 1

2 3

Implement communication using a hypercube topology!

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25

Page 59: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Partitioning:

Make each particle a primitive task.⇒ 1-dimensional domain decomposition

Communication:

All tasks need to communicate with all tasks!

Gather operation: one task receive a dataset from all tasks.

All-gather operation: all tasks receive a dataset from all tasks.

0 1

2 3

0 1

2 3

Implement communication using a hypercube topology!

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25

Page 60: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Agglomeration / Mapping:

All primitive tasks have the same computation cost and communicationpattern

⇒ no particular strategy for agglomeration required

Agglomerate n/p primitive tasks, so that we have one task per processor.

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 22 / 25

Page 61: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Analysis of execution time (per time instant):

C : time to compute new particle position and velocity

D: time to initiate message

B: (bandwidth) number of data units that can be sent in one unit of time

p: number of processors

Sequential algorithm: Cn(n − 1)Parallel algorithm:

Computation time:

C np (n − 1)

Communication time:one message with k data units: D + k

Bdepth of tree: log plength of message at depth i : 2i−1 n

p

total communication time:∑log p

i=1

(D + 2i−1n

pB

)= D log p + n(p−1)

pB

Total: D log p + np

(p−1B + C (n − 1)

)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25

Page 62: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Analysis of execution time (per time instant):

C : time to compute new particle position and velocity

D: time to initiate message

B: (bandwidth) number of data units that can be sent in one unit of time

p: number of processors

Sequential algorithm: Cn(n − 1)Parallel algorithm:

Computation time: C np (n − 1)

Communication time:

one message with k data units: D + kB

depth of tree: log plength of message at depth i : 2i−1 n

p

total communication time:∑log p

i=1

(D + 2i−1n

pB

)= D log p + n(p−1)

pB

Total: D log p + np

(p−1B + C (n − 1)

)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25

Page 63: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Analysis of execution time (per time instant):

C : time to compute new particle position and velocity

D: time to initiate message

B: (bandwidth) number of data units that can be sent in one unit of time

p: number of processors

Sequential algorithm: Cn(n − 1)Parallel algorithm:

Computation time: C np (n − 1)

Communication time:one message with k data units:

D + kB

depth of tree: log plength of message at depth i : 2i−1 n

p

total communication time:∑log p

i=1

(D + 2i−1n

pB

)= D log p + n(p−1)

pB

Total: D log p + np

(p−1B + C (n − 1)

)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25

Page 64: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Analysis of execution time (per time instant):

C : time to compute new particle position and velocity

D: time to initiate message

B: (bandwidth) number of data units that can be sent in one unit of time

p: number of processors

Sequential algorithm: Cn(n − 1)Parallel algorithm:

Computation time: C np (n − 1)

Communication time:one message with k data units: D + k

Bdepth of tree:

log plength of message at depth i : 2i−1 n

p

total communication time:∑log p

i=1

(D + 2i−1n

pB

)= D log p + n(p−1)

pB

Total: D log p + np

(p−1B + C (n − 1)

)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25

Page 65: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Analysis of execution time (per time instant):

C : time to compute new particle position and velocity

D: time to initiate message

B: (bandwidth) number of data units that can be sent in one unit of time

p: number of processors

Sequential algorithm: Cn(n − 1)Parallel algorithm:

Computation time: C np (n − 1)

Communication time:one message with k data units: D + k

Bdepth of tree: log plength of message at depth i :

2i−1 np

total communication time:∑log p

i=1

(D + 2i−1n

pB

)= D log p + n(p−1)

pB

Total: D log p + np

(p−1B + C (n − 1)

)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25

Page 66: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Analysis of execution time (per time instant):

C : time to compute new particle position and velocity

D: time to initiate message

B: (bandwidth) number of data units that can be sent in one unit of time

p: number of processors

Sequential algorithm: Cn(n − 1)Parallel algorithm:

Computation time: C np (n − 1)

Communication time:one message with k data units: D + k

Bdepth of tree: log plength of message at depth i : 2i−1 n

p

total communication time:

∑log pi=1

(D + 2i−1n

pB

)= D log p + n(p−1)

pB

Total: D log p + np

(p−1B + C (n − 1)

)

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25

Page 67: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

n-Body Problem

Analysis of execution time (per time instant):

C : time to compute new particle position and velocity

D: time to initiate message

B: (bandwidth) number of data units that can be sent in one unit of time

p: number of processors

Sequential algorithm: Cn(n − 1)Parallel algorithm:

Computation time: C np (n − 1)

Communication time:one message with k data units: D + k

Bdepth of tree: log plength of message at depth i : 2i−1 n

p

total communication time:∑log p

i=1

(D + 2i−1n

pB

)= D log p + n(p−1)

pB

Total: D log p + np

(p−1B + C (n − 1)

)CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25

Page 68: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Review

Foster’s design methodology

partitioningcommunicationagglomerationmapping

Application Examples

Boundary value problemFinding the maximumn-body problem

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 24 / 25

Page 69: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:

Next Class

MPI

CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 25 / 25