Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Architecture and Parallel Algorithm Design

Outline:
● Multicore architecture
● Task/channel model
● Algorithm design methodology
● Case studies

Moore's law based on transistor count

Intel i7 processor

Intel i7 supports hyperthreading

AMD 6-core Istanbul Opteron

Task/Channel Model

● Parallel computation = set of tasks
● Task
  • Program
  • Local memory
  • Collection of I/O ports
● Tasks interact by sending messages through channels
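
The task/channel model above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: threads stand in for tasks, a `queue.Queue` stands in for a channel, and the names `producer`, `consumer`, and `run_pipeline` are hypothetical.

```python
import queue
import threading


def producer(out_channel, values):
    """Task with one output port: send each value, then a sentinel."""
    for v in values:
        out_channel.put(v)
    out_channel.put(None)  # sentinel (assumed convention): no more messages


def consumer(in_channel, results):
    """Task with one input port: receive until the sentinel arrives."""
    total = 0  # lives in this task's local memory
    while True:
        msg = in_channel.get()  # blocks until a message is available
        if msg is None:
            break
        total += msg
    results.append(total)


def run_pipeline(values):
    """Connect two tasks with one channel and run them to completion."""
    channel = queue.Queue()
    results = []
    tasks = [
        threading.Thread(target=producer, args=(channel, values)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return results[0]


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4]))  # prints 10
```

The receive blocks while the send does not, which is one common reading of channel semantics; the two tasks share nothing except the channel.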

Task/Channel Model

[Figure labels: Task, Channel]

Foster’s Design Methodology

● Partitioning
● Communication
● Agglomeration
● Mapping

Foster’s Methodology

[Figure: Problem → Partitioning → Communication → Agglomeration → Mapping]

Partitioning

● Dividing computation and data into pieces
● Domain decomposition
  • Divide data into pieces
  • Determine how to associate computations with the data
● Functional decomposition
  • Divide computation into pieces
  • Determine how to associate data with the computations
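
As a rough illustration of domain decomposition, the sketch below splits a one-dimensional array into contiguous blocks, one per task. The block formula and the names `block_range` and `domain_decompose` are assumptions chosen for the example; the slide does not prescribe a particular split.

```python
def block_range(task_id, num_tasks, n):
    """Half-open index range [low, high) owned by task_id out of num_tasks."""
    low = task_id * n // num_tasks
    high = (task_id + 1) * n // num_tasks
    return low, high


def domain_decompose(data, num_tasks):
    """Divide the data into pieces; each piece goes to one task."""
    n = len(data)
    pieces = []
    for task_id in range(num_tasks):
        low, high = block_range(task_id, num_tasks, n)
        pieces.append(data[low:high])
    return pieces


if __name__ == "__main__":
    # Ten items over three tasks: block sizes differ by at most one.
    print(domain_decompose(list(range(10)), 3))
    # prints [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Each task would then run the same computation on its own piece, which is the "associate computations with the data" step.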

Example Domain Decompositions

Example Functional Decomposition

Partitioning Checklist

● At least 10x more primitive tasks than processors in target computer
● Minimize redundant computations and redundant data storage
● Primitive tasks roughly the same size
● Number of tasks an increasing function of problem size

Communication

● Determine values passed among tasks
● Local communication
  • Task needs values from a small number of other tasks
  • Create channels illustrating data flow
● Global communication
  • Significant number of tasks contribute data to perform a computation
  • Don’t create channels for them early in design

Communication Checklist

● Communication operations balanced among tasks
● Each task communicates with only a small group of neighbors
● Tasks can perform communications concurrently
● Tasks can perform computations concurrently

Agglomeration

● Grouping tasks into larger tasks
● Goals
  • Improve performance
  • Maintain scalability of program
  • Simplify programming
● In MPI programming, goal often to create one agglomerated task per processor

Agglomeration Can Improve Performance

● Eliminate communication between primitive tasks agglomerated into consolidated task
● Combine groups of sending and receiving tasks
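
A small count makes the first point concrete. The sketch below assumes a hypothetical workload: tasks arranged in a one-dimensional chain, where each pair of neighbors exchanges one value in each direction per step. The function name and the numbers are illustrative only.

```python
def messages_per_step(num_tasks):
    """Messages sent per step in a 1-D chain with two-way neighbor exchange."""
    if num_tasks < 1:
        return 0
    # num_tasks - 1 neighboring pairs, one message in each direction.
    return 2 * (num_tasks - 1)


if __name__ == "__main__":
    primitive = messages_per_step(1000)  # one primitive task per data item
    agglomerated = messages_per_step(8)  # eight consolidated tasks
    print(primitive, agglomerated)  # prints 1998 14
```

Exchanges between primitive tasks that end up inside the same consolidated task become local memory accesses, so only the block boundaries still need messages.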

Agglomeration Checklist

● Locality of parallel algorithm has increased
● Replicated computations take less time than communications they replace
● Data replication doesn’t affect scalability
● Agglomerated tasks have similar computational and communication costs
● Number of tasks increases with problem size
● Number of tasks suitable for likely target systems
● Tradeoff between agglomeration and code modification costs is reasonable

Mapping

● Process of assigning tasks to processors
● Centralized multiprocessor: mapping done by operating system
● Distributed memory system: mapping done by user
● Conflicting goals of mapping
  • Maximize processor utilization
  • Minimize interprocessor communication

Mapping Example

Optimal Mapping

● Finding optimal mapping is NP-hard
● Must rely on heuristics
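
One example of such a heuristic, shown here only as a sketch, is the greedy "largest task first" rule: sort tasks by cost and give each to the currently least-loaded processor. This is a generic load-balancing heuristic rather than anything named in the slides, it ignores communication cost entirely, and the name `greedy_map` and the task costs are made up.

```python
import heapq


def greedy_map(task_costs, num_procs):
    """Assign each task to the least-loaded processor, largest tasks first.

    Returns (assignment, loads): assignment[task] is a processor index and
    loads[proc] is the total cost placed on that processor.
    """
    # Min-heap of (current load, processor index).
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    order = sorted(range(len(task_costs)), key=lambda t: task_costs[t], reverse=True)
    for task in order:
        load, proc = heapq.heappop(heap)
        assignment[task] = proc
        heapq.heappush(heap, (load + task_costs[task], proc))
    loads = [0] * num_procs
    for task, proc in assignment.items():
        loads[proc] += task_costs[task]
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = greedy_map([7, 5, 4, 3, 1], 2)
    print(loads)  # prints [10, 10]
```

The result is not guaranteed to be optimal in general; it happens to balance perfectly on this small input.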

Mapping Decision Tree

● Static number of tasks
  • Structured communication
    • Constant computation time per task
      • Agglomerate tasks to minimize communication
      • Create one task per processor
    • Variable computation time per task
      • Cyclically map tasks to processors
  • Unstructured communication
    • Use a static load balancing algorithm
● Dynamic number of tasks
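
The "cyclically map tasks to processors" branch can be illustrated with a toy comparison. The sketch assumes hypothetical task costs that grow with the task index, so a contiguous (block) mapping piles the expensive tasks onto the last processor; the function names are not from the slides.

```python
def cyclic_map(num_tasks, num_procs):
    """Task t goes to processor t mod p (round-robin)."""
    return [t % num_procs for t in range(num_tasks)]


def block_map(num_tasks, num_procs):
    """Contiguous blocks of tasks per processor, for comparison."""
    return [(num_procs * (t + 1) - 1) // num_tasks for t in range(num_tasks)]


def loads(mapping, costs, num_procs):
    """Total cost assigned to each processor under the given mapping."""
    totals = [0] * num_procs
    for task, proc in enumerate(mapping):
        totals[proc] += costs[task]
    return totals


if __name__ == "__main__":
    costs = [1, 2, 3, 4, 5, 6, 7, 8]  # assumed: cost grows with task index
    print(loads(block_map(8, 2), costs, 2))   # prints [10, 26]
    print(loads(cyclic_map(8, 2), costs, 2))  # prints [16, 20]
```

Interleaving spreads cheap and expensive tasks across processors, which is why it suits variable computation time per task.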

Mapping Strategy

● Static number of tasks
● Dynamic number of tasks
  • Frequent communications between tasks
    • Use a dynamic load balancing algorithm
  • Many short-lived tasks
    • Use a run-time task-scheduling algorithm
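
For the "many short-lived tasks" case, a worker pool is one familiar stand-in for a run-time task scheduler: idle workers pull the next task from a shared queue. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the task body `short_task` and the worker count are placeholders, and nothing here is specific to the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def short_task(x):
    """Placeholder for a small, independent unit of work."""
    return x * x


def run_tasks(inputs, num_workers=4):
    """Hand tasks to whichever worker is free; results keep input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(short_task, inputs))


if __name__ == "__main__":
    print(run_tasks(range(6)))  # prints [0, 1, 4, 9, 16, 25]
```

The shared queue inside the pool plays the role of the task allocator, so with very many tiny tasks it can itself become the bottleneck.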

Mapping Checklist

● Considered designs based on one task per processor and multiple tasks per processor
● Evaluated static and dynamic task allocation
● If dynamic task allocation chosen, task allocator is not a bottleneck to performance
● If static task allocation chosen, ratio of tasks to processors is at least 10:1

Case Studies

● Boundary value problem
● Finding the maximum
● The n-body problem
● Adding data input

Partitioning

● One data item per grid point
● Associate one primitive task with each grid point
● Two-dimensional domain decomposition

Communication

● Identify communication pattern between primitive tasks
● Each interior primitive task has three incoming and three outgoing channels
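
The three incoming channels can be seen in a sketch of one time step. The code assumes the boundary value problem is discretized with an explicit finite-difference scheme, where each new interior value depends on three values from the previous time step. The slide does not state the equation, so the update formula, the coefficient `r`, and the fixed-boundary treatment are all assumptions.

```python
def step(u, r):
    """Advance every interior grid point by one time step.

    u: values at the current time step (end points are fixed boundaries).
    r: assumed diffusion coefficient of the explicit scheme.
    """
    new = list(u)  # boundary values carry over unchanged
    for i in range(1, len(u) - 1):
        # Three inputs: left neighbor, the point itself, right neighbor.
        new[i] = r * u[i - 1] + (1 - 2 * r) * u[i] + r * u[i + 1]
    return new


if __name__ == "__main__":
    print(step([0.0, 4.0, 8.0, 0.0], 0.25))  # prints [0.0, 4.0, 5.0, 0.0]
```

Symmetrically, each interior value at one time step feeds three points at the next time step, which accounts for the three outgoing channels.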

Agglomeration and Mapping

Agglomeration

Reduction

● Given associative operator $\oplus$
  • $a_0 \oplus a_1 \oplus a_2 \oplus \cdots \oplus a_{n-1}$
● Examples
  • Add
  • Multiply
  • And, Or
  • Maximum, Minimum
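
Sequentially, a reduction is a left-to-right fold with the chosen operator. The sketch below uses Python's standard `functools.reduce` with the example operators listed above; the wrapper name `reduction` and the sample values are illustrative.

```python
import operator
from functools import reduce


def reduction(op, values):
    """Combine all values with the binary operator op, left to right."""
    return reduce(op, values)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(reduction(operator.add, data))  # prints 14
    print(reduction(operator.mul, data))  # prints 60
    print(reduction(max, data))           # prints 5
    print(reduction(min, data))           # prints 1
    print(reduction(operator.and_, [True, True, False]))  # prints False
    print(reduction(operator.or_, [False, True, False]))  # prints True
```

Associativity is what lets a parallel version regroup the operands freely while preserving their order.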

Parallel Reduction Evolution

Parallel Reduction Evolution

Parallel Reduction Evolution

Binomial Trees

Subgraph of hypercube
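
The subgraph claim can be checked mechanically. The sketch below builds one standard labeling of a binomial tree on 2^d nodes, where a node's parent is obtained by clearing its highest set bit, and verifies that every tree edge joins two labels differing in exactly one bit, which is the definition of a hypercube edge. The labeling and the function names are assumptions made for the illustration.

```python
def binomial_tree_edges(dim):
    """(parent, child) edges of a binomial tree on 2**dim nodes rooted at 0."""
    edges = []
    for child in range(1, 2 ** dim):
        highest_bit = 1 << (child.bit_length() - 1)
        edges.append((child - highest_bit, child))  # clear the highest set bit
    return edges


def is_hypercube_edge(a, b):
    """True if labels a and b differ in exactly one bit position."""
    diff = a ^ b
    return diff != 0 and diff & (diff - 1) == 0


if __name__ == "__main__":
    edges = binomial_tree_edges(4)
    print(len(edges))  # prints 15
    print(all(is_hypercube_edge(a, b) for a, b in edges))  # prints True
```

A 16-node tree has 15 edges, while the 4-dimensional hypercube has 32, so the tree uses only part of the hypercube's links.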

Finding Global Sum

[Figure: 16 tasks holding the values 4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]

Finding Global Sum

[Figure: 8 partial sums: 1, 7, -6, 4, 4, 5, 8, 2]

Finding Global Sum

[Figure: 4 partial sums: 8, -2, 9, 10]

Finding Global Sum

[Figure: 2 partial sums: 17, 8]

Finding Global Sum

[Figure: binomial tree; the global sum is 25]
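
The halving pattern of this sequence (16 values, then 8, 4, 2, and 1) can be simulated in a short loop. The sketch below is an illustration under assumptions: it uses the sixteen values from the first Finding Global Sum figure in the order they were extracted, and it pairs task i with task i + stride, which may not match the pairing drawn in the figures. The function name is hypothetical.

```python
def binomial_tree_sum(values):
    """Sum a power-of-two number of values in log2(n) combining steps.

    Returns (total, steps). In each step, task i + stride sends its partial
    sum to task i, so the number of active tasks halves.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two count"
    partial = list(values)
    steps = 0
    stride = n // 2
    while stride >= 1:
        for i in range(stride):
            partial[i] += partial[i + stride]  # one message per pair
        stride //= 2
        steps += 1
    return partial[0], steps


if __name__ == "__main__":
    values = [4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
    print(binomial_tree_sum(values))  # prints (25, 4)
```

Because addition is associative and commutative, any pairing gives the same total of 25; only the intermediate partial sums depend on the pairing.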

Agglomeration

Agglomeration

[Figure: four agglomerated tasks, each labeled "sum"]
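
In code, this agglomeration amounts to each consolidated task summing its own block locally before any communication happens. The sketch below assumes contiguous blocks and reuses the sixteen values from the Finding Global Sum figure; the block assignment and the name `agglomerated_sum` are assumptions.

```python
def agglomerated_sum(values, num_tasks):
    """Each task sums its own block; the partial sums are then combined.

    Returns (local_sums, total).
    """
    n = len(values)
    local_sums = []
    for t in range(num_tasks):
        low = t * n // num_tasks
        high = (t + 1) * n // num_tasks
        local_sums.append(sum(values[low:high]))  # purely local work
    # Only num_tasks partial sums need to travel between tasks.
    return local_sums, sum(local_sums)


if __name__ == "__main__":
    values = [4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
    print(agglomerated_sum(values, 4))  # prints ([13, -7, 14, 5], 25)
```

With four tasks, only four partial sums are communicated instead of sixteen individual values.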

Summary: Task/channel Model

● Parallel computation
  • Set of tasks
  • Interactions through channels
● Good designs
  • Maximize local computations
  • Minimize communications
  • Scale up

Summary: Design Steps

● Partition computation
● Agglomerate tasks
● Map tasks to processors
● Goals
  • Maximize processor utilization
  • Minimize inter-processor communication