TRANSCRIPT
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Architecture and Parallel Algorithm Design
Outline:
● Multicore architecture
● Task/channel model
● Algorithm design methodology
● Case studies
Moore's law based on transistor count
Intel i7 processor
Intel i7 supports hyper-threading
AMD 6 core Istanbul Opteron
Task/Channel Model
● Parallel computation = set of tasks
● Task
  ● Program
  ● Local memory
  ● Collection of I/O ports
● Tasks interact by sending messages through channels
Task/Channel Model
[Figure: tasks connected by channels]
Foster’s Design Methodology
● Partitioning
● Communication
● Agglomeration
● Mapping
Foster’s Methodology
[Figure: Problem → Partitioning → Communication → Agglomeration → Mapping]
Partitioning
● Dividing computation and data into pieces
● Domain decomposition
  ● Divide data into pieces
  ● Determine how to associate computations with the data
● Functional decomposition
  ● Divide computation into pieces
  ● Determine how to associate data with the computations
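The domain-decomposition idea above can be sketched in a few lines. This is an illustrative sketch, not the course's code; the helper name `block_decompose` and the local-sum computation are assumptions chosen for the example.

```python
def block_decompose(data, num_tasks):
    """Split `data` into num_tasks contiguous blocks of near-equal size.
    Task i owns elements [i*n // num_tasks, (i+1)*n // num_tasks)."""
    n = len(data)
    return [data[i * n // num_tasks:(i + 1) * n // num_tasks]
            for i in range(num_tasks)]

data = list(range(10))
pieces = block_decompose(data, 4)
print(pieces)  # [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]

# Each primitive task then runs the same computation on its own piece,
# e.g. a local sum:
local_results = [sum(piece) for piece in pieces]
print(local_results)  # [1, 9, 11, 24]
```

Functional decomposition would instead split the *computation* into distinct stages and attach data to each stage.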
Example Domain Decompositions
Example Functional Decomposition
Partitioning Checklist
● At least 10x more primitive tasks than processors in target computer
● Minimize redundant computations and redundant data storage
● Primitive tasks roughly the same size
● Number of tasks an increasing function of problem size
Communication
● Determine values passed among tasks
● Local communication
  ● Task needs values from a small number of other tasks
  ● Create channels illustrating data flow
● Global communication
  ● Significant number of tasks contribute data to perform a computation
  ● Don’t create channels for them early in design
Communication Checklist
● Communication operations balanced among tasks
● Each task communicates with only a small group of neighbors
● Tasks can perform communications concurrently
● Tasks can perform computations concurrently
Agglomeration
● Grouping tasks into larger tasks
● Goals
  ● Improve performance
  ● Maintain scalability of program
  ● Simplify programming
● In MPI programming, goal often to create one agglomerated task per processor
Agglomeration Can Improve Performance
● Eliminate communication between primitive tasks agglomerated into consolidated task
● Combine groups of sending and receiving tasks
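A toy count makes the first point concrete. Assuming a 1-D chain of primitive tasks where each task talks only to its immediate neighbors (an assumption chosen for illustration; the helper name `chain_channels` is made up):

```python
def chain_channels(num_tasks):
    """Channels in a 1-D chain of tasks: one per adjacent pair."""
    return num_tasks - 1

primitive_tasks = 16  # before agglomeration
processors = 4        # one agglomerated task per processor

before = chain_channels(primitive_tasks)
after = chain_channels(processors)
print(before, after)  # 15 3
# The 12 eliminated channels become ordinary local-memory accesses
# inside the consolidated tasks.
```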
Agglomeration Checklist
● Locality of parallel algorithm has increased
● Replicated computations take less time than communications they replace
● Data replication doesn’t affect scalability
● Agglomerated tasks have similar computation and communication costs
● Number of tasks increases with problem size
● Number of tasks suitable for likely target systems
● Tradeoff between agglomeration and code modification costs is reasonable
Mapping
● Process of assigning tasks to processors
● Centralized multiprocessor: mapping done by operating system
● Distributed memory system: mapping done by user
● Conflicting goals of mapping
  ● Maximize processor utilization
  ● Minimize interprocessor communication
Mapping Example
Optimal Mapping
● Finding optimal mapping is NP-hard
● Must rely on heuristics
Mapping Decision Tree
● Static number of tasks
  ● Structured communication
    ● Constant computation time per task
      • Agglomerate tasks to minimize communication
      • Create one task per processor
    ● Variable computation time per task
      • Cyclically map tasks to processors
  ● Unstructured communication
    • Use a static load balancing algorithm
● Dynamic number of tasks
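The "cyclically map tasks to processors" branch can be sketched directly: task i goes to processor i mod p, which spreads variable-cost tasks evenly. A minimal sketch; the function name `cyclic_map` is an assumption for this example.

```python
def cyclic_map(num_tasks, num_procs):
    """Round-robin mapping: return {processor: [task indices]}."""
    mapping = {proc: [] for proc in range(num_procs)}
    for task in range(num_tasks):
        mapping[task % num_procs].append(task)
    return mapping

print(cyclic_map(10, 3))
# {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
```

With variable per-task costs, interleaving tasks this way tends to balance load better than assigning contiguous blocks.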
Mapping Strategy
● Static number of tasks (see decision tree above)
● Dynamic number of tasks
  ● Frequent communications between tasks
    • Use a dynamic load balancing algorithm
  ● Many short-lived tasks
    • Use a run-time task-scheduling algorithm
Mapping Checklist
● Considered designs based on one task per processor and multiple tasks per processor
● Evaluated static and dynamic task allocation
● If dynamic task allocation chosen, task allocator is not a bottleneck to performance
● If static task allocation chosen, ratio of tasks to processors is at least 10:1
Case Studies
● Boundary value problem
● Finding the maximum
● The n-body problem
● Adding data input
Partitioning
● One data item per grid point
● Associate one primitive task with each grid point
● Two-dimensional domain decomposition
Communication
● Identify communication pattern between primitive tasks
● Each interior primitive task has three incoming and three outgoing channels
Agglomeration and Mapping
[Figure: agglomeration of primitive tasks and mapping to processors]
Reduction
● Given associative operator ⊕, compute a0 ⊕ a1 ⊕ a2 ⊕ … ⊕ an−1
● Examples
  ● Add
  ● Multiply
  ● And, Or
  ● Maximum, Minimum
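Any of the listed associative operators plugs into the same pairwise (tree) pattern, which is what makes parallel reduction possible in about log2(n) combining steps. A minimal sketch; `tree_reduce` is a name invented for this example, not a library function.

```python
import operator

def tree_reduce(op, vals):
    """Pairwise (tree) reduction: each pass halves the list, so a
    parallel version needs ~log2(n) steps instead of n-1 serial ops.
    Works for any associative binary operator `op`."""
    vals = list(vals)
    while len(vals) > 1:
        nxt = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # odd leftover carries to the next pass
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

sample = [3, 1, 4, 1, 5]
print(tree_reduce(operator.add, sample))  # 14
print(tree_reduce(operator.mul, sample))  # 60
print(tree_reduce(max, sample))           # 5
```

Associativity is what lets the pairs be combined in any grouping, and therefore concurrently.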
Parallel Reduction Evolution
Binomial Trees
Subgraph of hypercube
Finding Global Sum
[Figure: sixteen initial values, one per task: 4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
Finding Global Sum
[Figure: after the first combining step, eight partial sums: 1, 7, -6, 4, 4, 5, 8, 2]
Finding Global Sum
[Figure: four partial sums: 8, -2, 9, 10]
Finding Global Sum
[Figure: two partial sums: 17, 8]
Finding Global Sum
[Figure: final global sum 25 at the root of the binomial tree]
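The global-sum figures above can be replayed in code. This sketch pairs adjacent values rather than following the figure's exact grid folding, but by associativity of addition the final result is the same; `pairwise_sum` is a name chosen for this example, and the value count is assumed to be a power of two as in the figure.

```python
def pairwise_sum(vals):
    """Binomial-tree style sum: halve the list each step.
    With one value per task, this is log2(n) communication steps
    instead of n-1 sequential additions. Assumes len(vals) is a
    power of two, as in the slide's 16-value example."""
    vals = list(vals)
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

# The sixteen values from the figure:
values = [4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
print(pairwise_sum(values))  # 25, matching the final slide
```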
Agglomeration
[Figure: four agglomerated tasks, each computing a local sum]
Summary: Task/Channel Model
● Parallel computation
  ● Set of tasks
  ● Interactions through channels
● Good designs
  ● Maximize local computations
  ● Minimize communications
  ● Scale up
Summary: Design Steps
● Partition computation
● Agglomerate tasks
● Map tasks to processors
● Goals
  ● Maximize processor utilization
  ● Minimize inter-processor communication