Contents
• Implementation: Target Architectures
• RISC Technology
• Pipelining
• Superscalar Processors
• Cache Memory
• Memory Hierarchy
• Parallel Computers – . . .
• Flynn’s Classification . . .
• Memory Access . . .
• Parallelization
• The Programming . . .
• MPI Messages
• Programming with MPI
• Load Distribution
• Designing Load . . .
• Classification of . . .
• Examples of LD- . . .
• Performance Evaluation
Introduction to Scientific Computing – 9. Implementation (Miriam Mehl)
1. Implementation: Target Architectures
• different target architectures for numerical simulations:
– monoprocessors
– supercomputers
• modern microprocessors:
– obvious trends:
* increasing clock rates (> 2 GHz almost standard)
* more MIPS, more FLOPS
* very-, ultra-, and ???-large-scale integration; hence, more transistors and more functionality on the chip
* longer words: 64-bit architectures are standard (workstations) or coming (PCs)
– important features:
* RISC (Reduced Instruction Set Computer) technology
* well-developed pipelining
* superscalar processor organization
* caching and a multi-level memory hierarchy
* VLIW, multi-thread architectures, on-chip multiprocessors, ...
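The "more MIPS, more FLOPS" trend can be made concrete with a back-of-the-envelope peak-performance estimate. The sketch below is illustrative, not from the slides: it assumes one floating-point operation per functional unit per cycle, which matches a CPU with separate FP add and FP multiply pipelines.

```c
/* Theoretical peak performance of one core (a rough model):
 *   peak GFLOPS = clock rate [GHz] x FP operations per cycle.
 * A CPU with one FP adder and one FP multiplier can, in the best
 * case, retire 2 FP operations per cycle. Real codes reach only a
 * fraction of this peak (memory access, dependencies, branches). */
double peak_gflops(double clock_ghz, int flops_per_cycle) {
    return clock_ghz * flops_per_cycle;
}
/* e.g. a 2 GHz core with 2 FP units: peak_gflops(2.0, 2) = 4.0 */
```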
2. RISC Technology
• counter-trend to CISC, where ever more complex instructions entailed microprogramming
• now instead:
– relatively small number of instructions (tens)
– simple machine instructions, fixed format, few address modes
– load-and-store principle: only explicit LOAD/STORE instructions access memory
– no more need for microprogramming
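The load-and-store principle can be illustrated by how a compiler for a RISC target treats a statement like `a = b + c`: arithmetic never touches memory directly. The C function below (a sketch; names and the register comments are illustrative) mimics the instruction sequence explicitly.

```c
/* On a load/store (RISC) architecture, a = b + c compiles to roughly:
 *   LOAD  r1, [b]      ; memory -> register
 *   LOAD  r2, [c]      ; memory -> register
 *   ADD   r3, r1, r2   ; arithmetic on registers only
 *   STORE r3, [a]      ; register -> memory
 * (On a CISC machine, a single ADD could read its operand from
 * memory directly.) Each local variable below stands for a register. */
int risc_style_add(const int *b, const int *c, int *a) {
    int r1 = *b;       /* LOAD  */
    int r2 = *c;       /* LOAD  */
    int r3 = r1 + r2;  /* ADD, register-to-register */
    *a = r3;           /* STORE */
    return r3;
}
```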
3. Pipelining
• decompose instructions into simple steps involving different parts of the CPU:
– load,
– decode,
– reserve registers,
– execute,
– write results
• further improvement: reorder the steps of successive instructions (LOAD as early as possible, WRITE as late as possible) to avoid idle waiting time
• best case: identical instructions to be pipelined/overlapped, as in vector processors
• pipelining needs different functional units in the CPU that can deal with the different steps in parallel; therefore:
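The benefit of overlapping the steps above can be quantified with the standard pipeline timing model (a sketch under the usual idealized assumptions: every stage takes one cycle, and there are no stalls from dependencies or branches):

```c
/* Pipeline timing model: k stages of one cycle each, n instructions.
 * Without pipelining each instruction runs all k stages alone. With
 * pipelining, the pipe fills once (k cycles for the first result),
 * then one instruction completes per cycle. */
long cycles_unpipelined(long n, long k) { return n * k; }
long cycles_pipelined(long n, long k)   { return k + (n - 1); }

/* For large n the speedup approaches k -- here, the 5 stages
 * listed above (load, decode, reserve registers, execute, write). */
double pipeline_speedup(long n, long k) {
    return (double)cycles_unpipelined(n, k) / (double)cycles_pipelined(n, k);
}
```

With k = 5 and n = 1000 instructions this gives 5000 vs. 1004 cycles, a speedup of about 4.98 — close to the stage count, which is why identical back-to-back instructions (the vector-processor case) are the best case.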
4. Superscalar Processors
• several parts of the CPU are available in more than 1 copy
• example: MIPS R10000 has 5 execution pipelines
– one for FP-multiplication, one for FP-addition
– two integer ALU (arithmetic-logical units)
![Page 34: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/34.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 4 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
4. Superscalar Processors
• several parts of the CPU are available in more than 1 copy
• example: MIPS R10000 has 5 execution pipelines
– one for FP-multiplication, one for FP-addition
– two integer ALU (arithmetic-logical units)
– one address pipeline
5. Cache Memory

• CPU performance has increased faster than memory access speed
• thus: reduce memory access time / latency
• cache memory: a small, fast on-chip memory that keeps part of the main memory
• optimum: the needed data are always available in the cache
• strategies to keep the hit probability p close to 1:
  – choice of section: what is to be kept in the cache?
  – ensure locality of data (instructions in the cache need data in the cache)
  – strategies for fetching, replacement, and updating
  – association: how to check whether data are available in the cache?
  – consistency: no diverging versions in cache and main memory
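The payoff of a hit probability p close to 1 can be quantified with the standard average-access-time model. A minimal sketch; the latency numbers below are illustrative assumptions, not measurements:

```python
# Expected memory access time for hit probability p (illustrative sketch).
# Assumed latencies: 1 ns for a cache hit, 100 ns for a main-memory access.
def avg_access_time(p, t_cache=1.0, t_main=100.0):
    """Hits are served from the cache, misses from main memory."""
    return p * t_cache + (1.0 - p) * t_main

# Even a few percent of misses dominate the average:
for p in (0.90, 0.99, 0.999):
    print(f"p = {p}: {avg_access_time(p):.3f} ns")
```

Already at p = 0.90 the average (10.9 ns) is an order of magnitude above the pure hit latency, which is why the strategies above all aim at p close to 1.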
6. Memory Hierarchy

• today: several cache levels → memory hierarchy:
  – registers,
  – (level-1/2/3) cache,
  – main memory,
  – hard disk,
  – remote memory
  (the faster, the smaller)
• a notion of the target computer's memory hierarchy is important for the efficiency of numerical algorithms:
  – example: matrix-vector product Ax with A too large for the cache
  – standard algorithm:
    * outer loop over the rows of A,
    * inner loop computing the scalar product of one row of A with x
  – if the cache currently holds some rows of A: fine
  – if the cache currently holds some columns of A: slow!
  – tuning is crucial: peak performance can be up to 4 orders of magnitude higher than the performance observed in practice without tuning
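The two cache situations above can be made concrete by writing out the loop orders. A minimal sketch with plain nested lists, assuming row-major storage (as in C): both orders compute the same y, but the row-wise order streams through each row of A contiguously, while the column-wise order strides through memory, touching one element per row.

```python
# Two loop orders for the matrix-vector product y = A x (illustrative sketch).
def matvec_rowwise(A, x):
    n = len(A)
    y = [0.0] * n
    for i in range(n):          # outer loop over the rows of A
        for j in range(n):      # scalar product of row i with x
            y[i] += A[i][j] * x[j]
    return y                    # cache-friendly: rows read contiguously

def matvec_columnwise(A, x):
    n = len(A)
    y = [0.0] * n
    for j in range(n):          # outer loop over the columns of A
        for i in range(n):      # strided access: one element per row of A
            y[i] += A[i][j] * x[j]
    return y                    # same result, but poor locality in row-major layout
```

In compiled code on large matrices the column-wise variant can be several times slower purely because of cache misses, even though the arithmetic is identical.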
![Page 56: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/56.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
![Page 57: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/57.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
![Page 58: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/58.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
![Page 59: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/59.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
![Page 60: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/60.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
![Page 61: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/61.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
– dynamic network topologies:
* crossbar switch
![Page 62: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/62.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
– dynamic network topologies:
* crossbar switch
* shuffle exchange network
![Page 63: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/63.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
– dynamic network topologies:
* crossbar switch
* shuffle exchange network
• crucial quantities:
![Page 64: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/64.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
– dynamic network topologies:
* crossbar switch
* shuffle exchange network
• crucial quantities:
– diameter (longest path between two processors)
![Page 65: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/65.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
– dynamic network topologies:
* crossbar switch
* shuffle exchange network
• crucial quantities:
– diameter (longest path between two processors)
– number of network connections (ports) per processor
![Page 66: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/66.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers – distributed systems: frontier?
• different possibilities of arrangement:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
– dynamic network topologies:
* crossbar switch
* shuffle exchange network
• crucial quantities:
– diameter (longest path between two processors)
– number of network connections (ports) per processor
– parallel communications possible?
![Page 67: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/67.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 7 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
7. Parallel Computers – Topologies
• parallel computers vs. distributed systems: where is the frontier?
• different possibilities of arranging the processors:
– static network topologies:
* bus, ring, grid, or torus
* binary tree or fat tree
* hypercube
– dynamic network topologies:
* crossbar switch
* shuffle exchange network
• crucial quantities:
– diameter (longest path between two processors)
– number of network connections (ports) per processor
– are parallel (simultaneous) communications possible?
– are there bottlenecks?
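The first two of these crucial quantities can be made concrete with a small sketch. The helper names are illustrative; the formulas for the ring and the hypercube are the standard ones for those topologies, not taken from the slides:

```python
import math

def ring_metrics(p):
    # In a ring of p processors, the worst-case path wraps half-way
    # around; every node has exactly 2 links.
    return {"diameter": p // 2, "ports_per_node": 2}

def hypercube_metrics(p):
    # A hypercube requires p to be a power of two; with d = log2(p),
    # both the diameter and the ports per node equal d.
    d = int(math.log2(p))
    assert 2 ** d == p, "hypercube size must be a power of two"
    return {"diameter": d, "ports_per_node": d}

for p in (8, 64):
    print(p, ring_metrics(p), hypercube_metrics(p))
```

The trade-off is visible immediately: the ring keeps the port count constant but its diameter grows linearly with p, while the hypercube keeps the diameter logarithmic at the price of logarithmically many ports per processor.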
8. Flynn’s Classification (1972)
• SISD: Single Instruction, Single Data
– classical von Neumann monoprocessor
• SIMD: Single Instruction, Multiple Data
– vector computers: extreme pipelining, one instruction applied to a sequence (vector) of data (CRAY 1, 2, X, Y, J/C/T90, . . . )
– array computers: array of processors, concurrency (Thinking Machines CM-2, MasPar MP-1, MP-2)
• MIMD: Multiple Instruction, Multiple Data
– multiprocessors:
* distributed memory (loose coupling, explicit communication; Intel Paragon, IBM SP-2) or
* shared memory (tight coupling, global address space, implicit communication; most workstation servers) or
* nets/clusters
• MISD: Multiple Instruction, Single Data: rare in practice
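The SISD/SIMD distinction is about how an operation is expressed as much as how it is executed. The following is an illustrative analogy only (real SIMD happens in hardware; plain Python runs both variants sequentially), showing the scalar-loop view a vectorizing compiler would transform into a single whole-vector operation:

```python
data = [1.0, 2.0, 3.0, 4.0]

# SISD style: one instruction, one datum at a time, in an explicit loop.
result_sisd = []
for x in data:
    result_sisd.append(2.0 * x + 1.0)

# SIMD style: one (logical) instruction applied to the whole vector at
# once -- the form a vector computer executes for the loop above.
result_simd = [2.0 * x + 1.0 for x in data]

assert result_sisd == result_simd
print(result_simd)
```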
9. Memory Access Classification
• other criteria for classification:
scalability (S), programming model (PM), portability (P), and load distribution (L)
• UMA: Uniform Memory Access
– shared memory systems: SMP (symmetric multiprocessors, parallel vector processors); PC and workstation servers, CRAY YMP
– advantages: P, PM, L; drawback: S
• NORMA: No Remote Memory Access
– distributed memory systems; clusters, IBM SP-2, iPSC/860
– advantage: S; drawbacks: P, PM, L
• NUMA: Non-Uniform Memory Access
– systems with virtually shared memory; KSR-1, CRAY T3D/T3E, CONVEX SPP
– advantages: PM, S, P; drawbacks: cache coherence, communication
10. Parallelization
• the classical programming paradigms are, in principle, all well suited for explicit or implicit parallelization:
– imperative: FORTRAN, C (still the dominant ones, recently with some OO touch as in C++)
– logical/relational: PROLOG
– object-oriented: SMALLTALK
– functional/applicative: LISP
• implicit parallelization typically via special (parallelizing) compilers
• explicit parallelization typically via linked communication libraries
• traditional way in Scientific Computing: write FORTRAN code, run it through a vectorizing compiler on a CRAY, wait for results
• explicit parallelization is often difficult (cf. Gauß-Seidel); this makes non-conventional approaches attractive
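Why Gauß-Seidel resists parallelization can be seen in a minimal 1D smoothing sketch (illustrative helper names, fixed boundary values): Jacobi reads only the previous iterate, so all interior updates could run concurrently, while Gauß-Seidel reuses values updated in the same sweep, chaining the updates sequentially:

```python
def jacobi_sweep(u):
    # All new values depend only on the old array u -> every interior
    # point can be updated concurrently (embarrassingly parallel).
    return [u[0]] + [0.5 * (u[i - 1] + u[i + 1])
                     for i in range(1, len(u) - 1)] + [u[-1]]

def gauss_seidel_sweep(u):
    # Each update reads u[i-1], which was overwritten moments before in
    # the same sweep -> a dependency chain along i, no straightforward
    # concurrent update.
    u = u[:]
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u

print(jacobi_sweep([0.0, 1.0, 1.0, 1.0, 0.0]))
print(gauss_seidel_sweep([0.0, 1.0, 1.0, 1.0, 0.0]))
```

Exploiting parallelism in Gauß-Seidel requires restructuring, e.g. red-black (odd/even) orderings that break the dependency chain, which is what makes such non-conventional approaches attractive.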
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: MPI model or standard
![Page 94: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/94.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 11 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: MPI model or standard
* Message Passing Interface
![Page 95: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/95.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 11 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: MPI model or standard
* Message Passing Interface
* originally for clusters, today used even on massivelyparallel computers, too
![Page 96: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/96.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 11 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: MPI model or standard
* Message Passing Interface
* originally for clusters, today used even on massivelyparallel computers, too
* MPI-1 developed 1992-1994
![Page 97: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/97.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 11 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: MPI model or standard
* Message Passing Interface
* originally for clusters, today used even on massivelyparallel computers, too
* MPI-1 developed 1992-1994
* explicit exchange of messages: higher amount of pro-gramming work, but increasing possibilities of tuning andoptimizing
![Page 98: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/98.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 11 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: MPI model or standard
* Message Passing Interface
* originally for clusters, today used even on massivelyparallel computers, too
* MPI-1 developed 1992-1994
* explicit exchange of messages: higher amount of pro-gramming work, but increasing possibilities of tuning andoptimizing
• MPI Features:
– parallel program: n processes, separate address spaces,no remote access
![Page 99: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/99.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 11 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: MPI model or standard
* Message Passing Interface
* originally for clusters, today used even on massivelyparallel computers, too
* MPI-1 developed 1992-1994
* explicit exchange of messages: higher amount of pro-gramming work, but increasing possibilities of tuning andoptimizing
• MPI Features:
– parallel program: n processes, separate address spaces,no remote access
– message exchange via system calls sendand receive
![Page 100: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/100.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 11 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
11. The Programming Model MPI
• How to write parallel programs?
– UMA systems: simple answer – just as sequential ones
– distributed memory systems: the MPI model or standard
* Message Passing Interface
* originally for clusters, today used on massively parallel computers, too
* MPI-1 developed 1992–1994
* explicit exchange of messages: more programming work, but more possibilities for tuning and optimizing
• MPI features:
– parallel program: n processes, separate address spaces, no remote access
– message exchange via the system calls send and receive
– MPI kernel: a library of communication routines, allowing MPI commands to be integrated into standard languages
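The execution model above – independent processes with private address spaces that interact only through explicit send and receive – can be sketched in plain Python. This is an analogy using the standard multiprocessing module, not the MPI API itself, and it assumes a POSIX system (for the "fork" start method); all names here are illustrative.

```python
# Sketch of the message-passing model: independent workers with private
# data that communicate only via explicit send/receive over channels.
# Python's multiprocessing stands in for MPI processes; these are not
# MPI API calls. Assumes a POSIX system ("fork" start method).
import multiprocessing as mp

def worker(rank, peer_conn, report_conn):
    local = (rank + 1) * 10        # "compute something" in a private address space
    peer_conn.send(local)          # explicit send to the partner (cf. MPI send)
    received = peer_conn.recv()    # blocking receive from the partner (cf. MPI receive)
    report_conn.send((rank, local, received))

def run_pair():
    ctx = mp.get_context("fork")
    c0, c1 = ctx.Pipe()                      # buffered channel between the two peers
    reports = [ctx.Pipe() for _ in range(2)]
    procs = [ctx.Process(target=worker, args=(rank, conn, reports[rank][1]))
             for rank, conn in ((0, c0), (1, c1))]
    for p in procs:
        p.start()
    results = [reports[rank][0].recv() for rank in range(2)]
    for p in procs:
        p.join()
    return results
```

Note that neither worker can read the other's `local` variable directly – exactly the "no remote access" property of the MPI model; all data moves through the channel.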
12. MPI Messages
• messages consist of a
– header (recipient, buffer, type, context of communication) and a
– body (contents)
• messages are buffered (send buffer, receive buffer)
• sending a message can be
– blocking (finished only after the message has left the node) or
– non-blocking (finished immediately, message may be sent later)
• the same holds for receiving a message:
– blocking: waiting;
– non-blocking: checking for it from time to time
• cost of passing a message (length N, buffer capacity K):
t(N) = α · ⌈N/K⌉ + β · N
(initialization cost/time α per packet, transport cost β per unit)
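The cost model above can be turned into a tiny calculator. This is a sketch with illustrative parameter values; the message is read here as being split into ⌈N/K⌉ buffer-sized packets, each paying the start-up cost α:

```python
import math

def message_time(n, alpha, beta, k):
    """Transfer time for a message of length n: one start-up cost
    alpha per buffer-sized packet (buffer capacity k), plus a
    per-unit transport cost beta for the n data items."""
    packets = math.ceil(n / k)
    return alpha * packets + beta * n
```

For example, with illustrative values α = 10, β = 0.1, K = 256, a message of length N = 1000 needs 4 packets and costs 4·10 + 1000·0.1 = 140 time units – the start-up term dominates for short messages, the transport term for long ones.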
13. Programming with MPI
• a simple example:
P1:                        P2:
  compute something          compute something
  store result in SBUF       store result in SBUF
  SendBlocking(P2,SBUF)      SendBlocking(P1,SBUF)
  RecBlocking(P2,RBUF)       RecBlocking(P1,RBUF)
  read data in RBUF          read data in RBUF
  compute again              compute again
• without buffering: deadlocks possible
– nothing specified: buffering possible, but not imperative
– never: no buffering (efficient, but risky)
– always: secure, but sometimes costly
• collective communication features available:
– broadcast, gather, gather-to-all, scatter, all-to-all, . . .
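The deadlock risk in the example above arises because both processes execute a blocking send first. A common remedy is to break the symmetry: one side sends first, the other receives first, so the operations interlock. The sketch below illustrates this ordering with Python threads and queues standing in for MPI processes and messages (illustrative names, not MPI calls):

```python
# Deadlock-free exchange by breaking symmetry: one peer sends first,
# the other receives first. Threads and queues stand in for MPI
# processes and message channels; names are illustrative, not MPI API.
import threading
import queue

def make_peer(name, inbox, outbox, send_first, log):
    def run():
        if send_first:
            outbox.put(f"data from {name}")   # send, then receive
            received = inbox.get()
        else:
            received = inbox.get()            # receive, then send
            outbox.put(f"data from {name}")
        log[name] = received
    return threading.Thread(target=run)

def exchange():
    q12, q21 = queue.Queue(), queue.Queue()   # channels P1->P2 and P2->P1
    log = {}
    t1 = make_peer("P1", q21, q12, True, log)   # P1 sends first
    t2 = make_peer("P2", q12, q21, False, log)  # P2 receives first
    t1.start(); t2.start()
    t1.join(); t2.join()
    return log
```

With these unbounded queues even two send-first peers would succeed – that is the "always buffered" case from the list above; with rendezvous (unbuffered) semantics, the interlocked ordering shown here is what prevents the deadlock.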
14. Load Distribution
• load: amount of work on processors
– optimum: minimize idle times; needs estimates and moni-toring
– strategy: load balancingor load distribution or scheduling
– important: avoid overhead
• one distinguishes
– scheduling:
* global: where do which processes run?
* local: when does which processor which process
– load balancing:
* static: a priori
![Page 124: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/124.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 14 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
14. Load Distribution
• load: amount of work on processors
– optimum: minimize idle times; needs estimates and moni-toring
– strategy: load balancingor load distribution or scheduling
– important: avoid overhead
• one distinguishes
– scheduling:
* global: where do which processes run?
* local: when does which processor which process
– load balancing:
* static: a priori
* dynamic: during runtime
![Page 125: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/125.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 14 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
14. Load Distribution
• load: amount of work on processors
– optimum: minimize idle times; needs estimates and moni-toring
– strategy: load balancingor load distribution or scheduling
– important: avoid overhead
• one distinguishes
– scheduling:
* global: where do which processes run?
* local: when does which processor which process
– load balancing:
* static: a priori
* dynamic: during runtime
• in Scientific Computing applications load is often not predictable:
– adaptive refinement of a finite element mesh,
![Page 126: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/126.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 14 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
14. Load Distribution
• load: amount of work on processors
– optimum: minimize idle times; needs estimates and moni-toring
– strategy: load balancingor load distribution or scheduling
– important: avoid overhead
• one distinguishes
– scheduling:
* global: where do which processes run?
* local: when does which processor which process
– load balancing:
* static: a priori
* dynamic: during runtime
• in Scientific Computing applications load is often not predictable:
– adaptive refinement of a finite element mesh,
– convergence behaviour of iterations may differ
![Page 127: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/127.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 14 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
14. Load Distribution
• load: the amount of work assigned to each processor
– optimum: minimize idle times; requires load estimates and monitoring
– strategy: load balancing, also called load distribution or scheduling
– important: avoid overhead
• one distinguishes
– scheduling:
* global: on which processor does which process run?
* local: when does which processor run which process?
– load balancing:
* static: a priori
* dynamic: during runtime
• in Scientific Computing applications, the load is often not predictable:
– adaptive refinement of a finite element mesh,
– convergence behaviour of iterations may differ,
– thus: static load balancing alone is not sufficient
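The last point can be made concrete with a toy sketch (not from the slides; the function names and the random per-task cost model are invented for illustration). A static a-priori block partition is compared with an idealized dynamic scheme in which an idle processor always pulls the next unit of work:

```python
import random

def makespan_static(costs, p):
    """A priori block partition: processor i gets a fixed contiguous chunk."""
    n = len(costs)
    chunk = (n + p - 1) // p
    return max(sum(costs[i * chunk:(i + 1) * chunk]) for i in range(p))

def makespan_dynamic(costs, p):
    """Runtime distribution: the earliest-idle processor takes the next task."""
    finish = [0.0] * p                 # time at which each processor becomes idle
    for c in costs:
        i = finish.index(min(finish))  # idle processor pulls the next unit
        finish[i] += c
    return max(finish)

random.seed(0)
# Unpredictable per-task cost, e.g. cells of an adaptively refined mesh.
costs = [random.expovariate(1.0) for _ in range(400)]
print(makespan_static(costs, 8), makespan_dynamic(costs, 8))
```

With unpredictable costs, the dynamic scheme typically finishes earlier because no processor sits idle while work remains in the queue, which is exactly why static balancing alone is not sufficient.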
15. Designing Load Distribution
• Which are the primary objectives?
– optimization of the system load or of the application runtime?
– placement of new processes or migration of running processes?
• What is the level of integration?
– Who initiates actions (measure load, choose strategy)?
* the application program
* the runtime system
* the OS?
• Are there special features of the application to be considered?
– restrictions in the process-to-processor allocation are frequent in Scientific Computing
• Which units shall be distributed or migrated?
– whole processes (coarse grain)
– threads (fine grain)
– objects or data (typical for simulation applications)
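For the last bullet, a minimal sketch (a hypothetical helper, not from the slides) of using data rather than processes as the distributed unit: contiguous mesh cells are partitioned so that each processor receives roughly the same total weight, with per-cell weights standing in for, e.g., local refinement cost:

```python
def partition_cells(weights, p):
    """Split a 1-D array of cell weights into p contiguous parts of roughly
    equal total weight; the data, not the processes, is what gets distributed."""
    total = sum(weights)
    target = total / p
    parts, start, acc = [], 0, 0.0
    for i, w in enumerate(weights):
        acc += w
        # close the current part once it reaches its share, leaving
        # at least one cell for each of the remaining processors
        if acc >= target and len(parts) < p - 1 and len(weights) - i - 1 >= p - len(parts) - 1:
            parts.append((start, i + 1))
            start, acc = i + 1, 0.0
    parts.append((start, len(weights)))
    return parts

# uniformly weighted cells → equal blocks
print(partition_cells([1.0] * 8, 4))            # → [(0, 2), (2, 4), (4, 6), (6, 8)]
# heavier (e.g. refined) cells shrink the parts that cover them
print(partition_cells([1, 1, 1, 1, 4, 4, 1, 1], 4))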
16. Classification of Strategies
• origin of the idea:
from physics (diffusion model), from combinatorics (graph theory), from economics (bidding, brokerage)
• designed for networks or for bus topologies
• data represented as grids, trees, sets, or . . .
• distribution mechanisms:
– is load handed over to neighbouring nodes only?
– only distribution of new units, or also migration of running ones (how?)?
• flow of information:
to whom is load communicated, and from where does information come?
• coordination:
who makes decisions? autonomously, cooperatively, or competitively?
• algorithms:
who initiates measures? adaptivity? are costs relevant? evaluation?
17. Examples of LD-Strategies
• diffusion model:
permanent balancing process between neighbours
• bidding model:
supply and demand, establishment of some market
• broker model:
– esp. for heterogeneous hierarchical topologies, scalable
– broker with partial knowledge, budget-based decision whetherlocal processing or looking for better offers
![Page 154: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/154.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 17 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
17. Examples of LD-Strategies
• diffusion model:
permanent balancing process between neighbours
• bidding model:
supply and demand, establishment of some market
• broker model:
– esp. for heterogeneous hierarchical topologies, scalable
– broker with partial knowledge, budget-based decision whetherlocal processing or looking for better offers
– prices for use of resources and brokerage
![Page 155: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/155.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 17 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
17. Examples of LD-Strategies
• diffusion model:
permanent balancing process between neighbours
• bidding model:
supply and demand, establishment of some market
• broker model:
– esp. for heterogeneous hierarchical topologies, scalable
– broker with partial knowledge, budget-based decision whetherlocal processing or looking for better offers
– prices for use of resources and brokerage
• matching model:
construct matching in topology graph, balance along edges
![Page 156: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:](https://reader031.vdocuments.us/reader031/viewer/2022022602/5b5705fb7f8b9adf7d8d44bd/html5/thumbnails/156.jpg)
Implementation: . . .
RISC Technology
Pipelining
Superscalar Processors
Cache Memory
Memory Hierarchy
Parallel Computers – . . .
Flynn’s Classification . . .
Memory Access . . .
Parallelization
The Programming . . .
MPI Messages
Programming with MPI
Load Distribution
Designing Load . . .
Classification of . . .
Examples of LD- . . .
Performance Evaluation
Page 17 of 18
Introduction to Scientific Computing
9. ImplementationMiriam Mehl
17. Examples of LD-Strategies
• diffusion model: permanent balancing process between neighbours
• bidding model: supply and demand, establishment of some market
• broker model:
– especially suited for heterogeneous, hierarchical topologies; scalable
– broker with partial knowledge; budget-based decision whether to process locally or to look for better offers
– prices for the use of resources and for brokerage
• matching model: construct a matching in the topology graph, balance along its edges
• balanced allocation, space-filling curves, . . .
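The diffusion model above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the topology, the diffusion coefficient `alpha`, and the function `diffuse` are all hypothetical, and real systems would run the exchange asynchronously between neighbouring processors.

```python
# Hypothetical sketch of the diffusion model for load distribution:
# each node repeatedly shifts a fixed fraction of the load difference
# to each neighbour until the load is (nearly) balanced.

def diffuse(loads, neighbours, alpha=0.25, steps=100):
    """loads: work units per node; neighbours: adjacency list of the
    processor topology; alpha: fraction of the difference moved."""
    loads = list(loads)
    for _ in range(steps):
        new = loads[:]
        for i, nbrs in enumerate(neighbours):
            for j in nbrs:
                new[i] += alpha * (loads[j] - loads[i])
        loads = new
    return loads

# 4 processors in a ring, all load initially on processor 0:
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
balanced = diffuse([100, 0, 0, 0], ring)
```

Because every edge exchange is symmetric, the total load is conserved; the imbalance decays geometrically, so all four nodes end up near 25 work units.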
18. Performance Evaluation
• performance evaluation of algorithms and computers
• average parallelism (for p processors):
A(p) = (sum of all processors' runtimes) / (parallel runtime)
• speedup S:
S = (sequential runtime) / (parallel runtime)
• efficiency E:
E = S / p
• Amdahl's Law:
assumption: each program has some part 0 < seq < 1 that can only be treated in a sequential way; then
S ≤ 1 / (seq + (1 − seq)/p) ≤ 1/seq
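The quantities above can be illustrated numerically. The runtimes in the example are made up; only the formulas themselves come from the slide.

```python
# Speedup, efficiency, and Amdahl's bound (example values are hypothetical).

def speedup(t_seq, t_par):
    """S = sequential runtime / parallel runtime."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """E = S / p."""
    return speedup(t_seq, t_par) / p

def amdahl_bound(seq, p):
    """Upper bound on the speedup on p processors for a program with
    sequential fraction 0 < seq < 1:
    S <= 1 / (seq + (1 - seq)/p) <= 1/seq."""
    return 1.0 / (seq + (1.0 - seq) / p)

# example: 10% sequential part on 16 processors
s16 = amdahl_bound(0.1, 16)   # = 6.4
```

Note that even for p → ∞ the bound stays below 1/seq: with a 10% sequential part, no number of processors yields a speedup of 10 or more.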
• another important quantity: the CCR (Communication-to-Computation Ratio)
– the CCR often increases with increasing p at constant problem size (example: iterative methods for Ax = b)
– therefore: do not compare speedups for different p unless the problem size is the same
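The growth of the CCR at fixed problem size can be made concrete with a rough surface-to-volume model. The cost model below is an assumption for illustration (it is not given on the slide): for an n × n grid split into √p × √p blocks, each processor computes on roughly n²/p points per iteration but exchanges roughly 4·n/√p boundary values.

```python
import math

# Rough surface-to-volume model (assumed, for illustration) of the
# communication-to-computation ratio for an iterative solver on an
# n x n grid distributed over p processors as sqrt(p) x sqrt(p) blocks.

def ccr(n, p):
    computation = n * n / p               # interior points per processor
    communication = 4 * n / math.sqrt(p)  # boundary values exchanged
    return communication / computation    # = 4 * sqrt(p) / n

# fixed problem size, growing p: the CCR grows like sqrt(p)
ratios = [ccr(1024, p) for p in (4, 16, 64)]
```

Each fourfold increase in p doubles the ratio here, which is why speedups measured at different p (same problem size) are increasingly dominated by communication and should not be compared directly.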