Posted on 24-Jul-2018

Introduction to Scientific Computing
9. Implementation – Miriam Mehl

Contents:

• Implementation: . . .
• RISC Technology
• Pipelining
• Superscalar Processors
• Cache Memory
• Memory Hierarchy
• Parallel Computers – . . .
• Flynn’s Classification . . .
• Memory Access . . .
• Parallelization
• The Programming . . .
• MPI Messages
• Programming with MPI
• Load Distribution
• Designing Load . . .
• Classification of . . .
• Examples of LD- . . .
• Performance Evaluation


1. Implementation: Target Architectures

• different target architectures for numerical simulations:

– monoprocessors

– supercomputers

• modern microprocessors:

– obvious trends:

* increasing clock rates (> 2 GHz almost standard)

* more MIPS, more FLOPS

* very-, ultra-, and ???-large scale integration; hence, more transistors and more functionality on the chip

* longer words: 64-bit architectures are standard (workstations) or coming (PCs)

– important features:

* RISC (Reduced Instruction Set Computer) technology

* well-developed pipelining

* superscalar processor organization

* caching and multi-level memory hierarchy

* VLIW, multi-thread architectures, on-chip multiprocessors, ...


2. RISC Technology

• counter-trend to CISC: more and more complex instructions entailing microprogramming

• now instead:

– relatively small number of instructions (tens)

– simple machine instructions, fixed format, few address modes

– load-and-store principle: only explicit LOAD/WRITE instructions have memory access

– no more need for microprogramming
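The load-and-store principle can be made concrete with a toy model (plain Python standing in for pseudo-assembly, not real RISC code): only the LOAD and STORE lines touch memory, while all arithmetic operates purely on registers. Computing a = a + b * c then decomposes as:

```python
# Toy load-and-store model: only the LOAD/STORE lines access "memory";
# arithmetic works exclusively on "registers" (illustrative, not real RISC code).
memory = {"a": 2.0, "b": 3.0, "c": 4.0}
regs = {}

regs["r1"] = memory["b"]              # LOAD  b  -> r1
regs["r2"] = memory["c"]              # LOAD  c  -> r2
regs["r3"] = regs["r1"] * regs["r2"]  # MUL   r1, r2 -> r3  (register-only)
regs["r4"] = memory["a"]              # LOAD  a  -> r4
regs["r5"] = regs["r4"] + regs["r3"]  # ADD   r4, r3 -> r5  (register-only)
memory["a"] = regs["r5"]              # STORE r5 -> a

print(memory["a"])  # 14.0
```

The fixed, simple instruction format of such sequences is exactly what makes them easy to decode and to pipeline, as the next section shows.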


3. Pipelining

• decompose instructions into simple steps involving different parts of the CPU:

– load,

– decode,

– reserve registers,

– execute,

– write results

• further improvement: reorder steps of an instruction (LOAD as early as possible, WRITE as late as possible: avoids risk of idle waiting time)

• best case: identical instructions to be pipelined/overlapped, as in vector processors

• pipelining needs different functional units in the CPU that can deal with the different steps in parallel; this leads directly to superscalar processor organization
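A back-of-the-envelope sketch of the payoff, under the idealised assumptions of one cycle per stage and no stalls or hazards (these assumptions are mine, not from the slides): once the pipeline is filled, one instruction completes per cycle.

```python
def execution_time(n_instructions, n_stages, pipelined):
    """Cycles needed to run n_instructions through a CPU with n_stages
    pipeline stages, assuming one cycle per stage and no stalls."""
    if pipelined:
        # fill the pipeline once, then one instruction completes per cycle
        return n_stages + (n_instructions - 1)
    # without pipelining, each instruction occupies the CPU for all stages
    return n_stages * n_instructions

# five stages as above: load, decode, reserve registers, execute, write results
print(execution_time(1000, 5, pipelined=False))  # 5000 cycles
print(execution_time(1000, 5, pipelined=True))   # 1004 cycles
```

For long instruction streams the speedup approaches the pipeline depth, which is why the vector-processor best case above is so effective.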


4. Superscalar Processors

• several parts of the CPU are available in more than one copy

• example: MIPS R10000 has 5 execution pipelines

– one for FP-multiplication, one for FP-addition

– two integer ALUs (arithmetic-logic units)

– one address pipeline


5. Cache Memory

• CPU performance increased faster than memory access speed

• thus: reduce memory access time / latency

• cache memory: small and fast on-chip memory, keeps part ofthe main memory

• optimum: needed data is always available in cache memory

• look for strategies to ensure hit-probability p close to 1:

– choice of section: what to be kept in cache?

– ensure locality of data (instructions in cache need data incache)

– strategies for fetching, replacement, and updating

– association: how to check whether data are available incache?

– consistency: no different versions in cache and main mem-ory
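As a sketch of the replacement-strategy idea, the following toy simulation (illustrative only, not modeled on any particular hardware) replays a trace of block accesses through an LRU cache, measures the hit probability p, and plugs it into the standard effective-access-time formula t = p · t_cache + (1 − p) · t_main:

```python
from collections import OrderedDict

def simulate_lru(accesses, capacity):
    """Replay a trace of block addresses through an LRU cache of the
    given capacity; return the observed hit probability p."""
    cache = OrderedDict()              # keys = cached blocks, in LRU order
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)   # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return hits / len(accesses)

def effective_access_time(p, t_cache, t_main):
    """Average latency: hits are served from cache, misses from main memory."""
    return p * t_cache + (1 - p) * t_main

# A loop sweeping over 4 blocks fits a 4-entry cache after warm-up:
trace = [0, 1, 2, 3] * 10
p = simulate_lru(trace, capacity=4)
print(p)                                              # 0.9 (4 cold misses in 40 accesses)
print(effective_access_time(p, t_cache=1.0, t_main=100.0))  # 10.9
```

Note how quickly a modest miss rate dominates the average: with main memory 100× slower than the cache, even p = 0.9 leaves the effective access time at roughly ten cache cycles.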

6. Memory Hierarchy

• today: several cache levels → a memory hierarchy:

– registers,

– (level-1/2/3) cache,

– main memory,

– hard disk,

– remote memory

the faster, the smaller

• a notion of the target computer's memory hierarchy is important for the efficiency of numerical algorithms:

– example: matrix-vector product Ax with A too large for the cache

– standard algorithm:

* outer loop over the rows of A,

* inner loop for the scalar product of one row of A with x

– if the current contents of the cache are some rows of A: fine

– if the current contents of the cache are some columns of A: slow!

– tuning is crucial: peak performance can be up to 4 orders of magnitude higher than the performance observed in practice (without tuning)
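The two traversal orders can be written out explicitly. Both compute the same product, but the row-wise variant streams through A in the order it is stored (row-major, as in C), so consecutive accesses stay within cached lines; the column-wise variant jumps a full row length between consecutive elements of A. A plain Python sketch (timings omitted — in Python the interpreter overhead hides the cache effect, but the access patterns are the ones discussed above):

```python
def matvec_rowwise(A, x):
    """Outer loop over rows: each row of A is read once, contiguously."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def matvec_columnwise(A, x):
    """Outer loop over columns: consecutive accesses to A are a full row
    apart -- cache-unfriendly when A is stored row-major."""
    n = len(A)
    y = [0.0] * n
    for j in range(len(x)):
        for i in range(n):
            y[i] += A[i][j] * x[j]
    return y

A = [[1, 2], [3, 4]]
x = [5, 6]
print(matvec_rowwise(A, x))     # [17, 39]
print(matvec_columnwise(A, x))  # [17.0, 39.0]
```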

7. Parallel Computers – Topologies

• parallel computers vs. distributed systems: where is the frontier?

• different possible arrangements:

– static network topologies:

* bus, ring, grid, or torus

* binary tree or fat tree

* hypercube

– dynamic network topologies:

* crossbar switch

* shuffle-exchange network

• crucial quantities:

– diameter (longest shortest path between two processors)

– number of network connections (ports) per processor

– are parallel communications possible?

– are there bottlenecks?
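For the static topologies above, the diameter can be written down directly as a function of the processor count (standard textbook formulas, restated here as a small sketch):

```python
def diameter_ring(p):
    """Bidirectional ring of p processors: halfway around is farthest."""
    return p // 2

def diameter_grid(rows, cols):
    """2D mesh without wrap-around: opposite corners are farthest apart."""
    return (rows - 1) + (cols - 1)

def diameter_hypercube(d):
    """d-dimensional hypercube with 2**d nodes: one bit flip per hop,
    so at most d hops between any two nodes."""
    return d

print(diameter_ring(8))        # 4
print(diameter_grid(4, 4))     # 6
print(diameter_hypercube(10))  # 10 hops suffice for 1024 processors
```

The logarithmic diameter is what makes the hypercube attractive: doubling the processor count adds only one hop, whereas a ring's diameter grows linearly.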

8. Flynn's Classification (1972)

• SISD: Single Instruction, Single Data

– the classical von Neumann monoprocessor

• SIMD: Single Instruction, Multiple Data

– vector computers: extreme pipelining, one instruction applied to a sequence (vector) of data (CRAY 1, 2, X, Y, J/C/T90, . . . )

– array computers: array of processors, concurrency (Thinking Machines CM-2, MasPar MP-1, MP-2)

• MIMD: Multiple Instruction, Multiple Data

– multiprocessors:

* distributed memory (loose coupling, explicit communication; Intel Paragon, IBM SP-2) or

* shared memory (tight coupling, global address space, implicit communication; most workstation servers) or

* nets/clusters

• MISD: Multiple Instruction, Single Data: rare
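The SIMD/MIMD distinction can be caricatured in a few lines: a SIMD machine has one control unit that broadcasts each instruction to all processing elements, while in MIMD every processor follows its own instruction stream. This is a hypothetical toy model for illustration, not tied to any real machine:

```python
def run_simd(instructions, data):
    """One control unit: each instruction is applied, in lockstep,
    to ALL data elements."""
    for op in instructions:
        data = [op(x) for x in data]
    return data

def run_mimd(programs, data):
    """One instruction stream per processor: programs[i] runs on data[i]."""
    results = []
    for program, x in zip(programs, data):
        for op in program:
            x = op(x)
        results.append(x)
    return results

inc = lambda x: x + 1   # "increment" instruction
dbl = lambda x: 2 * x   # "double" instruction

print(run_simd([inc, dbl], [1, 2, 3]))                   # [4, 6, 8]
print(run_mimd([[inc], [dbl], [inc, dbl]], [1, 2, 3]))   # [2, 4, 8]
```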

9. Memory Access Classification

• other criteria for classification:

scalability (S), programming model (PM), portability (P), and load distribution (L)

• UMA: Uniform Memory Access

– shared memory systems: SMP (symmetric multiprocessors, parallel vector processors); PC and workstation servers, CRAY YMP

– advantages: P, PM, L; drawback: S

• NORMA: No Remote Memory Access

– distributed memory systems; clusters, IBM SP-2, iPSC/860

– advantage: S; drawbacks: P, PM, L

• NUMA: Non-Uniform Memory Access

– systems with virtually shared memory; KSR-1, CRAY T3D/T3E, CONVEX SPP

– advantages: PM, S, P; drawbacks: cache coherence, communication

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 10 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

10. Parallelization

• classical programming paradigms are, in principle, all well-suited for explicit or implicit parallelization:

– imperative: FORTRAN, C (still dominant, recently with some OO touch as in C++)

– logical/relational: PROLOG

– object-oriented: SMALLTALK

– functional/applicative: LISP

• implicit parallelization typically via special compilers

• explicit parallelization typically via linked communication libraries

• traditional way in Scientific Computing: FORTRAN code, vectorizing compiler, CRAY, wait for results

• explicit parallelization is often difficult (cf. Gauß-Seidel); this makes non-conventional approaches attractive
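The Gauß-Seidel remark can be made concrete: in a Jacobi sweep every new value depends only on values from the previous sweep, so all updates are independent, whereas Gauß-Seidel uses already-updated neighbours and thus imposes a sequential order. A minimal Python sketch of one 1D smoothing sweep (illustrative example, not from the slides):

```python
def jacobi_sweep(u, f, h):
    # Every new value depends only on the OLD array u: all updates are
    # independent and could run in parallel without any ordering.
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new

def gauss_seidel_sweep(u, f, h):
    # u[i-1] was already updated in THIS sweep: every step depends on
    # the previous one, so the loop cannot be parallelized naively.
    u = u[:]
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u
```

Parallel Gauß-Seidel therefore needs reordering tricks such as red-black colouring, which break the sequential dependency chain.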

Page 11 of 18

Introduction to Scientific Computing

9. Implementation
Miriam Mehl

11. The Programming Model MPI


• How to write parallel programs?

– UMA systems: simple answer – just as sequential ones

– distributed memory systems: the MPI model or standard

* Message Passing Interface

* originally for clusters, today used even on massively parallel computers

* MPI-1 developed 1992–1994

* explicit exchange of messages: a higher amount of programming work, but more possibilities for tuning and optimizing

• MPI features:

– parallel program: n processes, separate address spaces, no remote access

– message exchange via the system calls send and receive

– MPI kernel: a library of communication routines that allows integrating MPI commands into standard languages
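The "separate address spaces, no remote access" model can be mimicked without an MPI installation using Python's multiprocessing module: each process owns its data, and the only way to exchange information is an explicit message over a pipe. This is a sketch of the programming model, not of the actual MPI API:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # Runs in its own process with a separate address space: it sees
    # none of the parent's variables, only explicit messages.
    data = conn.recv()        # blocking receive
    conn.send(data * 2)       # explicit send back to the parent
    conn.close()

def run_exchange(value):
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(value)    # message into the worker's address space
    result = parent_end.recv()
    p.join()
    return result

if __name__ == "__main__":
    print(run_exchange(21))   # the worker doubles the value -> 42
```

In real MPI the pipe endpoints are replaced by process ranks inside a communicator, but the discipline is the same: no shared variables, only messages.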


Page 12 of 18

Introduction to Scientific Computing

9. Implementation
Miriam Mehl

12. MPI Messages


• messages consist of a

– header (recipient, buffer, type, context of communication) and a

– body (contents)

• messages are buffered (send buffer, receive buffer)

• sending a message can be

– blocking (finished only after the message has left the node) or

– non-blocking (finished immediately, the message may be sent later)

• the same holds for receiving a message:

– blocking: waiting;

– non-blocking: checking for it from time to time

• cost of passing a message (length N, buffer capacity K):

t(N) = α · ⌈N/K⌉ + β · N

with initialization cost/time α and transportation cost β
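The cost model is easy to evaluate directly: one startup cost α per buffer-sized packet, plus a transportation cost β per unit of message length. The numeric values below are made up purely for illustration:

```python
import math

def message_cost(n, k, alpha, beta):
    # t(N) = alpha * ceil(N/K) + beta * N: the message is split into
    # ceil(N/K) packets, each paying the startup cost alpha once,
    # plus a per-unit transportation cost beta for the whole message.
    return alpha * math.ceil(n / k) + beta * n

# Illustrative (made-up) parameters: startup dominates short messages,
# bandwidth dominates long ones.
short = message_cost(8,    1024, alpha=100.0, beta=0.1)   # ~100.8
long_ = message_cost(8192, 1024, alpha=100.0, beta=0.1)   # 8 startups + 819.2
```

The model explains a classic tuning rule: sending one long message is cheaper than many short ones, because each short message pays the full startup cost α again.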


Page 13 of 18

Introduction to Scientific Computing

9. Implementation
Miriam Mehl

13. Programming with MPI


• a simple example:

P1: compute something           P2: compute something
    store result in SBUF            store result in SBUF
    SendBlocking(P2, SBUF)          SendBlocking(P1, SBUF)
    RecBlocking(P2, RBUF)           RecBlocking(P1, RBUF)
    read data in RBUF               read data in RBUF
    compute again                   compute again

• without buffering: deadlocks possible

– nothing specified: buffering possible, but not imperative

– never: no buffering (efficient, but risky)

– always: secure, but sometimes costly

• collective communication features available:

– broadcast, gather, gather-to-all, scatter, all-to-all, . . .
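The deadlock danger in the example above comes from both processes executing a blocking send before their receive: without buffering, each send waits for the partner to post a receive, and neither ever does. With buffered channels the symmetric "send before receive" order is safe, which the following thread-based simulation illustrates (threads and queues stand in for processes and MPI buffers, an analogy only):

```python
import queue
import threading

def exchange(outbox, inbox, value, results, name):
    # Mimics the slide's pattern: send first, then receive.
    outbox.put(value)             # buffered send: returns immediately
    results[name] = inbox.get()   # blocking receive

p1_to_p2 = queue.Queue()          # the queues play the role of SBUF/RBUF
p2_to_p1 = queue.Queue()
results = {}

p1 = threading.Thread(target=exchange,
                      args=(p1_to_p2, p2_to_p1, "data-from-P1", results, "P1"))
p2 = threading.Thread(target=exchange,
                      args=(p2_to_p1, p1_to_p2, "data-from-P2", results, "P2"))
p1.start(); p2.start()
p1.join(); p2.join()
```

With an unbuffered rendezvous channel instead of the queues, both `put` calls would block waiting for the matching `get`, and the program would hang exactly like the unbuffered MPI exchange.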


Page 14 of 18

Introduction to Scientific Computing

9. Implementation
Miriam Mehl

14. Load Distribution


• load: amount of work on processors

– optimum: minimize idle times; needs estimates and moni-toring

– strategy: load balancingor load distribution or scheduling

– important: avoid overhead

• one distinguishes

– scheduling:

* global: where do which processes run?

* local: when does which processor which process

– load balancing:

* static: a priori

* dynamic: during runtime

• in Scientific Computing applications load is often not predictable:

– adaptive refinement of a finite element mesh,

– convergence behaviour of iterations may differ

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 14 of 18

Introduction to Scientific Computing

9. Implementation
Miriam Mehl

14. Load Distribution

• load: amount of work on processors

– optimum: minimize idle times; needs estimates and monitoring

– strategy: load balancing or load distribution or scheduling

– important: avoid overhead

• one distinguishes

– scheduling:

* global: which processes run where?

* local: when does which processor run which process?

– load balancing:

* static: a priori

* dynamic: during runtime

• in Scientific Computing applications load is often not predictable:

– adaptive refinement of a finite element mesh,

– convergence behaviour of iterations may differ

– thus: static load balancing not sufficient
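The contrast between static (a priori) and dynamic (runtime) load balancing can be sketched in a few lines of Python. This is an illustration, not the lecture's code: the task costs, function names, and the greedy runtime strategy are assumptions chosen to make the effect of unpredictable costs visible.

```python
# Sketch: static vs. dynamic load distribution when task costs
# are not known a priori (e.g. adaptive refinement, differing
# convergence behaviour). All names and numbers are illustrative.

def static_round_robin(costs, p):
    """A priori: task i is assigned to processor i % p; returns the makespan."""
    loads = [0] * p
    for i, c in enumerate(costs):
        loads[i % p] += c
    return max(loads)

def dynamic_greedy(costs, p):
    """At runtime: each finished-to-arrive task goes to the least-loaded processor."""
    loads = [0] * p
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

# One expensive task among cheap ones, as produced e.g. by local refinement:
costs = [8, 1, 1, 1, 1, 1, 1, 1]
print(static_round_robin(costs, 2))  # 11: processor 0 gets the expensive task plus more
print(dynamic_greedy(costs, 2))      # 8: cheap tasks migrate to the idle processor
```

The static variant is overhead-free but cannot react; the dynamic variant balances better at the price of bookkeeping at runtime, which matches the "avoid overhead" caveat above.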

15. Designing Load Distribution

• Which are the primary objectives?

– optimization of system load or application runtime?

– placement of new processes or migration of running processes?

• What is the level of integration?

– Who initiates actions (measure load, choose strategy)?

* application program

* runtime system

* OS?

• Any special features of the application to be considered?

– restrictions in the process-to-processor allocation are frequent in Scientific Computing

• Which units shall be distributed or displaced?

– whole processes (coarse grain)

– threads (fine grain)

– objects or data (typical for simulation applications)

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph theory), from economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

• distribution mechanisms:

– load handed over to neighbouring nodes only?

– distribution of new units only, or migration of running ones (how?)?

• flow of information:

to whom is load communicated, and where does the information come from?

• coordination:

who makes decisions? autonomous/cooperative/competitive?

• algorithms:

who initiates measures? adaptivity? are costs relevant? evaluation?

17. Examples of LD-Strategies

• diffusion model:

permanent balancing process between neighbours

• bidding model:

supply and demand, establishment of a market

• broker model:

– esp. for heterogeneous hierarchical topologies, scalable

– broker with partial knowledge; budget-based decision whether to process locally or look for better offers

– prices for use of resources and brokerage

• matching model:

construct matching in topology graph, balance along edges

• balanced allocation, space-filling curves, . . .
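The diffusion model can be made concrete with a minimal sketch: each processor repeatedly hands a fraction of its load imbalance to its neighbours, so the loads converge to the average. The ring topology, the diffusion coefficient `alpha`, and the numbers are assumptions for illustration, not from the slides.

```python
# Sketch of the diffusion model for load balancing on a ring of processors.
# Each step, node i exchanges a fraction alpha of the load difference
# with each of its two ring neighbours; the total load is conserved.

def diffusion_step(load, alpha=0.25):
    n = len(load)
    new = load[:]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):  # ring neighbours of i
            new[i] += alpha * (load[j] - load[i])
    return new

load = [16.0, 0.0, 0.0, 0.0]  # all work initially on one processor
for _ in range(100):
    load = diffusion_step(load)
print([round(x, 2) for x in load])  # -> [4.0, 4.0, 4.0, 4.0]
```

The scheme is purely local (load is handed to neighbouring nodes only, as in the classification above) and needs no global coordinator, at the cost of many balancing rounds.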

18. Performance Evaluation

• performance evaluation of algorithms and computers

• average parallelism (for p processors):

A(p) = (sum of processor runtimes) / (parallel runtime)

• speedup S: S = (sequential runtime) / (parallel runtime)

• efficiency E: E = S/p

• Amdahl’s Law:
assumption: each program has some part 0 < seq < 1 that can only be treated in a sequential way

S ≤ 1 / (seq + (1 − seq)/p) ≤ 1/seq

• another important quantity: CCR (Communication-to-Computation Ratio)

– CCR often increases with increasing p and constant problem size (example: iterative methods for Ax = b)

– therefore: do not compare speedups for different p but the same problem size
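The quantities above can be checked numerically. The helper names and the sample runtimes below are illustrative assumptions; the formulas are the ones stated in this section.

```python
# Worked example for speedup S, efficiency E = S/p, and Amdahl's bound.

def speedup(t_seq, t_par):
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    return speedup(t_seq, t_par) / p

def amdahl_bound(seq, p):
    """Upper bound on S if a fraction `seq` of the work is inherently sequential."""
    return 1.0 / (seq + (1.0 - seq) / p)

print(amdahl_bound(0.1, 10))       # ~ 5.26
print(amdahl_bound(0.1, 1000))     # ~ 9.91, capped by 1/seq = 10
print(efficiency(100.0, 20.0, 8))  # S = 5 on 8 processors -> E = 0.625
```

Note how the bound saturates: even with 1000 processors, a 10% sequential part limits the speedup to just under 10, which is exactly the 1/seq cap in the inequality above.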
