Parallel Computer Models

Upload: joshua-duffy

Post on 03-Apr-2018


  • 7/28/2019 parallel computer Models


    Parallel Computer Models

    CEG 4131 Computer Architecture III

    Miodrag Bolic


    Overview

Flynn's taxonomy

Classification based on the memory arrangement

Classification based on communication

Classification based on the kind of parallelism:
- Data-parallel
- Function-parallel


Flynn's Taxonomy

The most universally accepted method of classifying computer systems. Published in the Proceedings of the IEEE in 1966.

Any computer can be placed in one of four broad categories:

    SISD: Single instruction stream, single data stream

    SIMD: Single instruction stream, multiple data streams

    MIMD: Multiple instruction streams, multiple data streams

    MISD: Multiple instruction streams, single data stream


SISD

[Diagram: SISD organization — the control unit sends a single instruction stream (IS) to one processing element (PE), which exchanges a single data stream (DS) with main memory (M).]


SIMD

Applications:
- Image processing
- Matrix manipulations
- Sorting


SIMD Architectures

Fine-grained:
- Image processing applications
- Large number of PEs
- Minimum-complexity PEs
- Programming language is a simple extension of a sequential language

Coarse-grained:
- Each PE is of higher complexity and is usually built with commercial devices
- Each PE has local memory
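The single-instruction, multiple-data idea can be sketched in plain Python: one operation is broadcast across many data elements, each conceptually handled by its own PE. This is only an illustration of the programming model (the function name `simd_scale` is made up for this sketch); real SIMD hardware executes the elements in lockstep.

```python
# SIMD sketch: one "instruction" (multiply by a scalar) applied to many
# data elements at once, as if each element sat in its own PE.
def simd_scale(data, factor):
    # every "PE" executes the same operation on its own element
    return [x * factor for x in data]

print(simd_scale([1, 2, 3, 4], 10))  # → [10, 20, 30, 40]
```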


    MIMD


MISD

Applications:
- Classification
- Robot vision


Flynn's taxonomy

Advantages of Flynn's taxonomy:
- Universally accepted
- Compact notation
- Easy to classify a system (?)

Disadvantages of Flynn's taxonomy:
- Very coarse-grain differentiation among machine systems
- Comparison of different systems is limited
- Interconnections, I/O, and memory are not considered in the scheme


Classification based on memory arrangement

[Diagram: two organizations. Shared memory (multiprocessors): processors PE1…PEn connected through an interconnection network to a shared memory. Message passing (multicomputers): nodes PE1…PEn, each pairing a processor Pi with a private memory Mi, connected through an interconnection network with I/O1…I/On.]


Shared-memory multiprocessors

- Uniform Memory Access (UMA)
- Non-Uniform Memory Access (NUMA)
- Cache-Only Memory Architecture (COMA)

Memory is common to all the processors. Processors communicate easily by means of shared variables.
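Communication through shared variables can be sketched with Python threads standing in for processors: all threads see the same memory, and a lock serializes access to the shared word. This is a model of the programming style, not of the hardware.

```python
# Shared-memory sketch: several "processors" (threads) communicate
# through one shared variable, guarded by a lock.
import threading

counter = 0                 # shared variable visible to all threads
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10000):
        with lock:          # serialize access to the shared memory word
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 40000
```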


The UMA Model

- Tightly coupled systems (high degree of resource sharing)
- Suitable for general-purpose and time-sharing applications by multiple users.

[Diagram: processors P1…Pn, each with a cache ($), connected through an interconnection network to shared memory modules.]


Symmetric and asymmetric multiprocessors

Symmetric:
- All processors have equal access to all peripheral devices.
- All processors are identical.

Asymmetric:
- One processor (master) executes the operating system.
- Other processors may be of different types and may be dedicated to special tasks.


The NUMA Model

- The access time varies with the location of the memory word.
- Shared memory is distributed among local memories.
- All local memories form a global address space accessible by all processors.

[Diagram: distributed memory (NUMA) — processors P1…Pn, each with a cache ($) and a local memory module, connected through an interconnection network.]

Access time hierarchy: cache, then local memory, then remote memory.

COMA: Cache-Only Memory Architecture.


Distributed-memory multicomputers

- Multiple computers (nodes)
- Message-passing network
- Local memories are private, each with its own program and data
- No memory contention, so the number of processors can be very large
- The processors are connected by communication lines; the precise way in which the lines are connected is called the topology of the multicomputer.
- A typical program consists of subtasks residing in all the memories.

[Diagram: nodes, each a processing element (PE) with its own memory (M), connected through an interconnection network.]
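The multicomputer model can be sketched with Python's multiprocessing module: each process has its own private address space, so the only way to exchange data is an explicit message over a channel. The `node` and `run_demo` names are illustrative; a real multicomputer would use a message-passing library such as MPI.

```python
# Message-passing sketch: two processes with private memories exchange
# data only through a Pipe, as nodes of a multicomputer would.
from multiprocessing import Process, Pipe

def node(conn):
    task = conn.recv()          # receive a subtask from the other node
    conn.send(sum(task))        # send the partial result back
    conn.close()

def run_demo():
    parent, child = Pipe()
    p = Process(target=node, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])   # message out: the subtask's data
    result = parent.recv()      # message in: the partial result
    p.join()
    return result

if __name__ == "__main__":
    print(run_demo())  # → 10
```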


    Classification based on type of interconnections

    Static networks

    Dynamic networks


Interconnection Network [1]

- Mode of operation (synchronous vs. asynchronous)
- Control strategy (centralized vs. decentralized)
- Switching techniques (packet switching vs. circuit switching)
- Topology (static vs. dynamic)


Classification based on the kind of parallelism [3]

Parallel architectures (PAs):

Data-parallel architectures (DPs):
- Vector architecture
- Associative and neural architecture
- SIMDs
- Systolic architecture

Function-parallel architectures:
- Instruction-level PAs (ILPs): pipelined processors, VLIWs, superscalar processors
- Thread-level PAs
- Process-level PAs (MIMDs): distributed memory (multi-computer), shared memory (multiprocessor)


References

1. Hesham El-Rewini and Mostafa Abd-El-Barr, Advanced Computer Architecture and Parallel Processing, John Wiley and Sons, 2005. http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0471467405.html
2. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
3. Dezso Sima, Terence Fountain and Peter Kacsuk, Advanced Computer Architectures: A Design Space Approach, Pearson, 1997.

Speedup

S = Speed(new) / Speed(old)

S = [Work / time(new)] / [Work / time(old)]

S = time(old) / time(new)

S = time(before improvement) / time(after improvement)


    Speedup

    Time (one CPU): T(1)

    Time (n CPUs): T(n)

    Speedup: S

    S = T(1)/T(n)
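The definition S = T(1)/T(n) is direct to compute; a minimal Python sketch (the function name and the example timings are illustrative, not from the slides):

```python
# Speedup of a parallel run: S = T(1) / T(n).
def speedup(t_serial, t_parallel):
    """Return S = T(1)/T(n) for measured wall-clock times."""
    return t_serial / t_parallel

# Hypothetical timings: a job takes 60 s on one CPU and 20 s on four.
print(speedup(60.0, 20.0))  # → 3.0
```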


Amdahl's Law

The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.


Example

A trip from A to B: a segment that must be walked (20 hours of walking), plus 200 miles that can be covered by a faster mode:
- Walk: 4 miles/hour
- Bike: 10 miles/hour
- Car-1: 50 miles/hour
- Car-2: 120 miles/hour
- Car-3: 600 miles/hour


Example (continued)

The 20-hour walking segment is fixed; only the 200-mile stretch benefits from a faster vehicle:
- Walk, 4 miles/hour: 50 + 20 = 70 hours, S = 1
- Bike, 10 miles/hour: 20 + 20 = 40 hours, S = 1.8
- Car-1, 50 miles/hour: 4 + 20 = 24 hours, S = 2.9
- Car-2, 120 miles/hour: 1.67 + 20 = 21.67 hours, S = 3.2
- Car-3, 600 miles/hour: 0.33 + 20 = 20.33 hours, S = 3.4
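The trip numbers above follow from one formula: total time = 200 miles at the vehicle's speed, plus the fixed 20-hour walk. A short Python check (constant and function names are made up for this sketch):

```python
# Reproduce the trip example: a 20-hour walking segment that cannot be
# sped up, plus 200 miles that can use a faster vehicle.
WALK_HOURS = 20.0      # serial part: must be walked no matter what
DISTANCE = 200.0       # miles coverable by the faster mode

def trip_time(speed_mph):
    return DISTANCE / speed_mph + WALK_HOURS

baseline = trip_time(4.0)  # walk everything: 50 + 20 = 70 hours
for name, mph in [("Walk", 4), ("Bike", 10), ("Car-1", 50),
                  ("Car-2", 120), ("Car-3", 600)]:
    t = trip_time(mph)
    print(f"{name}: {t:.2f} h, S = {baseline / t:.1f}")
```

However fast the vehicle, the total time never drops below the 20-hour walk, so the speedup is capped at 70/20 = 3.5.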


Amdahl's Law (1967)

α: the fraction of the program that is naturally serial

(1 − α): the fraction of the program that is naturally parallel


S = T(1) / T(N)

T(N) = α·T(1) + T(1)·(1 − α)/N

S = 1 / (α + (1 − α)/N) = N / (α·N + (1 − α))
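The closed form S = N / (α·N + (1 − α)) is easy to evaluate; a small Python sketch (the function name is illustrative) shows the speedup saturating near 1/α as N grows:

```python
# Amdahl's Law: S(N) = N / (alpha*N + (1 - alpha)),
# where alpha is the naturally serial fraction of the program.
def amdahl_speedup(alpha, n):
    return n / (alpha * n + (1.0 - alpha))

# With a 10% serial fraction the speedup saturates near 1/alpha = 10.
print(amdahl_speedup(0.1, 10))    # ≈ 5.26
print(amdahl_speedup(0.1, 1000))  # ≈ 9.91
```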


Amdahl's Law