lecture 2 a general discussion on parallelismcavazos/cisc879-spring2012/lecture-02.pdfcisc 879 :...

29
CISC 879 : Advanced Parallel Programming John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879 Lecture 2 A General Discussion on Parallelism

Upload: others

Post on 12-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

John Cavazos!Dept of Computer & Information Sciences!

University of Delaware!

www.cis.udel.edu/~cavazos/cisc879!

Lecture 2!A General Discussion on!

Parallelism!

Page 2: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Lecture 2: Overview

•  Flynn’s Taxonomy of Architectures •  Types of Parallelism •  Parallel Programming Models •  Commercial Multicore Architectures

Page 3: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Flynn’s Taxonomy of Arch.

•  SISD - Single Instruction/Single Data •  SIMD - Single Instruction/Multiple Data •  MISD - Multiple Instruction/Single Data •  MIMD - Multiple Instruction/Multiple Data

Page 4: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Single Instruction/Single Data

The was the typical machine before multicores. Slide Source: Wikipedia, Flynn’s Taxonomy

Page 5: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Single Instruction/Multiple Data

Processors that execute same instruction on multiple pieces of data (e.g., vector instructions).

Slide Source: Wikipedia, Flynn’s Taxonomy

Page 6: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Single Instruction/Multiple Data

•  Each core executes same instruction simultaneously

•  Vector-style of programming •  Natural for graphics and scientific computing •  Good choice for massively multicore

Page 7: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

SISD versus SIMD

SIMD very often requires compiler intervention. Slide Source: ars technica, Peakstream article

Page 8: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Multiple Instruction/Single Data

Only Theoretical Machine. None ever implemented.

Slide Source: Wikipedia, Flynn’s Taxonomy

Page 9: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Multiple Instruction/Multiple Data

Many mainstream multicores fall into this category.

Slide Source: Wikipedia, Flynn’s Taxonomy

Page 10: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Multiple Instruction/Multiple Data

•  Each core •  simultaneously executing different instructions

on different data •  private upper levels of memory hierarchy •  may share lower level of cache and RAM •  SIMD-extensions

•  Programmed using a variety of methods •  OpenMP, MPI, pthreads, TBB, AMP, …

Page 11: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Lecture 2: Overview

•  Flynn’s Taxonomy of Architecture •  Types of Parallelism •  Parallel Programming Models •  Commercial Multicore Architectures

Page 12: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Types of Parallelism Instructions:

Slide Source: S. Amarasinghe, MIT 6189 IAP 2007

Page 13: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Pipelining

Corresponds to SISD architecture.

Slide Source: S. Amarasinghe, MIT 6189 IAP 2007

Page 14: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Instruction-Level Parallelism

Dual instruction issue superscalar model. Again, corresponds to SISD architecture.

Slide Source: S. Amarasinghe, MIT 6189 IAP 2007

Page 15: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Data-Level Parallelism

Data Stream or

Array Elements

What architecture model from Flynn’s Taxonomy does this correspond to?

Slide Source: Arch. of a Real-time Ray-Tracer, Intel

Page 16: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Data-Level Parallelism

Data Stream or

Array Elements

Corresponds to SIMD architecture.

Slide Source: Arch. of a Real-time Ray-Tracer, Intel

Page 17: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Data-Level Parallelism

One operation (e.g., +) produces multiple results. X, Y, and result are arrays.

Slide Source: Klimovitski & Macri, Intel

Page 18: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Thread-Level Parallelism

P1 P2 P3 P4 P5 P6

Program partitioned into four threads.

Multicore with 6 cores.

Four threads each executed on separate cores.

What architecture from Flynn’s Taxonomy does this correspond to?

Slide Source: SciDAC Review, Threadstorm pic.

Page 19: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Thread-Level Parallelism

P1 P2 P3 P4 P5 P6

Program partitioned into four threads.

Multicore with 6 cores.

Four threads each executed on separate cores.

Corresponds to MIMD architecture.

Slide Source: SciDAC Review, Threadstorm pic.

Page 20: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Lecture 2: Overview

•  Flynn’s Taxonomy of Architecture •  Types of Parallelism •  Parallel Programming Models •  Commercial Multicore Architectures

Page 21: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Multicore Programming Models •  Message Passing Interface (MPI) •  OpenMP •  Threads

•  Pthreads •  Java Threads

•  Parallel Libraries •  Intel’s Thread Building Blocks (TBB) •  Microsoft’s Task Parallel Library •  SWARM (GTech) •  Charm++ (UIUC) •  STAPL (Texas A&M)

Page 22: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

GPU Programming Models •  CUDA (Nvidia)

•  C/C++ extensions

•  OpenCL (many vendors, including Nvidia and AMD) •  Pragma-based language extensions

•  HMPP •  OpenACC

•  Cray and PGI directives

Page 23: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Lecture 2: Overview

•  Flynn’s Taxonomy of Architecture •  Types of Parallelism •  Parallel Programming Models •  Commercial Multicore Architectures

Page 24: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Generalized Multicore

Slide Source: Michael McCool, Rapid Mind, SuperComputing, 2007

Page 25: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Cell B.E. Architecture

Slide Source: Michael McCool, Rapid Mind, SuperComputing, 2007

Page 26: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

NVIDIA GPU Architecture G80

Slide Source: Michael McCool, Rapid Mind, SuperComputing, 2007

Page 27: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Commercial Multcores

Slide Source: Dave Patterson, Manycore and Multicore Computing Workshop, 2007

Page 28: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Intel Teraflops Video

http://www.youtube.com/watch?v=TAKG0UvtzpE

Page 29: Lecture 2 A General Discussion on Parallelismcavazos/cisc879-spring2012/Lecture-02.pdfCISC 879 : Advanced Parallel Programming Multiple Instruction/Multiple Data • Each core •

CISC 879 : Advanced Parallel Programming

Next Time

•  More Parallel Programming Models •  GPUs

•  Architecture and Programming Models

•  Read Chapters 1 of OpenCL book