
TÉCNICAS DE OPTIMIZACIÓN Y PARALELIZACIÓN

MANUEL ARENAZ

Associate Professor

Master en Computación de Altas Prestaciones Course 2012-2013


Agenda

• Do I really need parallelism?
• Can the HPC software marketplace be organized from the productivity viewpoint?
• What do I need to learn to write HPC apps?
• How can I be more productive?
• Are there "design patterns" to help write HPC apps?
• Can you give an example of a parallel design pattern?
• Are there hardware-independent key concepts?
• Are there frequently used design patterns for HPC apps?
• Can you propose a development methodology for GPU programming?
• Can you measure productivity?
• Conclusions

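Software Marketplace

[Figure: HPC software products placed along a "+" to "-" axis under three headings: MANUAL, SEMIAUTOMATIC, AUTOMATIC. Products shown: CUDA, OpenCL, ArrayFire, MPI, Posix Threads; HMPP, PGIACC; ICC, XL, Visual Studio, Pro Fortran, NAG Fortran; Math library.]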

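Software Technologies

[Figure: software technologies placed along a "+" to "-" axis under three headings: MANUAL, SEMIAUTOMATIC, AUTOMATIC. Labels that survive: Threads; Intrinsics SSE.]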

Summary from Session #2


• Key concepts for parallel design patterns:
  - Parallel region
  - Work-sharing
  - Privatization
  - Synchronization
  - Reductions

• Support in widely used software tools:
  - MPI
  - OpenMP
  - OpenACC
  - SSE

Are there frequently used design patterns for HPC Apps?

Parallel Design Patterns

• REDUCTION computational kernel
• ASSIGNMENT computational kernel
• RECURRENCE computational kernel
• REINITIALIZED computational kernels
• Describing applications in terms of computational kernels

© Manuel Arenaz, 2011. Técnicas de Optimización y Paralelización (TOP)









Reduction

• Update the value of a set of memory locations as a function of the previous value stored in each memory location:

    A = A ⊕ B

  The operator ⊕ is associative and commutative (e.g., sum, product, max/min).

• Types of reductions:
  - Scalar reduction: s=s+1
  - Regular reduction: A(h)=A(h)+1
  - Irregular reduction: A(f(h))=A(f(h))+1

Scalar Reduction
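
A minimal Python sketch of why a scalar reduction can be parallelized: since the operator is associative and commutative, each worker can accumulate a private partial result over its own chunk, and the partial results are combined at the end. The name `chunked_sum`, the default of four workers, and the use of a thread pool are illustrative assumptions, not something taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_sum(values, workers=4):
    """Scalar reduction s = s + values[h], computed with private partial sums."""
    if not values:
        return 0
    size = (len(values) + workers - 1) // workers  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk gets its own accumulator, so no worker touches shared state.
        partials = list(pool.map(sum, chunks))
    return sum(partials)  # final combination of the partial results

data = list(range(1, 101))
total = chunked_sum(data)
```

With integers the result matches the sequential loop exactly. With floating-point data, regrouping the additions can change the rounding slightly.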




Regular Reduction
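
A short sketch of a regular reduction, assuming the general form A(h) = A(h) + B(h) (the slide uses the constant 1 in place of B(h)). Iteration h reads and writes only element h, so the iterations can run in any order, or concurrently, with the same result. The function name and the `order` parameter are hypothetical and exist only to make that point visible.

```python
def regular_reduction(a, b, order=None):
    """A(h) = A(h) + B(h); each iteration touches only its own element."""
    a = list(a)  # work on a copy so the caller's data is not modified
    for h in (order if order is not None else range(len(a))):
        a[h] = a[h] + b[h]
    return a

a0 = [10, 20, 30, 40]
b0 = [1, 2, 3, 4]
forward = regular_reduction(a0, b0)
shuffled = regular_reduction(a0, b0, order=[2, 0, 3, 1])
```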




Irregular Reduction
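
A sketch of one common way to handle an irregular reduction, assuming privatization: because f(h) may send different iterations to the same location, each worker updates its own private copy of A and the copies are merged afterwards. The workers are simulated sequentially here, indices are 0-based, and the name `irregular_reduction` and the cyclic distribution of iterations are assumptions made for illustration.

```python
def irregular_reduction(f, size, workers=2):
    """A(f(h)) = A(f(h)) + 1 using one private copy of A per worker."""
    private = [[0] * size for _ in range(workers)]
    for w in range(workers):
        # Worker w handles iterations w, w + workers, w + 2*workers, ...
        for h in range(w, len(f), workers):
            private[w][f[h]] += 1
    # Merge step: combine the private copies element by element.
    return [sum(copy[i] for copy in private) for i in range(size)]

f = [0, 2, 2, 1, 0, 2]
hist = irregular_reduction(f, size=3)
```

The merge step costs one pass over A per worker, so this approach pays off mainly when A is small compared with the number of iterations.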












Assignment

• Modify the value of a set of memory locations by overwriting the previous values:

    A = B

  The previous value is lost: it is not used to compute the new value stored in the memory location.

• Types of assignments:
  - Scalar assignment: s=1
  - Regular assignment: A(h)=1
  - Irregular assignment: A(f(h))=1

Scalar Assignment




Regular Assignment




Irregular Assignment
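
A small sketch of an irregular assignment, assuming the general form A(f(h)) = B(h). It illustrates an observation that is added here and not stated on the slide: when f maps two iterations to the same location and the written values differ, the final content depends on which iteration runs last. The function name and the `order` parameter are hypothetical.

```python
def irregular_assignment(size, f, b, order=None):
    """A(f(h)) = B(h), visiting the iterations in the given order."""
    a = [None] * size
    for h in (order if order is not None else range(len(f))):
        a[f[h]] = b[h]  # overwrites; the previous value of A is never read
    return a

f = [0, 1, 0]          # iterations 0 and 2 write the same location
b = ["x", "y", "z"]
sequential = irregular_assignment(2, f, b)
reordered = irregular_assignment(2, f, b, order=[2, 1, 0])
```

In the slide's own example, A(f(h))=1, every iteration writes the same constant, so the order makes no difference.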












Recurrences

• Update the value of a set of memory locations as a function of the previous values stored in several memory locations:

    A = A ⊕ … ⊕ A ⊕ B

• Types of recurrences:
  - Regular recurrences: A(h)=A(h-1)+1
  - Irregular recurrences: A(f(h))=A(g(h))+1
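
A sketch of the dependence that characterizes the regular recurrence A(h) = A(h-1) + 1: iteration h reads the value written by iteration h-1. The second function, `naive_split`, is a deliberately wrong, hypothetical variant that runs two halves of the loop independently from the original data, to show how ignoring that dependence changes the result. Indices are 0-based.

```python
def regular_recurrence(a):
    """A(h) = A(h-1) + 1 for h = 1..n-1, executed in order."""
    a = list(a)
    for h in range(1, len(a)):
        a[h] = a[h - 1] + 1  # reads the value written by iteration h-1
    return a

def naive_split(a, mid):
    """Runs the two halves independently, each reading the ORIGINAL array."""
    out = list(a)
    for start, stop in ((1, mid), (mid, len(a))):
        prev = a[start - 1]  # stale value for the second half
        for h in range(start, stop):
            out[h] = prev + 1
            prev = out[h]
    return out

a0 = [5, 0, 0, 0, 0, 0]
correct = regular_recurrence(a0)
broken = naive_split(a0, mid=3)
```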









Reinitialized Patterns

    A = B
    A = A ⊕ C

• Combination of two computational kernels:
  - ASSIGNMENT, to initialize a data set
  - REDUCTION or RECURRENCE, to operate on that data set

      subroutine amux (n, x, y, a, ja, ia)
      real*8 x(*), y(*), a(*)
      integer n, ja(*), ia(*)
      real*8 t
      integer i, k
      do i = 1,n
         t = 0.0d0
         do k = ia(i), ia(i+1)-1
            t = t + a(k)*x(ja(k))
         enddo
         y(i) = t
      enddo
      return
      end

Code: Sparse matrix-vector product
Format: CRS (compressed row storage) matrix
Source: SparskitII, MATVEC module, routine amux.f
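
For readers less familiar with Fortran, here is a rough Python transcription of the same routine with each statement annotated by the kernel it belongs to. It is a sketch: indices are shifted to 0-based, y is returned instead of being passed in, and the 3x3 test matrix is made up for illustration.

```python
def amux(n, x, a, ja, ia):
    """y = A*x for a matrix stored in CRS format (0-based indices)."""
    y = [0.0] * n
    for i in range(n):
        t = 0.0                       # assignment: reinitializes t for every row
        for k in range(ia[i], ia[i + 1]):
            t += a[k] * x[ja[k]]      # scalar reduction on t
        y[i] = t                      # regular assignment to y(i)
    return y

# Hypothetical 3x3 matrix:  [[1, 0, 2],
#                            [0, 3, 0],
#                            [4, 0, 5]]
a = [1.0, 2.0, 3.0, 4.0, 5.0]   # nonzero values, row by row
ja = [0, 2, 1, 0, 2]            # column index of each nonzero
ia = [0, 2, 3, 5]               # position in a/ja where each row starts
y = amux(3, [1.0, 1.0, 1.0], a, ja, ia)
```

Because t is reset at the top of every row, each row can use its own private copy of t, which is what makes the outer loop a candidate for parallel execution.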






• Whole-program analysis based on parallel design patterns


Whole-Program Analysis

      do h = 1,Adim
         hist(h) = 0
      enddo
      do h = 1,fDim
         hist(f(h)) = hist(f(h)) + 1
      enddo
      do h = 2,Adim
         hist(h) = hist(h) + hist(h-1)
      enddo

Code: Cumulative histogram of an irregular access pattern

Source: Inspector with load balancing

• We have seen a simple example in which the behavior of the code can be described as a sequence of computational kernels:
  - REGULAR ASSIGNMENT: parallelizable
  - IRREGULAR REDUCTION: parallelizable
  - REGULAR RECURRENCE: not parallelizable!
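
The same three-kernel sequence can be sketched in Python, one loop per kernel. This is only an illustrative transcription with 0-based indices; the function name and the sample access pattern `f` are assumptions.

```python
def cumulative_histogram(f, adim):
    """Cumulative histogram of the access pattern f, built from three kernels."""
    hist = [0] * adim                 # regular assignment: hist(h) = 0
    for h in range(len(f)):           # irregular reduction: hist(f(h)) += 1
        hist[f[h]] += 1
    for h in range(1, adim):          # regular recurrence: depends on hist(h-1)
        hist[h] += hist[h - 1]
    return hist

f = [0, 2, 2, 1, 0, 2]
result = cumulative_histogram(f, adim=4)
```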





• We can even describe the behavior of applications whose code contains "interleaved" kernels

Code: Sparse matrix-vector product
Format: CRS (compressed row storage) matrix
Source: SparskitII, MATVEC module, routine amux.f

  - REINITIALIZED SCALAR REDUCTION: privatizable
  - REGULAR ASSIGNMENT: parallelizable





• Contextual analysis: the same group of statements can be described by different computational kernels depending on the scope in which it is analyzed

Code: Sparse matrix-vector product
Format: CRS (compressed row storage) matrix
Source: SparskitII, MATVEC module, routine amux.f

  - SCALAR REDUCTION: parallelizable


MATMUL


SOBEL

