
Masterpraktikum - High Performance Computing

OpenMP

Michael Bader
Alexander Heinecke
Alexander Breuer

Technische Universität München, Germany


Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing


#include <omp.h>
...
#pragma omp parallel for
for(i = 0; i < n; i++){
  for(j = 0; j < n; j++){
    do_some_work(i, j);
  }
}


Overview
• Compiler directives
  #pragma omp ... [clause[[,] clause]...]
• Work-sharing
  #pragma omp for, and many more
• Synchronization
  #pragma omp barrier, and many more
• Data scope clauses
  shared, private, firstprivate, reduction, and others
• Library routines
  omp_get_num_threads(), omp_get_thread_num(), and others


compiling and executing
• compile with the additional compiler flag -openmp (-fopenmp in case of gcc):
  icc -openmp -o my_alg my_alg.c
• execute:
  export OMP_NUM_THREADS=number_of_threads (default = 1)
  ./my_alg


Parallel Region Construct, OpenMP API
#pragma omp parallel [clause[[,] clause]...]
  structured block
• code within a parallel region is executed by all threads

Example
  #include <omp.h>
  ...
  #pragma omp parallel private(size, rank)
  {
    size = omp_get_num_threads();
    rank = omp_get_thread_num();
    printf("Hello World! (Thread %d of %d)", rank, size);
  }


work sharing constructs
• #pragma omp for [clause[[,] clause]...]
  for-loop
• within a parallel region and directly in front of a for-loop
• iterations are scheduled across different threads
• the schedule clause determines how to map iterations to threads:
  schedule(static[, chunksize]): default; the default chunksize is #iterations divided by #threads
  schedule(dynamic[, chunksize]): the default chunksize is 1
  schedule(runtime): scheduling is given by the environment variable OMP_SCHEDULE
• implicit synchronization at the end of the for-loop (can be disabled with the nowait clause)
• shortcut possible: #pragma omp parallel for
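The schedule clause can be tried with a small self-contained sketch. The function name and loop body below are illustrative, not from the slides; when compiled without -fopenmp the pragma is ignored and the code runs serially with the same result.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of squares over [0, n), parallelized with a dynamic
 * schedule: chunks of 4 iterations are handed out on demand,
 * which helps when iterations have uneven cost. The reduction
 * clause gives each thread a private partial sum. */
long sum_of_squares(int n) {
    long sum = 0;
    int i;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += (long)i * i;
    }
    return sum;
}
```

With schedule(runtime) instead, the same loop would take its schedule from OMP_SCHEDULE at program start, so different mappings can be compared without recompiling.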


work sharing constructs II
#pragma omp task [clause[[,] clause]...]
  structured block
where clause can be...
• final(expression): if expression is true, the task is executed sequentially; no recursive task generation
• untied: the execution of a task is not tied to one single thread
• shared | firstprivate | private
• some more...

#pragma omp taskwait
• waits for the completion of child tasks
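A typical use of task and taskwait is recursive parallelism. The sketch below (a hypothetical Fibonacci example, not from the slides) also shows final() cutting off task creation for small subproblems; without -fopenmp the pragmas are ignored and it runs as a plain recursion.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each call spawns two child tasks; final(n < 20) makes small
 * subtrees execute sequentially instead of generating further
 * tasks, which keeps the task-creation overhead bounded. */
long fib(int n) {
    long a, b;
    if (n < 2) return n;
    #pragma omp task shared(a) final(n < 20)
    a = fib(n - 1);
    #pragma omp task shared(b) final(n < 20)
    b = fib(n - 2);
    #pragma omp taskwait  /* wait for both children before summing */
    return a + b;
}

/* Entry point: one thread creates the root task inside the
 * parallel region, the whole team executes the generated tasks. */
long fib_parallel(int n) {
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```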


work sharing constructs III
• #pragma omp single [clause[[,] clause]...]
  structured block
  • only one (arbitrary) thread executes the structured block
  • implicit synchronization at the end
• #pragma omp master
  structured block
  • only the master thread executes the structured block
  • NO synchronization at the end
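The difference can be sketched as follows (variable names are illustrative, not from the slides): single carries an implicit barrier after the block, so every thread sees the one-time initialization; master does not.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* One-time setup inside a parallel region. */
int run_once_demo(void) {
    int initialized = 0;
    #pragma omp parallel shared(initialized)
    {
        /* one arbitrary thread runs this; the implicit barrier
         * afterwards makes the write visible to all threads */
        #pragma omp single
        initialized = 1;

        /* only the master thread prints; no barrier at the end,
         * so other threads may already be past this point */
        #pragma omp master
        printf("setup done by thread 0\n");
    }
    return initialized;
}
```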


synchronization constructs
• #pragma omp barrier
  • blocks execution until all threads have reached the barrier
• #pragma omp critical
  structured block
  • only one thread at a time can execute the structured block encapsulated by critical
  • ATTENTION: use carefully! Frequent critical sections serialize execution and can destroy performance.
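As a sketch of the cost (a hypothetical histogram example, not from the slides): critical makes the shared update correct, but every single increment serializes all threads. For a simple counter, a reduction or #pragma omp atomic is usually the better choice.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Count even and odd values in data[0..n-1]. The critical
 * section protects the shared counters, but it also forces the
 * threads to take turns on every increment. */
void count_even_odd(const int *data, int n, int counts[2]) {
    counts[0] = counts[1] = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        counts[data[i] % 2]++;
    }
}
```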


reduction clause
• reduction(operator: list)
• executes a reduction of the variables in list using operator
• available operators: +, *, -, &&, ||

Example
  #pragma omp parallel for private(r) reduction(+:sum)
  for(i = 0; i < n; i++){
    r = compute_r(i);
    sum = sum + r;
  }


OpenMP data scope clauses
• private(list): declares the variables in list as private for each thread (no copy of the initial value!)
• shared(list): variables in list are used by all threads (race conditions are possible!); write accesses have to be handled by the programmer
• firstprivate(list): private variables AND initializes them with the latest valid value from before the parallel region
• lastprivate(list): after parallel execution, the serial part reuses the latest modification
• the default data scope is shared, but exceptions exist ⇒ be precise!
• local variables are always private
• loop variables of for-loops are private (C++ only)
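A small sketch of firstprivate and lastprivate working together (the function and variable names are hypothetical; compiled without OpenMP it runs serially with the same result):

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* firstprivate(offset): every thread starts with the value that
 * offset had before the region, instead of an uninitialized copy.
 * lastprivate(last_i): after the loop, last_i holds the value
 * written by the sequentially last iteration (i = n - 1). */
int scope_demo(int n, int offset, int *out) {
    int last_i = -1;
    #pragma omp parallel for firstprivate(offset) lastprivate(last_i)
    for (int i = 0; i < n; i++) {
        out[i] = i + offset;  /* offset was copied in, not shared */
        last_i = i;
    }
    return last_i;
}
```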


OpenMP data scope clauses - Exercise

Example
  k = compute_k();
  #pragma omp parallel for private(?) shared(?) firstprivate(?) lastprivate(?) reduction(?)
  for(i = 0; i < n; i++){
    k = compute_my_k(i, k);
    r = compute_r(i, k, h);
    sum = sum + r;
  }


Questions?
