hamid sarbazi-azad - sharifce.sharif.edu/courses/93-94/2/ce215-1/resources/root/slides... · a...

Computational Mathematics

Hamid Sarbazi-Azad
Department of Computer Engineering, Sharif University of Technology

OpenMP: Work-sharing

Instructor: PanteA Zardoshti

Computational Mathematics, OpenMP, Sharif University, Fall 2015

Work-sharing

A worksharing construct distributes the execution of the associated region among the members of the team that encounters it.

#pragma omp parallel for
for (i = 0; i < 100; i++)
    A[i] = A[i] + B;


A worksharing region has no barrier on entry; however, an implied barrier exists at the end of the worksharing region, unless a nowait clause is specified.


Constructs

The OpenMP API defines the following worksharing constructs, which are described in the sections that follow:

• loop
• sections
• single

LOOP CONSTRUCT

Loop Construct

The loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks.

The iterations are distributed across threads that already exist in the team executing the parallel region to which the loop region binds.

#pragma omp for [clause[[,] clause] ... ]
for-loops

Clauses

where clause is one of the following:

• private(list)
• firstprivate(list)
• lastprivate(list)
• schedule(kind[, chunk_size])
• collapse(n)
• ordered
• nowait
• reduction(reduction-identifier: list)
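As a rough illustration of two of the data-sharing clauses listed for the loop construct, the sketch below combines firstprivate and lastprivate on one loop. The function name `scale_and_last` and its parameters are invented for this example and do not come from the slides. It assumes n is at least 1, and that the compiler is run with OpenMP enabled (for example `-fopenmp` with GCC); without that, the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical helper: scales src into dst and returns the value that was
   computed in the sequentially last iteration (assumes n >= 1). */
double scale_and_last(const double *src, double *dst, int n, double factor)
{
    double last = 0.0;   /* receives the value from iteration n-1 */
    int i;

    /* firstprivate: each thread's copy of factor starts with the original value.
       lastprivate: after the loop, last holds the value assigned in the
       sequentially last iteration (i == n-1). */
    #pragma omp parallel for firstprivate(factor) lastprivate(last)
    for (i = 0; i < n; i++) {
        dst[i] = factor * src[i];
        last = dst[i];
    }
    return last;
}
```

Calling `scale_and_last(src, dst, 4, 2.0)` on the values 1, 2, 3, 4 fills dst with 2, 4, 6, 8 and returns 8.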

Schedule Clause

How does OpenMP schedule iterations?

Although the OpenMP standard does not specify how a loop should be partitioned by default, most compilers split the loop into p chunks of about N/p iterations each (N = number of iterations, p = number of threads).

This is called a static schedule (with chunk size N/p).

• For example, suppose we have a loop with 1000 iterations and 4 OpenMP threads. The loop is partitioned as follows:

  iterations 1 to 250 | 251 to 500 | 501 to 750 | 751 to 1000
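The arithmetic behind such a block partition can be sketched in plain C. This is only one plausible way to split N iterations into p contiguous blocks; the helper name `static_block` is hypothetical, iterations are numbered from 0 rather than 1, and a real OpenMP runtime may hand out the remainder differently when N is not divisible by p.

```c
/* Hypothetical helper: computes the half-open iteration range [*lo, *hi)
   that thread tid would get when n iterations are split into p contiguous
   blocks of (nearly) equal size. Iterations are numbered from 0 here. */
void static_block(int n, int p, int tid, int *lo, int *hi)
{
    int base = n / p;      /* minimum block size */
    int extra = n % p;     /* the first `extra` threads get one more iteration */

    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}
```

For n = 1000 and p = 4 this gives the ranges [0, 250), [250, 500), [500, 750) and [750, 1000), which matches the partition of the 1000-iteration example once the numbering is shifted to start at 1.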

Schedule Clause (cont'd)

Static

• Blocks of iterations of size “chunk” are assigned to threads
• Round-robin distribution

schedule(static[, chunk])

Dynamic

• Threads grab “chunk” iterations
• When done with its iterations, a thread requests the next set

schedule(dynamic[, chunk])

Guided

• Dynamic schedule starting with large blocks
• The size of the blocks shrinks, but never below “chunk”

schedule(guided[, chunk])

Runtime

• Indicates that the schedule type and chunk are specified by the environment variable OMP_SCHEDULE
• Example of run-time specified scheduling:

OMP_SCHEDULE="dynamic,2"

schedule(runtime)
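To make the schedule kinds a little more concrete, here is a minimal sketch of a loop whose iterations have very different costs, which is the situation where a dynamic schedule tends to help. The functions `triangle_work` and `fill_unbalanced` are hypothetical and exist only for this illustration; the chunk size of 2 is arbitrary.

```c
/* Hypothetical workload: iteration i costs roughly i units of work,
   so a plain block split would leave the last thread with most of it. */
static long triangle_work(int i)
{
    long s = 0;
    for (int k = 0; k <= i; k++)
        s += k;
    return s;
}

/* Stores triangle_work(i) in out[i] for i = 0 .. n-1. */
void fill_unbalanced(long *out, int n)
{
    /* Threads grab 2 iterations at a time and come back for more when done.
       Replacing the clause with schedule(runtime) would defer the choice
       to the OMP_SCHEDULE environment variable. */
    #pragma omp parallel for schedule(dynamic, 2)
    for (int i = 0; i < n; i++)
        out[i] = triangle_work(i);
}
```

The result does not depend on the schedule; only the assignment of iterations to threads, and therefore the load balance, changes.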

The Experiment

Collapse Clause

Allows parallelization of perfectly nested loops without using nested parallelism.

The collapse clause on a for/do loop indicates how many loops should be collapsed.

The compiler forms a single loop and then parallelizes it.

#pragma omp for collapse(2)
for (k=1; k<=100; k++)
    for (j=1; j<=200; j++)
        ...  /* loop body */
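A complete version of a collapsed loop nest might look like the sketch below. The table dimensions mirror the 100 by 200 loop bounds of the slide, but the function `fill_table`, the macros, and the loop body are assumptions made for the example.

```c
#define ROWS 100
#define COLS 200

/* Fills a ROWS x COLS table. The two loops are perfectly nested, so
   collapse(2) lets the compiler treat them as one loop of ROWS * COLS
   iterations and distribute that loop over the threads. */
void fill_table(int table[ROWS][COLS])
{
    int k, j;   /* both loop variables become private under collapse(2) */

    #pragma omp parallel for collapse(2)
    for (k = 0; k < ROWS; k++)
        for (j = 0; j < COLS; j++)
            table[k][j] = k * COLS + j;
}
```

Without collapse only the 100 iterations of the outer loop would be shared out; with it there are 20000 iterations to distribute, which can balance better when the thread count does not divide 100 evenly.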

Ordered Clause

The ordered region executes in sequential order.

Since do_lots_of_work takes a lot of time, most of the parallel benefit will still be realized.

ordered is helpful for debugging.

#pragma omp parallel for ordered
for (i = 0; i < nproc; i++) {
    do_lots_of_work(result[i]);
    #pragma omp ordered
    fprintf(fid, "%d %f\n", i, result[i]);
}
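The same pattern can be sketched without file output, which makes the ordering easy to check. In the example below `slow_square` is a hypothetical stand-in for an expensive computation and `ordered_collect` is an invented name; the point is only that the block under `ordered` runs one iteration at a time, in loop order, while the rest of the loop body may run concurrently.

```c
/* Hypothetical stand-in for an expensive computation. */
static double slow_square(int i)
{
    return (double)i * (double)i;
}

/* Computes n values in parallel but appends them to out[] in loop order.
   Returns the number of values written. */
int ordered_collect(double *out, int n)
{
    int count = 0;   /* shared; only touched inside the ordered region */
    int i;

    #pragma omp parallel for ordered schedule(dynamic, 1)
    for (i = 0; i < n; i++) {
        double r = slow_square(i);      /* may run concurrently */

        #pragma omp ordered
        {
            out[count] = r;             /* one iteration at a time, in order */
            count++;
        }
    }
    return count;
}
```

Note that the `ordered` clause has to appear on the loop directive for the inner `ordered` region to be valid.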

Nowait Clause

To minimize synchronization, some OpenMP pragmas support the optional nowait clause.

If present, threads do not synchronize/wait at the end of that particular construct.

#pragma omp for nowait
for (k=1; k<=100; k++)
    …

Example parallel region

#pragma omp parallel shared(n,a,b,c,x,y,z) private(f,i,scale)
{
    f = 1.0;                    /* statement is executed by all threads */

    #pragma omp for nowait      /* parallel loop (work is distributed) */
    for (i=0; i<n; i++)
        z[i] = x[i] + y[i];

    #pragma omp for nowait      /* parallel loop (work is distributed) */
    for (i=0; i<n; i++)
        a[i] = b[i] + c[i];

    ....

    #pragma omp barrier         /* synchronization */

    scale = sum(a,0,n) + sum(z,0,n) + f;    /* statement is executed by all threads */

} /*-- End of parallel region --*/

Barrier

[Figure: Thread 1, Thread 2, and Thread 3 arrive at a barrier at different times; threads that arrive early are idle until the last one arrives.]

Use OMP_WAIT_POLICY to control the behaviour of idle threads.

Example

Suppose we run each of these two loops in parallel over i:

for (i=0; i < N; i++)
    a[i] = b[i] + c[i];

for (i=0; i < N; i++)
    d[i] = a[i] + b[i];

This may give us a wrong answer. Why?

Example (cont'd)

We need to have updated all of a[ ] first, before using a[ ].

All threads wait at the barrier point and only continue when all threads have reached the barrier point.

for (i=0; i < N; i++)
    a[i] = b[i] + c[i];

/* wait! barrier */

for (i=0; i < N; i++)
    d[i] = a[i] + b[i];
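Written out with OpenMP directives, the two loops could look like the following sketch. The function name `two_phase` and its signature are assumptions made for the example; the loop bodies are the ones from the slide. No explicit barrier is needed here because the first loop construct has no nowait clause and therefore ends with an implied barrier.

```c
/* Computes a = b + c and then d = a + b inside one parallel region. */
void two_phase(double *a, const double *b, const double *c, double *d, int n)
{
    int i;

    #pragma omp parallel private(i)
    {
        /* The implied barrier at the end of this loop guarantees that
           every a[i] is written before any thread starts the second loop. */
        #pragma omp for
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];

        /* Safe to read a[] here: all threads have passed the barrier. */
        #pragma omp for
        for (i = 0; i < n; i++)
            d[i] = a[i] + b[i];
    }
}
```

Adding nowait to the first loop would remove that guarantee, and an explicit `#pragma omp barrier` between the loops would then be the way to restore it.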


SECTIONS CONSTRUCT

Sections Construct

Independent sections of code can execute concurrently.

#pragma omp parallel sections [clause[[,] clause] ...]
{
    #pragma omp section
    phase1();
    #pragma omp section
    phase2();
    #pragma omp section
    phase3();
}

Clauses

where clause is one of the following:

• private(list)
• firstprivate(list)
• lastprivate(list)
• nowait
• reduction(reduction-identifier: list)

Example

#pragma omp parallel default(none) shared(n,a,b,c,d) private(i)
{
    #pragma omp sections nowait
    {
        #pragma omp section
        for (i=0; i<n-1; i++)
            b[i] = (a[i] + a[i+1])/2;

        #pragma omp section
        for (i=0; i<n; i++)
            d[i] = 1.0/c[i];
    } /*-- End of sections --*/
} /*-- End of parallel region --*/

[Figure: Section #1 and Section #2 run side by side inside the parallel region, plotted against time.]


SINGLE CONSTRUCT

Single Construct

Denotes a block of code to be executed by only one thread.

The thread chosen is implementation dependent.

Implicit barrier at end.

#pragma omp parallel
{
    DoManyThings();
    #pragma omp single
    {
        ExchangeBoundaries();
    }   /* threads wait here for single */
    DoManyMoreThings();
}
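A small self-contained use of single is sketched below: one thread computes a value that every thread needs afterwards, and the implicit barrier at the end of the construct makes that value visible before the threads go on. The function `normalize` and its logic are invented for this illustration.

```c
/* Divides every element of v by the largest element (if that is positive). */
void normalize(double *v, int n)
{
    double max = 0.0;   /* shared by all threads */
    int i;

    #pragma omp parallel private(i)
    {
        /* One thread, whichever the implementation picks, scans for the
           maximum; the others wait at the implicit barrier that ends single. */
        #pragma omp single
        {
            for (i = 0; i < n; i++)
                if (v[i] > max)
                    max = v[i];
        }

        /* After the barrier every thread sees the final value of max. */
        #pragma omp for
        for (i = 0; i < n; i++)
            if (max > 0.0)
                v[i] /= max;
    }
}
```

If the single construct carried a nowait clause, other threads could reach the second loop before max was complete, so the barrier matters here.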


CRITICAL SECTION

Critical Section

float dot_prod(float* a, float* b, int N)
{
    float sum = 0.0;
    #pragma omp parallel for shared(sum)
    for(int i=0; i<N; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

What is wrong?

Critical Construct

Defines a critical region on a structured block.

#pragma omp critical [(lock_name)]

float dot_prod(float* a, float* b, int N)
{
    float sum = 0.0;
    #pragma omp parallel for shared(sum)
    for(int i=0; i<N; i++) {
        #pragma omp critical
        sum += a[i] * b[i];
    }
    return sum;
}

Naming the critical constructs is optional, but may increase performance: all unnamed critical regions share a single lock, whereas regions with different names can execute at the same time.
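The effect of naming can be sketched with two independent counters, each guarded by its own named critical region. The function `count_parity` and the names `even_lock` and `odd_lock` are arbitrary choices for this example.

```c
/* Counts even and odd values. The two counters are independent, so each
   gets its own named critical region and they do not block each other. */
void count_parity(const int *v, int n, int *evens, int *odds)
{
    int e = 0, o = 0;
    int i;

    #pragma omp parallel for shared(e, o)
    for (i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            #pragma omp critical (even_lock)
            e++;
        } else {
            #pragma omp critical (odd_lock)
            o++;
        }
    }
    *evens = e;
    *odds = o;
}
```

With unnamed critical regions the code would still be correct, but a thread updating the even counter would also hold up threads waiting to update the odd counter.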

Reduction Clause

reduction (op : list)

The variables in “list” must be shared in the enclosing parallel region.

Inside a parallel or work-sharing construct:

• A PRIVATE copy of each list variable is created and initialized depending on the “op” (for example, 0 for + and 1 for *)
• These copies are updated locally by threads
• At the end of the construct, local copies are combined through “op” into a single value and combined with the value in the original SHARED variable
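The three steps described for the reduction clause can be written out by hand, which may help to see what the clause does conceptually. This is a sketch of the idea only, not a claim about how any particular compiler implements reduction; the function name `dot_prod_manual` is made up.

```c
/* Dot product with the reduction steps spelled out explicitly. */
float dot_prod_manual(const float *a, const float *b, int N)
{
    float sum = 0.0f;                 /* the original shared variable */

    #pragma omp parallel
    {
        float local = 0.0f;           /* private copy, initialized for "+" */
        int i;

        #pragma omp for nowait
        for (i = 0; i < N; i++)
            local += a[i] * b[i];     /* updated locally by each thread */

        #pragma omp critical
        sum += local;                 /* combined into the shared variable */
    }
    return sum;
}
```

Each thread enters the critical region only once, after finishing its share of the loop, so the synchronization cost is far lower than protecting every single update of sum.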

Reduction Clause

float dot_prod(float* a, float* b, int N)
{
    float sum = 0.0;
    #pragma omp parallel for reduction (+:sum)
    for(int i=0; i<N; i++) {
        sum += a[i] * b[i];    /* local copy of sum for each thread */
    }
    return sum;    /* all local copies of sum added together and stored in the “global” variable */
}

(c) Hamid Sarbazi-Azad, Parallel Programming: OpenMP

Reduction Clause (cont.)

Operators

• + Sum

• * Product

• & Bitwise and

• | Bitwise or

• ^ Bitwise exclusive or

• && Logical and

• || Logical or
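Two of the less common operators from this list, `*` and `&&`, are used together in the sketch below. The function `all_positive_and_product` is hypothetical; the initial values 1 and 1 are the identities that each private copy starts from for these operators.

```c
/* Returns 1 if every element is positive, 0 otherwise, and stores the
   product of all elements in *product. */
int all_positive_and_product(const int *v, int n, long *product)
{
    int all_pos = 1;    /* identity for && */
    long prod = 1;      /* identity for *  */
    int i;

    #pragma omp parallel for reduction(&&:all_pos) reduction(*:prod)
    for (i = 0; i < n; i++) {
        all_pos = all_pos && (v[i] > 0);
        prod *= v[i];
    }
    *product = prod;
    return all_pos;
}
```

Several reduction clauses with different operators may appear on the same directive, as long as each variable is listed only once.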

Atomic Construct

Special case of a critical section.

Applies only to a simple update of a memory location.

#pragma omp parallel for shared(x, y, index, n)
for (i = 0; i < n; i++) {
    #pragma omp atomic
    x[index[i]] += work1(i);    /* only this update is atomic */
    y[i] += work2(i);           /* each iteration touches its own y[i] */
}
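A histogram is a typical case for atomic, because different iterations may update the same memory location through an index array. The sketch below is an illustration under that assumption; the function `histogram` and its parameters are invented, and every entry of index[] is assumed to lie in the range 0 to nbins-1.

```c
/* Builds a histogram: bins[k] counts how many entries of index[] equal k.
   Several iterations may hit the same bin, so the increment is atomic. */
void histogram(const int *index, int n, int *bins, int nbins)
{
    int i;

    for (i = 0; i < nbins; i++)
        bins[i] = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        bins[index[i]] += 1;
    }
}
```

A critical region would also be correct here, but it would serialize all increments, whereas atomic only orders updates to the same location.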

Lock Construct

Protect resources with locks.

void omp_init_lock(omp_lock_t * lock_p);
void omp_set_lock(omp_lock_t * lock_p);
void omp_unset_lock(omp_lock_t * lock_p);
void omp_destroy_lock(omp_lock_t * lock_p);

Lock Construct

omp_lock_t lck;
omp_init_lock(&lck);
#pragma omp parallel for
for(i=0;i<=N;i++){
    omp_set_lock(&lck);        /* wait here for your turn */
    result+=w[i]*y[i];
    omp_unset_lock(&lck);      /* release the lock so the next thread gets a turn */
}
omp_destroy_lock(&lck);        /* free up storage when done */