
Introduction to OpenMP Part II
White Rose Grid Computing Training Series
Deniz Savas, Alan Real, Mike Griffiths
RTP Module, February 2012


Page 1

Introduction to OpenMP Part II

White Rose Grid Computing Training Series
Deniz Savas, Alan Real, Mike Griffiths

RTP Module, February 2012

Page 2

Synchronisation Pitfalls when using shared variables (Race Conditions)

• A variable that is used (read from) but never updated (written to) can safely be declared as a shared variable in a parallel region.

• Problems arise when the above rule is violated by attempting to change the value of a shared variable within the parallel region. Such problems are known as data-race problems and should be avoided at the programming level. However, for situations where avoidance is not possible or not efficient, there are a variety of OMP directives for resolving them: BARRIER, ATOMIC, CRITICAL and FLUSH, which we will discuss later.

Page 3

Synchronisation example

a = a + 1 on 2 threads, where a is a shared variable. Each thread runs the same program, keeping its working value in private data (registers):

   load a
   add a 1
   store a

Case 1 (thread 2 runs behind thread 1): thread 1 loads a = 10 from shared data, adds 1 and stores 11; thread 2 then loads 11, adds 1 and stores 12. Final shared value: a = 12.

Page 4

Synchronisation example (continued)

a = a + 1 on 2 threads, where a is a shared variable.

Case 1 (thread 2 behind thread 1): a = 12.
Case 2 (thread 2 at a similar time to thread 1): both threads load a = 10 into their private data, both add 1 to get 11, and both store 11 back to shared data. Final shared value: a = 11, so one of the updates is lost.
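A minimal C sketch of this lost-update race (illustrative, not from the slides; the variable name is arbitrary): each of two threads performs the unprotected read-modify-write above, so the final value depends on timing.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int a = 0;                      /* shared variable */

    #pragma omp parallel num_threads(2) shared(a)
    {
        /* Unprotected read-modify-write: both threads may load the same
           old value of a, and one of the increments can then be lost.   */
        a = a + 1;
    }

    printf("a = %d\n", a);          /* may print 2 (Case 1) or 1 (Case 2) */
    return 0;
}

The BARRIER, ATOMIC, CRITICAL and FLUSH directives discussed in the following slides are the tools for avoiding this behaviour.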

Page 5

Synchronization related directives

We have seen the potential problems arising from the interaction of multiple threads, in particular the race conditions that occur when multiple threads attempt to write to the same shared variable simultaneously. Such events may render our results useless, as the outcome is effectively decided by the toss of a coin, according to which thread runs ahead of which.

The following OMP directives, namely CRITICAL, BARRIER, ATOMIC and FLUSH, help us to avoid these synchronisation-related problems.

Page 6

OMP Barrier

• Syntax:
  – C: #pragma omp barrier
  – Fortran: !$omp barrier

• This directive defines a point which all threads must reach before execution of the program continues. It is a useful tool in circumstances where you need to ensure that the work relating to one set of tasks is completed before embarking on a new set of tasks (see the C sketch below).

• Beware: overuse of this feature may reduce efficiency.
• It may also give rise to DEADLOCK situations.
• Nevertheless it is very useful for ensuring the correct working of complex programs.
• Most of the work-sharing directives have an implied barrier at the end of their block (unless NOWAIT is used), i.e. OMP END DO, OMP END SECTIONS, OMP END WORKSHARE. Note that they do not have an implied barrier at the beginning, only at the end, and even that can be suppressed, e.g. !$OMP END WORKSHARE NOWAIT.
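A minimal C sketch of an explicit barrier (illustrative; the array and names are not from the slides): each thread writes its own result, the barrier guarantees all writes are complete, and only then do the threads read each other's data.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int results[4] = {0};

    #pragma omp parallel num_threads(4) shared(results)
    {
        int id = omp_get_thread_num();

        results[id] = id * id;      /* first set of tasks: each thread fills its slot */

        #pragma omp barrier         /* no thread continues until every slot is written */

        /* Second set of tasks: now safe to read the other threads' results. */
        printf("thread %d sees results[3] = %d\n", id, results[3]);
    }
    return 0;
}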

Page 7

OMP BARRIER

To avoid deadlocks, NEVER use !$OMP BARRIER inside any of these blocks:

!$OMP MASTER
   ....
!$OMP END MASTER

!$OMP SECTIONS
   ....
!$OMP END SECTIONS

!$OMP CRITICAL
   ....
!$OMP END CRITICAL

!$OMP SINGLE
   ....
!$OMP END SINGLE

Page 8

NOWAIT clause

• We have seen during the earlier discussion of the BARRIER statement that the directives END DO/FOR, END SECTIONS, END SINGLE and END WORKSHARE all imply a barrier where executing threads must wait until every one of them has finished its work and arrived there.

• The NOWAIT clause of the above-mentioned statements removes this restriction, allowing the threads that finish earlier to proceed straight onto the instructions following the work-sharing construct without having to wait for the other threads to catch up.

• This will reduce the amount of idle time and increase efficiency, but at the risk of producing wrong results! SO BE VERY CAREFUL!

• Syntax:
  – Fortran:
       !$OMP DO
          do loop
       !$OMP END DO NOWAIT
  – C/C++:
       #pragma omp for nowait
          for loop

  Similar for END SECTIONS, END SINGLE and END WORKSHARE.

Page 9

NOWAIT example

• Two loops with no dependencies between them present an ideal opportunity for the NOWAIT clause.

!$OMP PARALLEL
!$OMP DO
      do j = 1, n
         a(j) = c * b(j)
      end do
!$OMP END DO NOWAIT
!$OMP DO
      do i = 1, m
         x(i) = sqrt(y(i)) * 2.0
      end do
!$OMP END DO
!$OMP END PARALLEL
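For reference, a self-contained C version of the same pattern (array sizes and values are illustrative, not from the slides): the nowait clause on the first loop lets threads move straight on to the second, independent loop.

#include <math.h>
#include <omp.h>
#include <stdio.h>

#define N 1000
#define M 800

int main(void) {
    double a[N], b[N], x[M], y[M], c = 2.5;
    for (int j = 0; j < N; j++) b[j] = (double) j;
    for (int i = 0; i < M; i++) y[i] = (double) i;

    #pragma omp parallel shared(a, b, x, y, c)
    {
        /* No barrier at the end of this loop: early finishers move on. */
        #pragma omp for nowait
        for (int j = 0; j < N; j++)
            a[j] = c * b[j];

        /* Independent loop, so skipping the barrier above is safe. */
        #pragma omp for
        for (int i = 0; i < M; i++)
            x[i] = sqrt(y[i]) * 2.0;
    }

    printf("a[10] = %.1f, x[10] = %.3f\n", a[10], x[10]);
    return 0;
}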

Page 10

NOWAIT warning

• Use with EXTREME CAUTION.
• It is too easy to remove a barrier which is necessary.
• Results in non-deterministic behaviour:
  – Sometimes the right result
  – Sometimes wrong results
  – Behaviour changes under a debugger
• It is possibly good coding style to use NOWAIT everywhere and make all barriers explicit:
  – Not done in practice.

Page 11

NOWAIT warning example

!$OMP DO
      do j = 1, n
         a(j) = b(j) + c(j)
      end do
!$OMP END DO
!$OMP DO
      do j = 1, n
         d(j) = e(j) * f
      end do
!$OMP END DO
!$OMP DO
      do j = 1, n
         z(j) = (a(j) + a(j+1)) * 0.5
      end do
!$OMP END DO

• In the third loop, a(j+1) could be updated by a different thread from the one handling a(j).
• The first barrier can be removed (NOWAIT on the first END DO), but not the second, as there is a dependency on a( ).

Page 12

OMP CRITICAL

(Mutual Exclusion)

A thread waits at the start of a critical section until no other thread is executing a section with the same critical name.

This construct can be used to mark sections of code that, for example, change global flags once a particular task has been performed, so that the same work is not repeated. It is also useful for sectioning off code such as updates to heaps and stacks, where simultaneous updating by competing threads could prove disastrous!

The OMP ATOMIC directive becomes a better choice if the synchronisation concern relates to a specific memory location. A C sketch of named critical sections follows below.
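A minimal C sketch of named critical sections (illustrative; the counter names are not from the slides): threads updating one counter only exclude each other, not the threads updating the other counter.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int hits = 0, misses = 0;       /* two independent shared counters */

    #pragma omp parallel num_threads(4) shared(hits, misses)
    {
        int id = omp_get_thread_num();

        if (id % 2 == 0) {
            #pragma omp critical (update_hits)    /* excludes only other 'update_hits' sections */
            hits++;
        } else {
            #pragma omp critical (update_misses)  /* a separately named critical section */
            misses++;
        }
    }

    printf("hits = %d, misses = %d\n", hits, misses);
    return 0;
}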

Page 13

OMP CRITICAL EXAMPLE

!$OMP PARALLEL SHARED(MYDATA)
!$OMP CRITICAL (updatepart)
! Perform operations on the global/shared array WORK,
! which redefines WORK and then sets new flags to
! indicate what the next call to PARTITION will see in MYDATA.
      CALL PARTITION(I, MYDATA)
!$OMP END CRITICAL (updatepart)
! Now perform the work that can be done in isolation
! without affecting the other threads.
      CALL SOLVE(MYDATA)
!$OMP END PARALLEL

Page 14

OMP Atomic

• Unlike most of the other OMP directives, this directive applies to the single statement immediately following it, rather than to a block of statements.

• It ensures that a specific shared memory location is updated atomically, so that it is not exposed to the possibility of simultaneous writes that may give rise to race conditions.

• May be more efficient than using CRITICAL directives, e.g. if different array elements can be protected separately.

• By using the ATOMIC directive we can be confident that no race will arise while evaluating an expression and updating the variable it is assigned to.

• Note that the ATOMIC directive does not impose any conditions on the order in which the threads execute the statement; it merely ensures that no two threads execute it simultaneously. See OMP ORDERED later.

Page 15

ATOMIC directive syntax

• Fortran:
      !$OMP ATOMIC
        statement
  where statement must be one of:
      x = x op (expr),  x = (expr) op x,  x = intr(x, expr)  or  x = intr(expr, x)
  x is a scalar shared variable, op is one of +, *, -, /, .and., .or., .eqv., .neqv.,
  and intr is one of the MAX, MIN, IAND, IOR or IEOR intrinsic functions.

• C/C++:
      #pragma omp atomic
        statement
  where statement must be one of:
      x binop= expr,  x++,  ++x,  x--  or  --x
  binop is one of +, *, -, /, &, ^, |, << or >>, and expr is an expression of scalar type
  that does not reference the object designated by x.

Page 16

ATOMIC example

!$OMP PARALLEL DO PRIVATE(xlocal, ylocal)
      DO i = 1, n
         call work(xlocal, ylocal)
!$OMP ATOMIC
         x(index(i)) = x(index(i)) + xlocal
         y(i) = y(i) + ylocal
      END DO
!$OMP END PARALLEL DO

• Prevents simultaneous updates of an element of x by multiple threads.
• The ATOMIC directive allows different elements of x to be updated simultaneously; a CRITICAL region would "serialise" the updates.
• Note: the update of y is not atomic, as ATOMIC only applies to the statement that immediately follows the directive. A C analogue is sketched below.
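A self-contained C analogue of this kind of update (illustrative; the histogram and bin computation are not from the slides): atomic protects each element update, while different elements can still be updated in parallel.

#include <omp.h>
#include <stdio.h>

#define N     1000
#define NBINS 16

int main(void) {
    int hist[NBINS] = {0};

    #pragma omp parallel for shared(hist)
    for (int i = 0; i < N; i++) {
        int bin = i % NBINS;       /* plays the role of index(i) */

        /* Two threads hitting the same bin cannot lose an increment,
           but updates to different bins are not serialised.          */
        #pragma omp atomic
        hist[bin] += 1;
    }

    printf("hist[0] = %d\n", hist[0]);   /* expect N / NBINS */
    return 0;
}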

Page 17

Lock Routines

• Occasionally we need more flexibility than the CRITICAL and ATOMIC directives offer, and the lock routines provide it (although they are not as easy to use).

  – A lock is a special variable that may be set by a thread.
  – A lock may only be unset by the thread which set it; other threads attempting to set it must wait until it has been unset.
  – Setting a lock may be blocking ('set_lock') or non-blocking ('test_lock').
  – A lock must be initialised before it is used and may be destroyed when no longer required.
  – Lock variables should not be used for any other purpose.

Page 18

Syntax

• Fortran:
      SUBROUTINE OMP_INIT_LOCK(var)
      SUBROUTINE OMP_SET_LOCK(var)
      LOGICAL FUNCTION OMP_TEST_LOCK(var)
      SUBROUTINE OMP_UNSET_LOCK(var)
      SUBROUTINE OMP_DESTROY_LOCK(var)

  var should be an INTEGER of the same size as addresses (e.g. INTEGER*8 on a 64-bit machine).

• C/C++:
      #include <omp.h>
      void omp_init_lock(omp_lock_t *lock);
      void omp_set_lock(omp_lock_t *lock);
      int  omp_test_lock(omp_lock_t *lock);
      void omp_unset_lock(omp_lock_t *lock);
      void omp_destroy_lock(omp_lock_t *lock);
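A self-contained C sketch using these routines (illustrative; the shared total and the idea of doing "other work" are placeholders): omp_test_lock acquires the lock when it is free, so a thread can keep busy until then.

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_lock_t lock;
    int total = 0;                  /* shared result protected by the lock */

    omp_init_lock(&lock);           /* initialise before first use */

    #pragma omp parallel num_threads(4) shared(lock, total)
    {
        int id = omp_get_thread_num();

        /* Non-blocking attempt: keep trying (or do other work) until the lock is obtained. */
        while (!omp_test_lock(&lock)) {
            /* something_else() could go here */
        }
        total += id;                /* protected update */
        omp_unset_lock(&lock);      /* release so other threads can proceed */
    }

    omp_destroy_lock(&lock);        /* destroy when no longer required */
    printf("total = %d\n", total);  /* 0 + 1 + 2 + 3 = 6 */
    return 0;
}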

Page 19

Lock example

      call omp_init_lock(ilock)
!$OMP PARALLEL SHARED(ilock)
      :
      do while (.not. omp_test_lock(ilock))
         call something_else()
      end do
      call work()
      call omp_unset_lock(ilock)
      :
!$OMP END PARALLEL

• OMP_TEST_LOCK will set the lock if it is not already set.

Page 20

FLUSH directive

• Ensures that a variable is written to/read from main memory.

• The variable will be flushed out of the register file (and usually out of cache).
  – Also called a memory fence.

• Allows use of “normal” variables for synchronisation.

• Avoids the need for use of volatile type qualifiers in this context.

Page 21

FLUSH syntax

• Fortran: !$OMP FLUSH [(list)]
• C/C++:   #pragma omp flush [(list)]

– list specifies a list of variables to be flushed.

• If no list is present all shared variables are flushed.

• FLUSH directives are implied by a BARRIER, at entry to and exit from CRITICAL and ORDERED sections, and at the end of PARALLEL, DO/FOR, SECTIONS and SINGLE constructs (except when a NOWAIT clause is present).

Page 22

FLUSH example

!$OMP PARALLEL PRIVATE(myid, i, neighb)
      :
      do j = 1, niters
         do i = lb(myid), ub(myid)
            a(i) = (a(i+1) + a(i)) * 0.5
         end do
         ndone(myid) = ndone(myid) + 1
!$OMP FLUSH (ndone)
         do while (ndone(neighb) .lt. ndone(myid))
!$OMP FLUSH (ndone)
         end do
      end do

• a(i+1) may be updated on a different thread, so each iteration must wait for the previous iteration to finish on the neighbour.
• The first FLUSH makes sure the write of ndone(myid) goes to main memory.
• The FLUSH inside the while loop makes sure each read of ndone(neighb) comes from main memory; the loop waits for the neighbour.
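A minimal C sketch of the same idea, flag-based synchronisation with explicit flushes (illustrative; the flag and data names are not from the slides): one thread publishes data and then a flag, while the other spins on the flag, flushing before each read.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int flag = 0;                  /* "data is ready" flag */
    int data = 0;

    #pragma omp parallel num_threads(2) shared(flag, data)
    {
        if (omp_get_thread_num() == 0) {
            data = 42;                 /* produce the data */
            #pragma omp flush(data)    /* make sure data reaches memory first */
            flag = 1;
            #pragma omp flush(flag)    /* then publish the flag */
        } else {
            int ready = 0;
            while (!ready) {           /* spin until thread 0 signals */
                #pragma omp flush(flag)    /* re-read flag from memory */
                ready = flag;
            }
            #pragma omp flush(data)    /* make sure data is read from memory */
            printf("consumer read data = %d\n", data);
        }
    }
    return 0;
}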

Page 23

Choosing Synchronisation

• Use ATOMIC if possible.
  – Allows the most optimisation.
• If not possible, use CRITICAL.
  – Use different names wherever possible.
• If appropriate, use variable flushing.
• As a last resort, use the lock routines.
  – Should be a rare occurrence in practice.

Page 24

Practical: Molecular dynamics

• Aim: to introduce atomic updates.
• The code is a simple MD simulation of the melting of solid argon.
• Computation is dominated by the calculation of force pairs in the subroutine forces.
• Parallelise this routine using a DO/FOR directive and atomic updates.
  – Watch out for PRIVATE and REDUCTION variables.

Page 25

Practical: Image processing

• Aim: Introduction to the use of parallel DO/for loops.

• Simple image processing algorithm to reconstruct an image from an edge-detected version.

• Use parallel DO/for directives to run in parallel.

Page 26

OpenMP resources

• Web sites:
  – http://www.openmp.org
    Official web site, including language specifications, links to compilers, tools and mailing lists.
  – http://www.compunity.org
    OpenMP community site: links, events, resources.

• Book: "Parallel Programming in OpenMP", Chandra et al., Morgan Kaufmann, ISBN 1558606718.

• PGI User's Guide on 'iceberg': https://iceberg.shef.ac.uk/docs/pgi52doc/pgi52ug.pdf