
Multicore Programming: Synchronization: Semaphores

Dhananjay Brahme
Parallelization and Optimization Group

TATA Consultancy Services, Sahyadri Park Pune, India

April 30, 2013

TATA Consultancy Services, Experience Certainty. © All rights reserved


Topics

1. Synchronization Problems

2. Semaphore Definition

3. Exploration of Problem Solutions using Semaphores

Reference: The Little Book of Semaphores, Allen Downey, Version 2.1.5


Synchronization Constraints: Serialization

Algorithm (Ram):

1. Eats breakfast
2. Writes Software module B
3. Signals Done
4. Eats Lunch

Algorithm (Laxman):

1. Eats breakfast
2. Tests Software module A
3. Eats Lunch
4. Receives Signal to Start
5. Tests Software module B


Synchronization Constraints: Mutual Exclusion

Algorithm (Ram):

1. Eats breakfast
2. Signals Beginning Modify
3. Modifies Software module A
4. Signals Done Modifying

Algorithm (Laxman):

1. Eats breakfast
2. Signals Beginning Modify
3. Modifies Software module A
4. Signals Done Modifying


Semaphore: Definition

A semaphore is an integer with the following differences:

1. A semaphore is created and initialized to any value.

2. When a thread decrements its value and the result is negative, the thread waits, i.e., it gets blocked.

3. When a thread increments its value and there are waiting threads, one of them gets unblocked.
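These rules correspond to POSIX semaphores (semaphore.h), where wait is sem_wait and signal is sem_post. A minimal sketch of rule 2 (the helper name count_acquires is ours, not from the slides), using the non-blocking sem_trywait: with an initial value of 2, exactly two decrements succeed before the semaphore would go negative.

```c
#include <semaphore.h>

/* Counts how many non-blocking decrements succeed on a semaphore
   initialized to `initial` (illustrative helper, not from the slides). */
int count_acquires(unsigned initial, int tries) {
    sem_t s;
    sem_init(&s, 0, initial);        /* pshared=0: shared between threads */
    int got = 0;
    for (int i = 0; i < tries; i++)
        if (sem_trywait(&s) == 0)    /* fails instead of blocking at 0 */
            got++;
    sem_destroy(&s);
    return got;
}
```

A blocking sem_wait would suspend the thread at zero instead of failing, which is exactly rule 2 above.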


Serialization Problem:

Thread A: writes Software Module A
Thread B: tests Software Module A (must happen after A finishes)


Signaling with Semaphores:

Thread A:
Create done(0)
A writes Software Module A
done.signal

Thread B:
done.wait
B tests Software Module A
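A hedged POSIX sketch of this signaling pattern (helper names like run_signaling are ours): because done starts at 0, the tester blocks until the writer has finished, so it always observes the completed work, whichever thread the scheduler runs first.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t done;
static int module_ready;

static void *writer(void *arg) {     /* A writes Software Module A */
    (void)arg;
    module_ready = 1;
    sem_post(&done);                 /* done.signal */
    return NULL;
}

static void *tester(void *arg) {     /* B tests Software Module A */
    sem_wait(&done);                 /* done.wait: blocks until A signals */
    *(int *)arg = module_ready;
    return NULL;
}

/* Returns 1 iff B observed A's finished work; deterministic by design. */
int run_signaling(void) {
    pthread_t a, b;
    int observed = 0;
    module_ready = 0;
    sem_init(&done, 0, 0);           /* initial value 0 forces B to wait */
    pthread_create(&b, NULL, tester, &observed);
    pthread_create(&a, NULL, writer, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&done);
    return observed;
}
```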


Rendezvous:

Thread A:
A writes Software Module A
A tests Software Module B

Thread B:
B writes Software Module B
B tests Software Module A


Rendezvous: Solution with Semaphores:

Thread 1:
Create doneA(0)
A writes Software Module A
doneA.signal
doneB.wait
A tests Software Module B

Thread 2:
Create doneB(0)
B writes Software Module B
doneB.signal
doneA.wait
B tests Software Module A

Potential Problems?
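The pattern above signals before waiting; if each thread instead waited first, both would block forever (deadlock). A POSIX sketch of the safe ordering (helper names ours) in which each thread records whether it saw the other's completed work after the rendezvous:

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t doneA, doneB;
static int wroteA, wroteB, sawA, sawB;

static void *threadA(void *arg) {
    (void)arg;
    wroteA = 1;            /* "A writes Software Module A" */
    sem_post(&doneA);      /* signal first...              */
    sem_wait(&doneB);      /* ...then wait: avoids deadlock */
    sawB = wroteB;         /* "A tests Software Module B"  */
    return NULL;
}

static void *threadB(void *arg) {
    (void)arg;
    wroteB = 1;
    sem_post(&doneB);
    sem_wait(&doneA);
    sawA = wroteA;
    return NULL;
}

/* Returns 1 iff both threads observed the other's work after meeting. */
int rendezvous(void) {
    pthread_t a, b;
    wroteA = wroteB = sawA = sawB = 0;
    sem_init(&doneA, 0, 0);
    sem_init(&doneB, 0, 0);
    pthread_create(&a, NULL, threadA, NULL);
    pthread_create(&b, NULL, threadB, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&doneA);
    sem_destroy(&doneB);
    return sawA && sawB;
}
```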


Mutual Exclusion:

Thread 1:
count++;

Thread 2:
count++;


Mutual Exclusion: Solution with Semaphore

Create update(1)

Thread 1:
update.wait()
count++;
update.signal()

Thread 2:
update.wait()
count++;
update.signal()
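A runnable POSIX sketch of the same pattern (helper names ours): a semaphore initialized to 1 acts as a lock, so two threads each incrementing a shared counter 100,000 times lose no updates.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t update;          /* binary semaphore acting as a lock */
static long count_shared;

static void *incr(void *arg) {
    long n = (long)arg;
    for (long i = 0; i < n; i++) {
        sem_wait(&update);    /* enter critical section */
        count_shared++;
        sem_post(&update);    /* leave critical section */
    }
    return NULL;
}

/* Runs two threads, each incrementing the shared counter n times. */
long run_counter(long n) {
    pthread_t t1, t2;
    count_shared = 0;
    sem_init(&update, 0, 1);  /* initial value 1 => mutual exclusion */
    pthread_create(&t1, NULL, incr, (void *)n);
    pthread_create(&t2, NULL, incr, (void *)n);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&update);
    return count_shared;
}
```

Without the semaphore the two read-modify-write sequences can interleave and the final count comes up short.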


Barrier: Rendezvous with n Threads

Thread i:

1. rendezvous

2. critical section


Barrier: Solution 1 with Semaphores

Thread 1: create b(-n)

Thread i:

1. b.signal

2. b.wait

3. barrier point

Problem?


Barrier: Solution 2 with Semaphores

Thread 1:
count = 0
create update_count(1)
create b(0)

Thread i:

1. update_count.wait

2. count++;

3. update_count.signal

4. if (count == n) b.signal

5. b.wait

6. barrier point

Problem?

Is there always a problem?
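The catch in the solution above is that one b.signal releases only one waiter. A standard fix, not shown in the deck, is the "turnstile" from Downey's book: the last arriving thread opens a turnstile semaphore, and each released thread re-signals it so the remaining waiters cascade through. A sketch (helper names ours):

```c
#include <pthread.h>
#include <semaphore.h>

#define MAXT 8

static sem_t mutex_s, turnstile;
static int arrived, nthreads;
static int before_flag[MAXT], ok[MAXT];

static void barrier_once(void) {
    sem_wait(&mutex_s);
    arrived++;
    if (arrived == nthreads)
        sem_post(&turnstile);        /* last thread opens the turnstile */
    sem_post(&mutex_s);
    sem_wait(&turnstile);
    sem_post(&turnstile);            /* cascade: release the next waiter */
}

static void *worker(void *arg) {
    long id = (long)arg;
    before_flag[id] = 1;             /* work done before the barrier */
    barrier_once();
    int all = 1;                     /* after the barrier, every thread */
    for (int i = 0; i < nthreads; i++)   /* must see all n writes */
        if (!before_flag[i]) all = 0;
    ok[id] = all;
    return NULL;
}

/* Returns 1 iff every thread saw all n pre-barrier writes. */
int run_barrier(int n) {
    pthread_t t[MAXT];
    nthreads = n;
    arrived = 0;
    sem_init(&mutex_s, 0, 1);
    sem_init(&turnstile, 0, 0);
    for (long i = 0; i < n; i++) {
        before_flag[i] = ok[i] = 0;
        pthread_create(&t[i], NULL, worker, (void *)i);
    }
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    int all = 1;
    for (int i = 0; i < n; i++) if (!ok[i]) all = 0;
    sem_destroy(&mutex_s);
    sem_destroy(&turnstile);
    return all;
}
```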


Implement Barrier using Semaphore and OpenMP

1. Implement using semaphore.h and OpenMP's omp.h.

2. YOUR SOLUTION HERE.


Barrier using semaphores

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <semaphore.h>
#include <omp.h>

int main(int argc, char **argv)
{
    sem_t b;
    int count = 0;
    sem_init(&b, 1, 0);

    #pragma omp parallel num_threads(3)
    {
        sleep(omp_get_thread_num());

        #pragma omp critical
        {
            count++;
        }
        if (count == 3) {
            sem_post(&b);
            printf("thread=%d arrived at barrier\n", omp_get_thread_num());
        } else if (count < 3) {
            printf("Waiting in thread %d\n", omp_get_thread_num());
            sem_wait(&b);
            printf("Awakened thread=%d\n", omp_get_thread_num());
            sem_post(&b);
        }
    }
    return 0;
}


Multicore Programming

Parallelization and Optimization Group
TATA Consultancy Services, Sahyadri Park Pune, India

May 3, 2013


Agenda

1. Introduction to Multithreading

2. About Pthreads

3. Pthreads - An Example

4. Introduction to OpenMP

5. Programming Model

6. OpenMP Example

7. Data Parallelism

8. Task Parallelism

9. Synchronization Issues

10. Explicit Asynchronous/Synchronous Execution

11. Load balancing in Loops


Introduction to Multithreading

1. Multiple threads exist within the context of a single process.

2. Threads can share resources such as memory, which is not the case for separate processes.

3. Single-core system: more context-switching overhead.

4. Multi-core system: less switching; threads run truly concurrently.


About Pthreads

1. Standard C library with functions for using threads.
2. Available across different platforms.

Figure: Creation of Multi-threads


Example: Vector Addition

In main():

pthread_t callThread[numT];

for (i = 0; i < numT; i++)
    pthread_create(&callThread[i], NULL, vectAdd, (void *) i);

for (i = 0; i < numT; i++)
    pthread_join(callThread[i], NULL);


Example: Vector Addition

In vectAdd():

start = tid * (N/numT);
end = (tid + 1) * (N/numT);

for (i = start; i < end; i++)
    C[i] = A[i] + B[i];
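The two snippets above can be put together into a self-contained version (the array sizes N=8 and NUMT=2, and the checker run_vect_add, are our choices for illustration):

```c
#include <pthread.h>

#define N 8
#define NUMT 2

static int A[N], B[N], C[N];

static void *vectAdd(void *arg) {
    long tid = (long)arg;
    int start = tid * (N / NUMT);        /* each thread owns one chunk */
    int end = (tid + 1) * (N / NUMT);
    for (int i = start; i < end; i++)
        C[i] = A[i] + B[i];
    return NULL;
}

/* Returns 1 iff every C[i] == A[i] + B[i] after both threads finish. */
int run_vect_add(void) {
    pthread_t th[NUMT];
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2 * i; C[i] = 0; }
    for (long i = 0; i < NUMT; i++)
        pthread_create(&th[i], NULL, vectAdd, (void *)i);
    for (int i = 0; i < NUMT; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < N; i++)
        if (C[i] != 3 * i) return 0;
    return 1;
}
```

Each thread writes a disjoint slice of C, so no synchronization is needed beyond the joins.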


Compilation:

1. Include pthread.h in the main file

2. Compile program with -lpthread

gcc -o test test.c -lpthread


About OpenMP

1. Abbreviation: Open specifications for Multi-Processing.

2. API for multi-threaded, shared-memory parallelism.

3. The API is specified for C/C++ and Fortran.

4. Three distinct components. As of version 3.1:

4.1 Compiler Directives (20)
4.2 Runtime Library Routines (32)
4.3 Environment Variables (9)


OpenMP Programming Model

#include <omp.h>
...

#pragma omp parallel
{
    // Parallel Region
}

Figure: Creation of Multi-threads


OpenMP Example

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel num_threads(4)
    {
        printf("Hello! My Thread Id is :: %d\n", omp_get_thread_num());
    }
    return 0;
}

Compile (GNU C Compiler): gcc -fopenmp hello.c -o hello.out
Run: ./hello.out


OpenMP Example

Output (thread order varies between runs):
Hello! My Thread Id is :: 0
Hello! My Thread Id is :: 3
Hello! My Thread Id is :: 1
Hello! My Thread Id is :: 2


Setting the Number of Threads

Figure: Ways to Set Threads
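The figure is not reproduced here; per the OpenMP specification there are three common ways to choose the team size. A sketch, from lowest to highest precedence:

```shell
# 1. environment variable (read when the program starts):
export OMP_NUM_THREADS=4
# 2. runtime call, inside the program:  omp_set_num_threads(4);
# 3. per-region clause (highest precedence):
#        #pragma omp parallel num_threads(4)
echo "$OMP_NUM_THREADS"
```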


Bigger Data

Figure: Single Thread Operation (Sequential)

Will take longer execution time.


Chunk your Data, Share work among threads

Figure: MultiThread Operation (Parallel)

Will take less execution time.


OpenMP For directive

#pragma omp parallel num_threads(4)
#pragma omp for
for (i = 0; i < 4; i++) {
    printf("Iteration %d, ThreadId %d\n", i, omp_get_thread_num());
}

Output:
Iteration 0, ThreadId 0
Iteration 2, ThreadId 2
Iteration 3, ThreadId 3
Iteration 1, ThreadId 1


Shared Data

Figure: Globally accessible Data

Simultaneous write operations can cause hazards.


Make your Data Private

Creates a private copy of the variable for each thread.

int a = 2;

#pragma omp parallel private(a)
{
    printf("Value of a :: %d\n", a);
}

Figure: Private Clause

Problem: the private copies are uninitialized in each thread.


Firstprivate Clause

Creates a private copy of the variable for each thread, automatically initialized from the original.

int a = 2;

#pragma omp parallel firstprivate(a)
{
    printf("Value of a :: %d\n", a);
}

Figure: Firstprivate Clause
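A checkable sketch of the same idea (the helper name firstprivate_min is ours; compiled without -fopenmp the pragmas are ignored and the result is unchanged): every thread's private copy starts as a copy of the original a, so the smallest value any thread observes must be 2.

```c
#include <limits.h>

/* firstprivate initializes every thread's private `a` from the
   original value, so the minimum observed across threads is 2. */
int firstprivate_min(void) {
    int a = 2;
    int min_seen = INT_MAX;
    #pragma omp parallel firstprivate(a)
    {
        #pragma omp critical      /* serialize updates to min_seen */
        if (a < min_seen)
            min_seen = a;
    }
    return min_seen;
}
```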


Lastprivate Clause

Private + copies the value from the last loop iteration back to the original variable.

int i = 0, a = 2;

#pragma omp parallel firstprivate(a) lastprivate(a)
#pragma omp for
for (i = 0; i < 4; i++) {
    a += omp_get_thread_num();
}

printf("Value of a :: %d\n", a);

Figure: Lastprivate Clause
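A deterministic sketch (helper name ours; it also behaves identically when compiled without -fopenmp, since then the loop simply runs serially):

```c
/* lastprivate(a) copies the value from the sequentially last
   iteration (i == n-1) back to the original variable after the loop. */
int last_value(int n) {
    int a = 0, i;
    #pragma omp parallel for lastprivate(a)
    for (i = 0; i < n; i++)
        a = i * i;
    return a;
}
```

For n = 4 the last iteration is i = 3, so the caller sees 9 regardless of which thread executed it.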


Reduction Clause

1. A private copy of each list variable is created for each thread.
2. The reduction operation is applied to all private copies of the shared variable.
3. The final result is written to the global shared variable.

int i, sum = 0;
int a[2] = {1, 1}, b[2] = {2, 2};

#pragma omp parallel reduction(+:sum) num_threads(2)
#pragma omp for
for (i = 0; i < 2; i++) {
    sum += a[i] + b[i];
}

printf("Sum :: %d\n", sum);

Figure: Reduction Clause
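The same computation wrapped as a checkable function (helper name ours; the result is the same whether the loop runs on one thread or two):

```c
/* Each thread accumulates into a private `sum`; the + reduction then
   combines the partial sums into the shared variable. Note the `+=`:
   plain assignment would discard other threads' contributions. */
int reduced_sum_demo(void) {
    int a[2] = {1, 1}, b[2] = {2, 2};
    int sum = 0, i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 2; i++)
        sum += a[i] + b[i];
    return sum;
}
```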


Master Directive

Code that is executed only by the master thread.

int a[4], i = 0;

#pragma omp parallel
{
    // Computation
    #pragma omp for
    for (i = 0; i < 4; i++) {
        a[i] = i * i;
    }

    // Print Results
    #pragma omp master
    {
        for (i = 0; i < 4; i++)
            printf("a[%d] = %d\n", i, a[i]);
    }
}

Output:
a[0] = 0
a[1] = 1
a[2] = 4
a[3] = 9


Single Directive

Code is to be executed by only one thread in the team.

int a[4], i = 0;

#pragma omp parallel
{
    // Computation
    #pragma omp for
    for (i = 0; i < 4; i++) {
        a[i] = i * i;
    }

    // Print Results
    #pragma omp single
    {
        for (i = 0; i < 4; i++)
            printf("a[%d] = %d\n", i, a[i]);
    }
}

Output:
a[0] = 0
a[1] = 1
a[2] = 4
a[3] = 9


Independent tasks

Figure: Single Thread Operation (Sequential)

Will take longer execution time.


Independent Tasks can run in parallel

Figure: MultiThread Operation (Parallel)

Will take less execution time.


OpenMP Sections Directive

#pragma omp parallel sections num_threads(2)
{
    #pragma omp section
    {
        printf("A. My ThreadId :: %d\n", omp_get_thread_num());
    }
    #pragma omp section
    {
        printf("B. My ThreadId :: %d\n", omp_get_thread_num());
    }
}

Output:
A. My ThreadId :: 0
B. My ThreadId :: 1


Synchronization Issues

What will be the output?

int max = 11;
int a[2] = {22, 33};
int tid = 0;

#pragma omp parallel
{
    tid = omp_get_thread_num();

    if (a[tid] > max)
        max = a[tid];
}

printf("Value of Max :: %d\n", max);

Figure: Race Condition

Output:
Value of Max :: 33 OR Value of Max :: 22


Use Critical Directive

int max = 11;
int a[2] = {22, 33};
int tid = 0;

#pragma omp parallel
{
    #pragma omp critical
    {
        tid = omp_get_thread_num();
        if (a[tid] > max)
            max = a[tid];
    }
}

printf("Value of Max :: %d\n", max);

Figure: Critical Region

Output: Value of Max :: 33
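The same slide rewritten as a checkable function (helper name ours; the result is 33 whether the loop runs serially or in parallel, because the critical section serializes the compare-and-update):

```c
/* The critical section makes the read-compare-write atomic, so the
   result is the true maximum regardless of thread interleaving. */
int critical_max_demo(void) {
    int max = 11;
    int a[2] = {22, 33};
    int i;
    #pragma omp parallel for
    for (i = 0; i < 2; i++) {
        #pragma omp critical
        {
            if (a[i] > max)
                max = a[i];
        }
    }
    return max;
}
```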


Synchronization Issue

What will be the output?

#pragma omp parallel
{
    #pragma omp for              // Computation 1
    for (i = 0; i < 4; i++)
        a[i] = i * i;

    #pragma omp master           // Print Intermediate Results
    for (i = 0; i < 4; i++)
        printf("Partial a[%d] = %d\n", i, a[i]);

    #pragma omp for              // Computation 2
    for (i = 0; i < 4; i++)
        a[i] += i;
}

for (i = 0; i < 4; i++)
    printf("Final a[%d] = %d\n", i, a[i]);


Synchronization Issue

Expected Output:
Partial a[0] = 0
Partial a[1] = 1
Partial a[2] = 4
Partial a[3] = 9
Final a[0] = 0
Final a[1] = 2
Final a[2] = 6
Final a[3] = 12

Actual Output:
Partial a[0] = 0
Partial a[1] = 2
Partial a[2] = 6
Partial a[3] = 12
Final a[0] = 0
Final a[1] = 2
Final a[2] = 6
Final a[3] = 12


Explicit Synchronous Execution

Use OpenMP Barrier Directive:

#pragma omp parallel
{
    #pragma omp for              // Computation 1
    for (i = 0; i < 4; i++)
        a[i] = i * i;

    #pragma omp master           // Print Intermediate Results
    for (i = 0; i < 4; i++)
        printf("Partial a[%d] = %d\n", i, a[i]);

    #pragma omp barrier

    #pragma omp for              // Computation 2
    for (i = 0; i < 4; i++)
        a[i] += i;
}

for (i = 0; i < 4; i++)
    printf("Final a[%d] = %d\n", i, a[i]);


Explicit Synchronous Execution

Expected Output:
Partial a[0] = 0
Partial a[1] = 1
Partial a[2] = 4
Partial a[3] = 9
Final a[0] = 0
Final a[1] = 2
Final a[2] = 6
Final a[3] = 12

Actual Output:
Partial a[0] = 0
Partial a[1] = 1
Partial a[2] = 4
Partial a[3] = 9
Final a[0] = 0
Final a[1] = 2
Final a[2] = 6
Final a[3] = 12


Explicit Asynchronous Execution

Is someone sitting idle?

Figure: Two Independent Parallel Regions


Explicit Asynchronous Execution

Independent parallel regions can execute asynchronously.

Figure: Use "nowait"
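The figure is not reproduced here; a sketch of the idea (helper name ours): nowait removes the implicit barrier after a worksharing loop, which is safe when the following work does not depend on it.

```c
/* nowait removes the implicit barrier after the first loop, so threads
   that finish early start on the second loop immediately. Safe here
   because the two loops touch independent arrays. */
int nowait_demo(void) {
    int x[4], y[4], i, sum = 0;
    #pragma omp parallel
    {
        #pragma omp for nowait     /* no barrier after this loop */
        for (i = 0; i < 4; i++)
            x[i] = i * i;
        #pragma omp for            /* implicit barrier kept here */
        for (i = 0; i < 4; i++)
            y[i] = 2 * i;
    }
    for (i = 0; i < 4; i++)
        sum += x[i] + y[i];
    return sum;                    /* (0+1+4+9) + (0+2+4+6) = 26 */
}
```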


Load balancing in Loops

1. Division of loop iterations among the threads in the team.

2. By default, the schedule is implementation-dependent.

3. Use OpenMP schedule clause with for construct.

schedule (static [, chunk]): Static load balancing by iterations.

#pragma omp parallel for schedule(static)
#pragma omp parallel for schedule(static, 2)

schedule (dynamic [, chunk]): Dynamic load balancing.

#pragma omp parallel for schedule(dynamic)
#pragma omp parallel for schedule(dynamic, 2)
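A point worth checking in code (helper name ours): the schedule clause changes only which thread runs which iterations, not the result, so a reduction under any schedule gives the same deterministic answer.

```c
/* schedule(dynamic, 2) hands out chunks of 2 iterations to idle
   threads; the reduction makes the sum deterministic either way. */
long scheduled_sum(long n) {
    long total = 0, i;
    #pragma omp parallel for schedule(dynamic, 2) reduction(+:total)
    for (i = 0; i < n; i++)
        total += i;
    return total;
}
```

Dynamic scheduling pays off when iterations have uneven cost; for uniform loops, static scheduling avoids the chunk-dispatch overhead.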


Thank You
