
Page 1:

Parallel Programming on the SGI Origin2000

With thanks to Moshe Goldberg, TCC and Igor Zacharov SGI

Taub Computer Center, Technion

Mar 2005

Anne Weill-Zrahia

Page 2:

Parallel Programming on the SGI Origin2000

1) Parallelization Concepts

2) SGI Computer Design

3) Efficient Scalar Design

4) Parallel Programming - OpenMP

5) Parallel Programming - MPI

Page 3:

4) Parallel Programming - OpenMP

Page 4:

[Diagram: a joint bank account as a race-condition analogy. Initial amount: IL 500. Limor in Haifa and Shimon in Tel Aviv each read IL 500. Limor takes IL 150 (writes IL 350); Shimon takes IL 400 (writes IL 100). The final amount depends on whose write lands last. "Is this your joint bank account?"]

Page 5:

Introduction

- Parallelization instruction to the compiler: f77 -o prog -mp prog.f or: f77 -o prog -pfa prog.f

- Now let us try to understand what a compiler has to determine when deciding how to parallelize

- Note that when we loosely talk about parallelization, what is meant is: "Is the program, as presented here, parallelizable?"

- This is an important distinction, because sometimes rewriting can transform non-parallelizable code into a parallelizable form, as we will see...

Page 6:

Data dependency types

1) Iteration i depends on values calculated in the previous iteration i-1 (loop-carried dependence):

      do i = 2,n
         a(i) = a(i-1)          ! cannot be parallelized
      enddo

2) Data dependence within a single iteration (non-loop-carried dependence):

      do i = 2,n
         c    = . . .
         a(i) = . . . c . . .   ! parallelizable
      enddo

3) Reduction:

      do i = 1,n
         s = s + x              ! parallelizable
      enddo

All data dependencies in programs are variations on these fundamental types.

Page 7:

Data dependency analysis

Question: Are the following loops parallelizable?

      do i = 2,n
         a(i) = b(i-1)          ! YES!
      enddo

      do i = 2,n
         a(i) = a(i-1)          ! NO!
      enddo

Why?

Page 8:

Data dependency analysis

      do i = 2,n
         a(i) = b(i-1)
      enddo

YES!

             CPU1        CPU2        CPU3
cycle 1      A(2)=B(1)   A(3)=B(2)   A(4)=B(3)
cycle 2      A(5)=B(4)   A(6)=B(5)   A(7)=B(6)

Page 9:

Data dependency analysis

      do i = 2,n
         a(i) = a(i-1)
      enddo

Scalar (non-parallel) run:

             CPU1
cycle 1      A(2)=A(1)
cycle 2      A(3)=A(2)
cycle 3      A(4)=A(3)
cycle 4      A(5)=A(4)

In each cycle NEW data from the previous cycle is read

Page 10:

Data dependency analysis

      do i = 2,n
         a(i) = a(i-1)
      enddo

NO!

             CPU1        CPU2        CPU3
cycle 1      A(2)=A(1)   A(3)=A(2)   A(4)=A(3)

Will probably read OLD data

Page 11:

Data dependency analysis

      do i = 2,n
         a(i) = a(i-1)
      enddo

NO!

             CPU1        CPU2        CPU3
cycle 1      A(2)=A(1)   A(3)=A(2)   A(4)=A(3)
cycle 2      A(5)=A(4)   A(6)=A(5)   A(7)=A(6)

May read NEW data

Will probably read OLD data

Page 12:

Data dependency analysis

Another question: Are the following loops parallelizable?

      do i = 3,n,2
         a(i) = a(i-1)          ! YES!
      enddo

      do i = 1,n
         s = s + a(i)           ! Depends!
      enddo

Page 13:

Data dependency analysis

      do i = 3,n,2
         a(i) = a(i-1)
      enddo

YES!

             CPU1         CPU2          CPU3
cycle 1      A(3)=A(2)    A(5)=A(4)     A(7)=A(6)
cycle 2      A(9)=A(8)    A(11)=A(10)   A(13)=A(12)

Page 14:

Data dependency analysis

      do i = 1,n
         s = s + a(i)
      enddo

Depends!

             CPU1        CPU2        CPU3
cycle 1      S=S+A(1)    S=S+A(2)    S=S+A(3)
cycle 2      S=S+A(4)    S=S+A(5)    S=S+A(6)

- The value of S will be undetermined and typically it will vary from one run to the next
- This bug in parallel programming is called a "race condition"
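
For reference, a minimal sketch of how such a sum is usually made safe with OpenMP (the reduction clause is introduced later in this course): each thread accumulates a private partial sum, and the partial sums are combined into s at the end of the loop.

      s = 0.0
c$omp parallel do reduction(+:s)
      do i = 1,n
         s = s + a(i)
      enddo
c$omp end parallel do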

Page 15:

Data dependency analysis

What is the principle involved here?

The examples shown fall into two categories:

1) Data being read is independent of data that is written:

      a(i) = b(i-1)      i = 2,3,4, . . .
      a(i) = a(i-1)      i = 3,5,7, . . .

2) Data being read depends on data that is written:

      a(i) = a(i-1)      i = 2,3,4, . . .
      s = s + a(i)       i = 1,2,3, . . .

Page 16:

Data dependency analysis

Here is a typical situation:

Is there a data dependency in the following loop?

      do i = 1,n
         a(i)   = sin(x(i))
         result = a(i) + b(i)
         c(i)   = result * c(i)
      enddo

No! Clearly, "result" is a temporary variable that is reassigned in every iteration.

Note: "result" must be a "private" variable (this will be discussed later).
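
A minimal sketch of what this looks like once OpenMP is introduced (using the directive syntax shown later in this course), with result declared private so that each thread works on its own copy:

c$omp parallel do private(i,result)
      do i = 1,n
         a(i)   = sin(x(i))
         result = a(i) + b(i)
         c(i)   = result * c(i)
      enddo
c$omp end parallel do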

Page 17:

Data dependency analysis

Here is a (slightly different) typical situation:

Is there a data dependency in the following loop?

      do i = 1,n
         a(i)   = sin(result)
         result = a(i) + b(i)
         c(i)   = result * c(i)
      enddo

Yes!

The value of "result" is carried over from one iteration to the next.

This is the classical read/write situation but now it is somewhat hidden.

Page 18:

Data dependency analysis

The loop could (symbolically) be rewritten:

      do i = 1,n
         a(i)      = sin(result(i-1))
         result(i) = a(i) + b(i)
         c(i)      = result(i) * c(i)
      enddo

Now substitute the expression for a(i):

      do i = 1,n
         a(i)      = sin(result(i-1))
         result(i) = sin(result(i-1)) + b(i)
         c(i)      = result(i) * c(i)
      enddo

This is really of the type "a(i) = a(i-1)" !

Page 19:

Data dependency analysis

One more: Can the following loop be parallelized?

      do i = 3,n
         a(i) = a(i-2)
      enddo

If this is parallelized, there will probably be different answers from one run to another.

Why?

Page 20:

Data dependency analysis

      do i = 3,n
         a(i) = a(i-2)
      enddo

             CPU1        CPU2
cycle 1      A(3)=A(1)   A(4)=A(2)
cycle 2      A(5)=A(3)   A(6)=A(4)

This looks like it will be safe.

Page 21:

Data dependency analysis

      do i = 3,n
         a(i) = a(i-2)
      enddo

HOWEVER: what if there are 3 CPUs and not 2?

             CPU1        CPU2        CPU3
cycle 1      A(3)=A(1)   A(4)=A(2)   A(5)=A(3)

In this case, a(3) is read and written in two threads at once

Page 22:

RISC memory levels

Single CPU

[Diagram: a single CPU with its cache and main memory]

Page 23:

RISC memory levels

Single CPU

[Diagram: a single CPU with its cache and main memory]

Page 24:

RISC memory levels

Multiple CPUs

[Diagram: CPU 0 and CPU 1, each with its own cache (Cache 0, Cache 1), sharing main memory]

Page 25:

RISC memory levels

Multiple CPUs

[Diagram: CPU 0 and CPU 1, each with its own cache (Cache 0, Cache 1), sharing main memory]

Page 26:

RISC memory levels

Multiple CPUs

[Diagram: CPU 0 and CPU 1, each with its own cache (Cache 0, Cache 1), sharing main memory]

Page 27:

Definition of OpenMP

- Application Program Interface (API) for Shared Memory Parallel Programming

- Directive-based approach with library support

- Targets existing applications and widely used languages:
  * Fortran API first released October 1997
  * C, C++ API first released October 1998

- Multi-vendor/platform support

Page 28:

Why was OpenMP developed?

- Parallel programming before OpenMP
  * Standards for distributed memory (MPI and PVM)
  * No standard for shared memory programming
- Vendors had different directive-based APIs for SMP
  * SGI, Cray, Kuck & Assoc, DEC
  * Vendor proprietary, similar but not the same
  * Most were targeted at loop-level parallelism
- Commercial users and high-end software vendors have a big investment in existing codes
- End result: users wanting portability were forced to use MPI even for shared memory
  * This sacrifices built-in SMP hardware benefits
  * Requires major effort

Page 29:

The Spread of OpenMP

Organization: Architecture Review Board    Web site: www.openmp.org

Hardware: HP/DEC, IBM, Intel, SGI, Sun

Software: Portland (PGI), NAG, Intel, Kuck & Assoc (KAI), Absoft

Page 30:

OpenMP Interface model

Directives and pragmas:
  * Control structures
  * Work sharing
  * Data scope attributes: private, firstprivate, lastprivate, shared, reduction

Runtime library routines:
  * Control and query: number of threads, nested parallel?, throughput mode
  * Lock API

Environment variables:
  * Runtime environment: schedule type, max number of threads, nested parallelism, throughput mode

Page 31:

OpenMP execution model

An OpenMP program starts as a single thread, in sequential mode.

To create additional threads, the user opens a parallel region:
  * additional slave threads are launched
  * the master thread is part of the team
  * threads "disappear" at the end of the parallel region

This model is repeated as needed.

[Diagram: fork-join execution - the master thread forks parallel regions of 4, 2, and 3 threads in turn, returning to a single thread between regions]

Page 32:

Creating parallel threads

Fortran:

c$omp parallel [clause,clause]
      code to run in parallel
c$omp end parallel

C/C++:

#pragma omp parallel [clause,clause]
{
      code to run in parallel
}

Replicate execution:

      i = 0
c$omp parallel
      call foo(i,a,b)
c$omp end parallel
      print *, i

[Diagram: the parallel region runs foo on every thread (foo foo foo foo) between the serial statements i = 0 and print *, i]

Number of threads: set by a library call or an environment variable
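
As a point of reference, a minimal self-contained sketch of a parallel region (assuming compilation with f77 -mp); each thread in the team prints its own thread number:

      program hello
      integer iam, nt
      integer omp_get_thread_num, omp_get_num_threads
c$omp parallel private(iam,nt)
      iam = omp_get_thread_num()
      nt  = omp_get_num_threads()
      print *, 'thread ', iam, ' of ', nt
c$omp end parallel
      end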

Page 33:
Page 34:
Page 35:
Page 36:

OpenMP on the Origin 2000

Switches, formats:

      f77 -mp

c$omp parallel do
c$omp+ shared(a,b,c)
   OR
c$omp parallel do shared(a,b,c)

Conditional compilation:

c$    iam = omp_get_thread_num() + 1
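
A minimal sketch of how the c$ sentinel behaves (the variable iam is illustrative): compiled without -mp the c$ line is an ordinary comment, and compiled with -mp it becomes an executable statement.

      integer iam
      integer omp_get_thread_num
      iam = 0
c$    iam = omp_get_thread_num() + 1
      print *, 'iam =', iam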

Page 37:

OpenMP on the Origin 2000 - C

Switches, formats:

      cc -mp

#pragma omp parallel for \
        shared(a,b,c)
   OR
#pragma omp parallel for shared(a,b,c)

Page 38:

OpenMP on the Origin 2000

Parallel Do Directive

c$omp parallel do private(i)
      do i = 1,n
         a(i) = i + 1
      enddo
c$omp end parallel do          --> optional

Topics: Clauses, Detailed construct

Page 39:

OpenMP on the Origin 2000

Parallel Do Directive - Clauses

shared
private
default(private|shared|none)
firstprivate
lastprivate
reduction({operator|intrinsic}:var)
schedule(type[,chunk])
if(scalar_logical_expression)
ordered
copyin(var)

Page 40:

Allocating private and shared variables

[Diagram: a shared variable S exists in the single-thread parts and is seen by all threads in the parallel region; each thread gets its own private variable P inside the parallel region. S = shared variable, P = private variable]

Page 41:

Clauses in OpenMP - 1

Clauses for the "parallel" directive specify data association rules and conditional computation.

shared (list) - data accessible by all threads, which all refer to the same storage

private (list) - data private to each thread - a new storage location is created with that name for each thread, and the contents of the storage are not available outside the parallel region

default (private | shared | none) - default association for variables not otherwise mentioned

firstprivate (list) - same as private(list), but the contents are given an initial value from the variable with the same name outside the parallel region

lastprivate (list) - available only for work-sharing constructs - a shared variable with that name is set to the last computed value of a thread-private variable in the work-sharing construct
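
A minimal sketch of firstprivate and lastprivate together (variable names are illustrative): every thread's copy of t starts with the value 2.0 assigned outside the region, and after the loop j holds the value from the sequentially last iteration (n).

      t = 2.0
c$omp parallel do firstprivate(t) lastprivate(j)
      do i = 1,n
         j    = i
         b(i) = t * a(i)
      enddo
c$omp end parallel do
c     here j = n; the private copies of t are discarded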

Page 42:

Clauses in OpenMP - 2

reduction ({op|intrinsic}:list)
 - variables in the list are named scalars of intrinsic type
 - a private copy of each variable will be made in each thread and initialized according to the intended operation
 - at the end of the parallel region or other synchronization point all private copies will be combined
 - the operation must be of one of the forms:
      x = x op expr
      x = intrinsic(x,expr)
      if (x.LT.expr) x = expr
      x++; x--; ++x; --x;
   where expr does not contain x

C/C++:
  Op       Init
  + or -   0
  *        1
  &        ~0
  |        0
  ^        0
  &&       1
  ||       0

Fortran:
  Op/intrinsic   Init
  + or -         0
  *              1
  .AND.          .TRUE.
  .OR.           .FALSE.
  .EQV.          .TRUE.
  .NEQV.         .FALSE.
  MAX            smallest number
  MIN            largest number
  IAND           all bits on
  IOR or IEOR    0

- example:  c$omp parallel do reduction(+:a,y) reduction(.OR.:s)

Page 43:

Clauses in OpenMP - 3

copyin (list) - the list must contain common block (or global) names that have been declared threadprivate - data in the master thread in that common block will be copied to the thread-private storage at the beginning of the parallel region - there is no "copyout" clause: data in a private common block is not available outside of that thread

if (scalar_logical_expression) - when an "if" clause is present, the enclosed code block is executed in parallel only if the scalar_logical_expression is .TRUE.

ordered - only for do/for work-sharing constructs - the code in the ORDERED block will be executed in the same sequence as sequential execution

schedule (kind[,chunk]) - only for do/for work-sharing constructs - specifies the scheduling discipline for loop iterations

nowait - the end of a work-sharing construct and the SINGLE directive imply a synchronization point unless "nowait" is specified
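
A minimal sketch of copyin with a threadprivate common block (the block name /work/ and its contents are illustrative): each thread's private copy of /work/ starts the parallel region with the master thread's values.

      program cptest
      real tmp(4)
      common /work/ tmp
c$omp threadprivate(/work/)
      tmp(1) = 1.0
c$omp parallel copyin(/work/)
      print *, 'tmp(1) =', tmp(1)
c$omp end parallel
      end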

Page 44:

OpenMP on the Origin 2000

Parallel Sections Directive

c$omp parallel sections private(i)
c$omp section
         block1
c$omp section
         block2
c$omp end parallel sections

Topics: Clauses, Detailed construct

Page 45:

OpenMP on the Origin 2000

Parallel Sections Directive - Clauses

shared
private
default(private|shared|none)
firstprivate
lastprivate
reduction({operator|intrinsic}:var)
if(scalar_logical_expression)
copyin(var)

Page 46:

OpenMP on the Origin 2000

Defining a Parallel Region - Individual Do Loops

c$omp parallel shared(a,b)

c$omp do private(j)
      do j = 1,n
         a(j) = j
      enddo
c$omp end do nowait

c$omp do private(k)
      do k = 1,n
         b(k) = k
      enddo
c$omp end do

c$omp end parallel

Page 47:

OpenMP on the Origin 2000

Defining a Parallel Region - Explicit Sections

c$omp parallel shared(a,b)
c$omp section
         block1
c$omp single
         block2
c$omp section
         block3
c$omp end parallel

Page 48:

OpenMP on the Origin 2000

Synchronization Constructs

master / end master
critical / end critical
barrier
atomic
flush
ordered / end ordered

Page 49:

OpenMP on the Origin 2000

Run-Time Library Routines

Execution environment

omp_set_num_threads
omp_get_num_threads
omp_get_max_threads
omp_get_thread_num
omp_get_num_procs
omp_in_parallel
omp_set_dynamic / omp_get_dynamic
omp_set_nested / omp_get_nested
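
A minimal sketch of the execution-environment routines (assuming compilation with -mp): request a team size, then query it from inside a parallel region.

      program qtest
      integer omp_get_num_procs, omp_get_max_threads
      integer omp_get_num_threads, omp_get_thread_num
      call omp_set_num_threads(4)
      print *, 'processors  =', omp_get_num_procs()
      print *, 'max threads =', omp_get_max_threads()
c$omp parallel
      if (omp_get_thread_num() .eq. 0)
     &   print *, 'team size   =', omp_get_num_threads()
c$omp end parallel
      end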

Page 50:

OpenMP on the Origin 2000

Run-Time Library Routines

Lock routines

omp_init_lock
omp_destroy_lock
omp_set_lock
omp_unset_lock
omp_test_lock
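
A minimal sketch of the lock routines protecting a shared counter (the counter is illustrative; the lock variable must be an integer large enough to hold an address, integer(kind=omp_lock_kind) where an omp_lib module is available):

      program locktest
      integer*8 lck
      integer count
      count = 0
      call omp_init_lock(lck)
c$omp parallel shared(count,lck)
c     only one thread at a time may update the counter
      call omp_set_lock(lck)
      count = count + 1
      call omp_unset_lock(lck)
c$omp end parallel
      call omp_destroy_lock(lck)
      print *, 'count =', count
      end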

Page 51:

OpenMP on the Origin 2000

Environment Variables

OMP_NUM_THREADS (or MP_SET_NUMTHREADS)
OMP_DYNAMIC
OMP_NESTED
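
For example, a typical way to request four threads before running a program is setenv OMP_NUM_THREADS 4 (C shell) or export OMP_NUM_THREADS=4 (Bourne shell); OMP_DYNAMIC and OMP_NESTED take the values TRUE or FALSE.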

Page 52:

Exercise 5 – OpenMP to parallelize a loop

Page 53:
Page 54:

[Figure: exercise code listing - initial values and main loop]

Page 55:
Page 56:
Page 57:

Enhancing Performance

• Ensuring sufficient work: running a loop in parallel adds runtime costs

• Scheduling loops for load balancing

Page 58:

The SCHEDULE clause

SCHEDULE (TYPE[,CHUNK])

Static - iterations are divided into chunks (of the given chunk size, or equally sized if no chunk is specified) and assigned to the threads in a fixed manner before the loop runs

Dynamic - at runtime, chunks are assigned to threads dynamically, as each thread finishes its previous chunk
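
A minimal sketch of dynamic scheduling (the routine work(i) stands in for an iteration whose cost varies): chunks of 10 iterations are handed out to threads as they become free, which helps balance uneven work.

c$omp parallel do schedule(dynamic,10) private(i)
      do i = 1,n
         a(i) = work(i)
      enddo
c$omp end parallel do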

Page 59:

OpenMP summary

- A small number of compiler directives to set up parallel execution of code, plus a runtime library for locking functions
- Portable directives (supported by different vendors in the same way)
- Parallelization is for the SMP programming model - the machine should have a global address space
- The number of execution threads is controlled outside the program
- A correct OpenMP program should not depend on the exact number of execution threads nor on the scheduling mechanism for work distribution
- In addition, a correct OpenMP program should be (weakly) serially equivalent - that is, the results of the computation should be within rounding accuracy of the sequential program
- On SGI, OpenMP programming can be mixed with the MPI library, so that it is possible to have "hierarchical parallelism":
  * OpenMP parallelism within a single node (global address space)
  * MPI parallelism between nodes in a cluster (network connection)