
Page 1

MPI Application Development Using the Analysis Tool MARMOT

Bettina Krammer

HLRS, High Performance Computing Center Stuttgart

Allmandring 30, D-70550 Stuttgart
http://www.hlrs.de

Page 2

Overview

• General Problems of MPI Programming
• Related Work
• Design of MARMOT
• Examples
• Performance Results with Benchmarks and Real Applications
• Outlook

Page 3

Problems of MPI Programming

• All problems of serial programming
• Additional problems:
  – Increased difficulty to verify correctness of the program
  – Increased difficulty to debug N parallel processes
  – New parallel problems (deadlock, race conditions; see the sketch below)
  – Portability between different MPI implementations (e.g. PACX-MPI)
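To make the race-condition item concrete, here is a minimal sketch (not taken from the slides): a receive with MPI_ANY_SOURCE whose result depends on message arrival order, so two runs of the same program may behave differently.

/* Minimal sketch of a potential race condition: two wildcard receives
 * whose matching order is timing-dependent. Run with at least 3 processes. */
#include <stdio.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int rank, size, value, first = -1;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    if( size >= 3 ){
        if( rank == 0 ){
            /* Which sender is matched first is timing-dependent and may
             * change from run to run. */
            MPI_Recv( &value, 1, MPI_INT, MPI_ANY_SOURCE, 17, MPI_COMM_WORLD, &status );
            first = status.MPI_SOURCE;
            MPI_Recv( &value, 1, MPI_INT, MPI_ANY_SOURCE, 17, MPI_COMM_WORLD, &status );
            printf( "first message came from rank %d\n", first );
        }
        else if( rank == 1 || rank == 2 ){
            value = rank;
            MPI_Send( &value, 1, MPI_INT, 0, 17, MPI_COMM_WORLD );
        }
    }
    MPI_Finalize();
    return 0;
}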

Page 4

Related Work I: Parallel Debuggers

• Examples: TotalView, DDT, p2d2
• Advantages:
  – Same approach and tool as in the serial case
• Disadvantages:
  – Can only fix problems after and if they occur
  – Scalability: How can you debug programs that crash after 3 hours on 512 nodes?
  – Reproducibility: How to debug a program that crashes only every fifth time?
  – Does not help to improve portability

Page 5

Related Work II: Debug version of MPI Library

• Examples:
  – catches some incorrect usage, e.g. the node count in MPI_CART_CREATE (mpich)
  – deadlock detection (NEC MPI)
• Advantages:
  – good scalability
  – better debugging in combination with TotalView
• Disadvantages:
  – Portability: only helps when using this particular MPI implementation
  – Trade-off between performance and safety
  – Reproducibility: does not help to debug irreproducible programs

Page 6

Related Work III: Special Verification Tools

• Special tools dedicated to message checking, examples:
• MPI-CHECK
  – Restricted to Fortran code
  – Automatic compile-time and run-time analysis
  – Currently not under active development
• Umpire
  – Mature tool, but not freely available
  – First version limited to shared memory platforms
  – Distributed memory version in preparation

• New Approach: MARMOT

Page 7

Design of MARMOT

Page 8

Design of MARMOT I

• Design Goals of MARMOT:
  – Portability
    • verify that your program is a correct MPI program
  – Reproducibility
    • detect possible race conditions
    • pseudo-serialize the program to make it reproducible
  – Scalability
    • automatic debugging wherever possible

Page 9

Design of MARMOT II

[Architecture diagram: the application or test program is linked against the MARMOT core tool, which intercepts all calls via the MPI profiling interface, passes them on to the MPI library, and reports to the debug server running as an additional process.]

Page 10

Design of MARMOT III

• Library written in C++ that will be linked to the application
• This library consists of the debug clients and one debug server.
• No source code modification is required except for adding an additional process working as debug server, i.e. the application will have to be run with mpirun for n+1 instead of n processes.
• Main interface = MPI profiling interface according to the MPI standard 1.2
• Implementation of the C language binding of MPI
• Implementation of the Fortran language binding as a wrapper to the C interface
• Environment variables for tool behavior and output (report of errors, warnings and/or remarks, trace-back, etc.)
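The profiling interface mentioned above is the standard PMPI mechanism: every MPI routine can be intercepted by a wrapper of the same name that performs its checks and then forwards the call to the PMPI_ entry point of the underlying library. The following is only a minimal sketch of that general technique with an invented check, not MARMOT source code; the wrapper's prototype has to match the mpi.h in use (the MPI-3 form is shown).

/* Minimal sketch of the MPI profiling (PMPI) interface technique:
 * intercept MPI_Send, run a check, then forward to the MPI library.
 * Illustration only, not MARMOT code. */
#include <stdio.h>
#include "mpi.h"

int MPI_Send( const void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm )
{
    /* Example of a local (client-side) check: a negative count is invalid. */
    if( count < 0 )
        fprintf( stderr, "ERROR: MPI_Send called with negative count %d\n", count );

    /* Hand the call over to the real implementation. */
    return PMPI_Send( buf, count, datatype, dest, tag, comm );
}

Such wrappers are compiled into a separate library and linked with the application before the MPI library, which is why no source code modification is needed.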

Page 11

Client Checks: verification on the local nodes

• Verification of MPI_Request usage
  – invalid recycling of active request
  – invalid use of unregistered request
  – warning if number of requests is zero
  – warning if all requests are MPI_REQUEST_NULL
• Verification of tag range (see the sketch below)
• Verification if requested cartesian communicator has correct size
• Verification of communicator in cartesian calls
• Verification of groups in group calls
• Verification of sizes in calls that create groups or communicators
• Verification if ranges are valid (e.g. in group constructor calls)
• Verification if ranges are distinct (e.g. MPI_Group_incl, -excl)
• Check for pending messages and active requests in MPI_Finalize
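As an illustration of the tag-range check (a sketch, not from the slides): valid tags lie between 0 and the value of the MPI_TAG_UB attribute, so the send below uses an invalid tag and would be flagged.

/* Sketch of the error targeted by the tag-range check:
 * MPI tags must lie in [0, MPI_TAG_UB], so tag -1 is invalid. */
#include <stdio.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int rank, size, flag, value = 42;
    int *tag_ub;                          /* upper bound for valid tags */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    /* The largest legal tag is implementation-dependent and exposed as
     * the MPI_TAG_UB attribute of MPI_COMM_WORLD (at least 32767). */
    MPI_Comm_get_attr( MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag );
    if( rank == 0 && flag )
        printf( "valid tags on this MPI: 0 .. %d\n", *tag_ub );

    if( size >= 2 ){
        if( rank == 0 )
            MPI_Send( &value, 1, MPI_INT, 1, -1, MPI_COMM_WORLD );  /* invalid tag */
        if( rank == 1 )
            MPI_Recv( &value, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD,
                      MPI_STATUS_IGNORE );
    }

    MPI_Finalize();
    return 0;
}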

Page 12

Server Checks: verification between the nodes, control of program

• Everything that requires a global view
• Control of the execution flow
• Signal conditions, e.g. deadlocks (a time-out sketch follows below)
• Check matching send/receive pairs for consistency
• Output log (report errors etc.)
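The comparison on the following pages describes MARMOT's deadlock detection as an algorithm based on a user-specified time-out. The code below is only a rough sketch of that general idea, with all types and names invented for illustration: the server suspects a deadlock when every client has been blocked in an MPI call for longer than the time-out.

/* Rough sketch of a time-out based deadlock heuristic as a central server
 * might apply it; all types and names are invented for illustration. */
#include <stdio.h>
#include <time.h>

typedef struct {
    int    blocked;        /* is the client inside a blocking MPI call? */
    time_t blocked_since;  /* when the pending call was registered      */
} client_state;

/* Returns 1 if all clients have been pending longer than timeout_sec. */
static int suspect_deadlock( const client_state *clients, int nclients,
                             double timeout_sec )
{
    time_t now = time( NULL );
    int i;
    for( i = 0; i < nclients; i++ ){
        if( !clients[i].blocked )
            return 0;                              /* someone is still working */
        if( difftime( now, clients[i].blocked_since ) < timeout_sec )
            return 0;                              /* not pending long enough  */
    }
    return 1;
}

int main( void )
{
    /* Two clients, both registered as blocked a long time ago (epoch). */
    client_state clients[2] = { { 1, 0 }, { 1, 0 } };

    if( suspect_deadlock( clients, 2, 10.0 ) )
        printf( "WARNING: deadlock suspected, all clients are pending\n" );
    return 0;
}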

Page 13

Comparison with related projects

Name/Features: MPI-CHECK | MARMOT | Umpire

• MPI standard: parts of 1.2/2.0 | complete 1.2 | parts of 1.2
• C support: no | yes | yes
• C++ support (MPI 2.0): no | no | ?
• Fortran77 support: yes | yes | yes
• Fortran90 support: yes | yes | yes
• Design: parsing and automatic modification of the source code | use of the profiling interface, central manager to control execution and collect information, no source code modification required | use of the profiling interface, central manager to control execution and collect information, no source code modification required
• Functionality: automatic compile-time and run-time checking | automatic run-time checking | automatic run-time checking
• Platform: basically any UNIX environment | basically any UNIX environment | shared memory platforms

Page 14

Comparison with related projects

Name/Features: MPI-CHECK | MARMOT | Umpire

• Check correct use of arguments (ranks, tags, types, communicators, groups, datatypes, user-defined resources, requests, ...): partly (ranks, types, negative message lengths) | yes, basically all arguments are checked, including proper request handling | partly (data types, requests, communicators)
• Check correct handling of requests: no | yes | yes, partly
• Check mismatched collective operations: yes | no | yes
• Deadlock detection: yes, algorithm based on a handshaking strategy, combined with a user-specified time-out | yes, algorithm based on a user-specified time-out mechanism, traceback on each node possible | yes, algorithm tries to construct dependency graphs, combined with a user-specified time-out
• Detect race conditions: no | warnings, e.g. when calls are using wildcards | no

Page 15

Examples

Page 16

Example 1: request-reuse (source code)

/*
** Here we re-use a request we didn't free before
*/

#include <stdio.h>
#include <assert.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int size = -1;
    int rank = -1;
    int value = -1;
    int value2 = -1;
    MPI_Status send_status, recv_status;
    MPI_Request send_request, recv_request;

    printf( "We call Irecv and Isend with non-freed requests.\n" );
    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    printf( " I am rank %d of %d PEs\n", rank, size );

Page 17

Example 1: request-reuse (source code continued)

    if( rank == 0 ){
        /*** this is just to get the request used ***/
        MPI_Irecv( &value, 1, MPI_INT, 1, 18, MPI_COMM_WORLD, &recv_request );
        /*** going to receive the message and reuse a non-freed request ***/
        MPI_Irecv( &value, 1, MPI_INT, 1, 17, MPI_COMM_WORLD, &recv_request );
        MPI_Wait( &recv_request, &recv_status );
        assert( value == 19 );
    }
    if( rank == 1 ){
        value2 = 19;
        /*** this is just to use the request ***/
        MPI_Isend( &value, 1, MPI_INT, 0, 18, MPI_COMM_WORLD, &send_request );
        /*** going to send the message ***/
        MPI_Isend( &value2, 1, MPI_INT, 0, 17, MPI_COMM_WORLD, &send_request );
        MPI_Wait( &send_request, &send_status );
    }
    MPI_Finalize();
    return 0;
}
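For contrast, a corrected rank-0 branch (a sketch, reusing the declarations of the program above) completes the first request before the handle is reused; the rank-1 branch would need the analogous fix for its send requests.

    /* Corrected rank-0 branch (sketch): the first request is completed
     * with MPI_Wait before the handle is reused for the second receive. */
    if( rank == 0 ){
        MPI_Irecv( &value, 1, MPI_INT, 1, 18, MPI_COMM_WORLD, &recv_request );
        MPI_Wait( &recv_request, &recv_status );   /* request handle is now inactive */
        MPI_Irecv( &value, 1, MPI_INT, 1, 17, MPI_COMM_WORLD, &recv_request );
        MPI_Wait( &recv_request, &recv_status );
        assert( value == 19 );
    }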

Page 18

Example 1: request-reuse (output log)

We call Irecv and Isend with non-freed requests.
1 rank 0 performs MPI_Init
2 rank 1 performs MPI_Init
3 rank 0 performs MPI_Comm_size
4 rank 1 performs MPI_Comm_size
5 rank 0 performs MPI_Comm_rank
6 rank 1 performs MPI_Comm_rank
 I am rank 0 of 2 PEs
7 rank 0 performs MPI_Irecv
 I am rank 1 of 2 PEs
8 rank 1 performs MPI_Isend
9 rank 0 performs MPI_Irecv
10 rank 1 performs MPI_Isend
ERROR: MPI_Irecv Request is still in use !!
11 rank 0 performs MPI_Wait
ERROR: MPI_Isend Request is still in use !!
12 rank 1 performs MPI_Wait
13 rank 0 performs MPI_Finalize
14 rank 1 performs MPI_Finalize

Page 19

Example 2: deadlock (source code)

/* This program produces a deadlock.
** At least 2 nodes are required to run the program.
**
** Rank 0 recv a message from Rank 1.
** Rank 1 recv a message from Rank 0.
**
** AFTERWARDS:
** Rank 0 sends a message to Rank 1.
** Rank 1 sends a message to Rank 0.
*/

#include <stdio.h>
#include "mpi.h"

int main( int argc, char** argv )
{
    int rank = 0;
    int size = 0;
    int dummy = 0;
    MPI_Status status;

Page 20

Example 2: deadlock (source code continued)

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    if( size < 2 ){
        fprintf( stderr, " This program needs at least 2 PEs!\n" );
    }
    else {
        if( rank == 0 ){
            MPI_Recv( &dummy, 1, MPI_INT, 1, 17, MPI_COMM_WORLD, &status );
            MPI_Send( &dummy, 1, MPI_INT, 1, 18, MPI_COMM_WORLD );
        }
        if( rank == 1 ){
            MPI_Recv( &dummy, 1, MPI_INT, 0, 18, MPI_COMM_WORLD, &status );
            MPI_Send( &dummy, 1, MPI_INT, 0, 17, MPI_COMM_WORLD );
        }
    }
    MPI_Finalize();
    return 0;
}
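A deadlock-free variant (a sketch, not from the slides) lets rank 0 send before it receives, so every blocking receive has a matching send under way; MPI_Sendrecv would achieve the same in a single call.

        /* Deadlock-free ordering (sketch): rank 0 sends first, rank 1
         * receives first, so the receives can complete. */
        if( rank == 0 ){
            MPI_Send( &dummy, 1, MPI_INT, 1, 18, MPI_COMM_WORLD );
            MPI_Recv( &dummy, 1, MPI_INT, 1, 17, MPI_COMM_WORLD, &status );
        }
        if( rank == 1 ){
            MPI_Recv( &dummy, 1, MPI_INT, 0, 18, MPI_COMM_WORLD, &status );
            MPI_Send( &dummy, 1, MPI_INT, 0, 17, MPI_COMM_WORLD );
        }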

Page 21

Example 2: deadlock (output log)

$ mpirun -np 3 deadlock1

1 rank 0 performs MPI_Init
2 rank 1 performs MPI_Init
3 rank 0 performs MPI_Comm_rank
4 rank 1 performs MPI_Comm_rank
5 rank 0 performs MPI_Comm_size
6 rank 1 performs MPI_Comm_size
7 rank 0 performs MPI_Recv
8 rank 1 performs MPI_Recv
8 Rank 0 is pending!
8 Rank 1 is pending!

WARNING: deadlock detected, all clients are pending

Page 22

Example 2: deadlock (output log continued)

Last calls (max. 10) on node 0:
timestamp = 1: MPI_Init( *argc, ***argv )
timestamp = 3: MPI_Comm_rank( comm, *rank )
timestamp = 5: MPI_Comm_size( comm, *size )
timestamp = 7: MPI_Recv( *buf, count = -1, datatype = non-predefined datatype, source = -1, tag = -1, comm, *status )

Last calls (max. 10) on node 1:
timestamp = 2: MPI_Init( *argc, ***argv )
timestamp = 4: MPI_Comm_rank( comm, *rank )
timestamp = 6: MPI_Comm_size( comm, *size )
timestamp = 8: MPI_Recv( *buf, count = -1, datatype = non-predefined datatype, source = -1, tag = -1, comm, *status )

Page 23

Current status of MARMOT

• Full MPI 1.2 implemented
• C and Fortran bindings are supported
• Used for several Fortran and C benchmarks (NAS Parallel Benchmarks) and applications (CrossGrid project and others)
• Tests on different platforms, using different compilers and MPI implementations, e.g.
  – IA32/IA64 clusters (Intel and g++ compilers) with mpich
  – IBM Regatta
  – NEC SX5
  – Hitachi SR8000

Page 24

Performance with Benchmarks

Page 25

Bandwidth on an IA64 cluster with Myrinet

[Chart: bandwidth in MB/s (0 to 250) over message size in Bytes (1 Byte up to 2 MB), comparing native MPI, MARMOT, and MARMOT serialized.]

Page 26

Latency on an IA64 cluster with Myrinet

[Chart: latency in ms (logarithmic scale, 0.01 to 100) over message size in Bytes (1 Byte up to 2 MB), comparing native MPI, MARMOT, and MARMOT serialized.]

Page 27

cg.B on an IA32 cluster with Myrinet

[Chart: total Mops/s (0 to 2500) over the number of processors (1, 2, 4, 8, 16), comparing native MPI, MARMOT, and MARMOT serialized.]

Page 28

is.B on an IA32 cluster with Myrinet

[Chart: total Mops/s (0 to 180) over the number of processors (1, 2, 4, 8, 16), comparing native MPI, MARMOT, and MARMOT serialized.]

Page 29

is.A on an IA32 cluster with Myrinet

[Chart: total Mops/s (0 to 180) over the number of processors (1, 2, 4, 8, 16), comparing native MPI, MARMOT, and MARMOT serialized.]

Page 30

Performance with Real Applications

Page 31

CrossGrid Application: WP 1.4: Air pollution modeling

• Air pollution modeling with the STEM-II model
• Transport equation solved with a Petrov-Crank-Nicolson-Galerkin method
• Chemistry and mass transfer are integrated using semi-implicit Euler and pseudo-analytical methods
• 15500 lines of Fortran code
• 12 different MPI calls:
  – MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Type_extent, MPI_Type_struct, MPI_Type_commit, MPI_Type_hvector, MPI_Bcast, MPI_Scatterv, MPI_Barrier, MPI_Gatherv, MPI_Finalize

Page 32

STEM application on an IA32 cluster with Myrinet

[Chart: run time in seconds (0 to 80) over the number of processors (1 to 16), comparing native MPI and MARMOT.]

Page 33

CrossGrid Application: WP 1.3: High Energy Physics

• Filtering of real-time data with neural networks (ANN application)
• 11500 lines of C code
• 11 different MPI calls:
  – MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Get_processor_name, MPI_Barrier, MPI_Gather, MPI_Recv, MPI_Send, MPI_Bcast, MPI_Reduce, MPI_Finalize

[Diagram of the online filtering chain: 40 MHz (40 TB/sec) input, level 1 (special hardware) → 75 kHz (75 GB/sec), level 2 (embedded processors) → 5 kHz (5 GB/sec), level 3 (PCs) → 100 Hz (100 MB/sec) for data recording & offline analysis.]

Page 34

HEP application on an IA32 cluster with Myrinet

[Chart: run time in seconds (0 to 400) over the number of processors (2 to 16), comparing native MPI and MARMOT.]

Page 35

CrossGrid Application: WP 1.1: Medical Application

• Calculation of blood flow with a Lattice-Boltzmann method
• Stripped-down application with 6500 lines of C code
• 14 different MPI calls:
  – MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Pack, MPI_Bcast, MPI_Unpack, MPI_Cart_create, MPI_Cart_shift, MPI_Send, MPI_Recv, MPI_Barrier, MPI_Reduce, MPI_Sendrecv, MPI_Finalize

Page 36

Medical application on an IA32 cluster with Myrinet

[Chart: time per iteration in seconds (0 to 0.6) over the number of processors (1 to 16), comparing native MPI and MARMOT.]

Page 37

Message statistics with native MPI

Page 38

Message statistics with MARMOT

Page 39

Medical application on an IA32 cluster with Myrinet without barrier

[Chart: time per iteration in seconds (0 to 0.6) over the number of processors (1 to 16), comparing native MPI, MARMOT, and MARMOT without barrier.]

Page 40

Barrier with native MPI

Page 41

Barrier with MARMOT

Page 42

Feedback of CrossGrid Applications

• Task 1.1 (biomedical):
  – C application
  – Identified issues:
    • Possible race conditions due to use of MPI_ANY_SOURCE
• Task 1.2 (flood):
  – Fortran application
  – Identified issues:
    • Tags outside of valid range
    • Possible race conditions due to use of MPI_ANY_SOURCE
• Task 1.3 (hep):
  – ANN (C application)
  – No issues found by MARMOT
• Task 1.4 (meteo):
  – STEM-II (Fortran)
  – MARMOT detected holes in self-defined datatypes used in MPI_Scatterv and MPI_Gatherv. Removing these holes helped to improve the performance of the communication (see the sketch after this list).
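A sketch of how such a hole can arise (an invented example, not the STEM-II code): a derived datatype that mirrors a C struct member by member inherits the compiler's padding, so the data it describes is non-contiguous and more expensive to communicate. The STEM slide above lists the MPI-1 names MPI_Type_struct and MPI_Type_extent; the sketch uses their current equivalents MPI_Type_create_struct and MPI_Type_get_extent.

/* Invented example: a struct with mixed member types typically contains
 * padding ("holes"); a derived datatype built over it describes
 * non-contiguous data, which usually forces the MPI library to pack and
 * unpack and costs communication performance. */
#include <stdio.h>
#include <stddef.h>
#include "mpi.h"

typedef struct {
    char   flag;     /* 1 byte, usually followed by padding */
    double value;    /* typically aligned to 8 bytes        */
} cell;

int main( int argc, char **argv )
{
    MPI_Datatype cell_type;
    int          lengths[2] = { 1, 1 };
    MPI_Aint     displs[2]  = { offsetof( cell, flag ), offsetof( cell, value ) };
    MPI_Datatype types[2]   = { MPI_CHAR, MPI_DOUBLE };
    MPI_Aint     lb, extent;

    MPI_Init( &argc, &argv );

    MPI_Type_create_struct( 2, lengths, displs, types, &cell_type );
    MPI_Type_commit( &cell_type );
    MPI_Type_get_extent( cell_type, &lb, &extent );

    /* The extent (typically 16 bytes here) is larger than the 9 bytes of
     * useful data; the difference is the hole. */
    printf( "extent = %ld bytes, useful data = %ld bytes\n",
            (long)extent, (long)( sizeof(char) + sizeof(double) ) );

    MPI_Type_free( &cell_type );
    MPI_Finalize();
    return 0;
}

Removing the hole, e.g. by reordering or packing the members so that the datatype describes contiguous data, is the kind of fix reported above.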

Page 43

Conclusion

• MARMOT supports MPI 1.2 for the C and Fortran bindings
• Tested successfully with several applications and platforms
• Performance sufficient for many applications

• Future work:
  – scalability and general performance improvements
  – distribute tests from the server to the clients
  – better user interface to present problems and warnings
  – extended functionality:
    • more tests to verify collective calls
    • MPI-2
    • Hybrid programming

Page 44

Thanks for your attention