MATLAB*P: Architecture

Ron Choy, Alan Edelman
Laboratory for Computer Science, MIT


Page 1: MATLAB*P: Architecture

MATLAB*P: Architecture

Ron Choy, Alan Edelman
Laboratory for Computer Science

MIT

Page 2: MATLAB*P: Architecture

Supercomputing in 2003

• The impact of the Earth Simulator

• The impact of the internet bubble

• The few big players left

• The rise of clusters

Page 3: MATLAB*P: Architecture

The Rise of Clusters

• Cheap and everyone seems to have one

Page 4: MATLAB*P: Architecture

Some MIT Clusters

• Cheap & everyone seems to have one

Page 5: MATLAB*P: Architecture

Rise of Clusters

• One to three Beowulf clusters are in the top 5 of the Top500 list!

• Increased computing power does not necessarily translate to increased productivity

• Someone has to program these parallel machines, and this can be hard

Page 6: MATLAB*P: Architecture

Classes of interesting problems

• Large collection of small problems – easy, usually involves running a large number of serial programs – embarrassingly parallel

• Large problems – might not even fit into the memory of a single machine. Much harder, since they usually involve communication between processes

Page 7: MATLAB*P: Architecture

Parallel programming (1990 – now)

• Dominant model: MPI (Message Passing Interface)

• Low level control of communications between processes

• Send, Recv, Scatter, Gather …

Page 8: MATLAB*P: Architecture

MPI

• It is not an easy way to do parallel programming.

• Error prone – it is hard to get complex low level communications right

• Hard to debug – MPI programs tend to die with obscure error messages. There is also no good debugger for MPI. (Totalview? Multiple GDBs?)

Page 9: MATLAB*P: Architecture

C with MPI

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mpi.h"

#define A(i,j) ( 1.0/((1.0*(i)+(j))*(1.0*(i)+(j)+1)/2 + (1.0*(i)+1)) )
#define TOL 1e-11   /* tolerance not shown in the transcript; value assumed */

void errorExit(void);
double normalize(double* x, int mat_size);

int main(int argc, char **argv)
{
    int num_procs;
    int rank;
    int mat_size = 64000;
    int num_components;
    double *x = NULL;
    double *y_local = NULL;
    double norm_old = 1;
    double norm = 0;
    int i, j;
    int count;

    if (MPI_SUCCESS != MPI_Init(&argc, &argv))
        exit(1);
    if (MPI_SUCCESS != MPI_Comm_size(MPI_COMM_WORLD, &num_procs))
        errorExit();
    /* not in the transcript, but rank is used below and must be initialized */
    if (MPI_SUCCESS != MPI_Comm_rank(MPI_COMM_WORLD, &rank))
        errorExit();

Page 10: MATLAB*P: Architecture

C with MPI (2)

    if (0 == mat_size % num_procs)
        num_components = mat_size/num_procs;
    else
        num_components = (mat_size/num_procs + 1);
    mat_size = num_components * num_procs;
    if (0 == rank) printf("Matrix Size = %d\n", mat_size);
    if (0 == rank) printf("Num Components = %d\n", num_components);
    if (0 == rank) printf("Num Processes = %d\n", num_procs);

    x = (double*) malloc(mat_size * sizeof(double));
    y_local = (double*) malloc(num_components * sizeof(double));
    if ( (NULL == x) || (NULL == y_local) ) {
        free(x);
        free(y_local);
        errorExit();
    }
    if (0 == rank) {
        for (i = 0; i < mat_size; i++) {
            x[i] = rand();
        }
        norm = normalize(x, mat_size);
    }

Page 11: MATLAB*P: Architecture

C with MPI (3)

    if (MPI_SUCCESS != MPI_Bcast(x, mat_size, MPI_DOUBLE, 0, MPI_COMM_WORLD))
        errorExit();

    count = 0;
    while (fabs(norm - norm_old) > TOL) {
        count++;
        norm_old = norm;
        for (i = 0; i < num_components; i++) {
            y_local[i] = 0;
        }
        for (i = 0; i < num_components && (i + num_components*rank) < mat_size; i++) {
            for (j = mat_size - 1; j >= 0; j--) {
                y_local[i] += A(i + rank*num_components, j) * x[j];
            }
        }
        if (MPI_SUCCESS != MPI_Allgather(y_local, num_components, MPI_DOUBLE,
                                         x, num_components, MPI_DOUBLE, MPI_COMM_WORLD))
            errorExit();

Page 12: MATLAB*P: Architecture

C with MPI (4)

        norm = normalize(x, mat_size);
    }
    if (0 == rank) {
        printf("result = %16.15e\n", norm);
    }
    free(x);
    free(y_local);
    MPI_Finalize();
    exit(0);
}

void errorExit(void)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("%d died\n", rank);
    MPI_Finalize();
    exit(1);
}

Page 13: MATLAB*P: Architecture

C with MPI (5)

double normalize(double* x, int mat_size)
{
    int i;
    double norm = 0;
    for (i = mat_size - 1; i >= 0; i--) {
        norm += x[i] * x[i];
    }
    norm = sqrt(norm);
    for (i = 0; i < mat_size; i++) {
        x[i] /= norm;
    }
    return norm;
}

Page 14: MATLAB*P: Architecture

The result?

• A lot of researchers in the sciences need the power of parallel computing, but do not have the expertise to tackle the programming

• Time saved by using parallel computers is offset by the additional time needed to program them

Page 15: MATLAB*P: Architecture

The result? (2)

• Also, the parallel programs written might not be efficient – writing good parallel programs is hard

• A high level tool is needed to hide the complexity away from the users of parallel computers

Page 16: MATLAB*P: Architecture

Let’s go back to the serial world for a moment

Page 17: MATLAB*P: Architecture

What do people want?

• Real applications programmers care less about “p-fold speedups”

• Ease of programming, debugging, and getting the correct answer matter much more

Page 18: MATLAB*P: Architecture

MATLAB

• MATrix LABoratory

• Started out as an interactive frontend to LINPACK and EISPACK

• Has since grown into a powerful computing environment with ‘Toolboxes’ ranging from neural networks to computational finance to image processing.

• Many current users (Toolbox users) know little or nothing about linear algebra

Page 19: MATLAB*P: Architecture

MATLAB ‘language’

• MATLAB has a C-like scripting language, with HPF-like array indexing

• Interpreted language

• A wide range of linear algebra routines

• Very popular in the scientific community as a prototyping tool – easier to use than C/FORTRAN

Page 20: MATLAB*P: Architecture

MATLAB ‘language’ (2)

• MATLAB makes use of ATLAS (optimized BLAS) for basic linear algebra routines

• Still, performance is not as good as C/FORTRAN, especially for loops

• However, MATLAB is catching up with its Release 13 (just-in-time compiling?)

Page 21: MATLAB*P: Architecture

Why do people like MATLAB?

• MATLAB made good choices on numerical libraries that are efficient, and combined them into one interactive, easy to use package.

• Convenience, not performance!

Page 22: MATLAB*P: Architecture

Wouldn’t it be nice to mix the convenience of MATLAB with parallel computing?

Page 23: MATLAB*P: Architecture

Parallel MATLAB Survey

• http://theory.lcs.mit.edu/~cly/survey.html

• 27 parallel MATLAB projects found, using 4 approaches:
– Embarrassingly Parallel
– Message Passing
– Backend Support
– Compiler

Page 24: MATLAB*P: Architecture

Parallel MATLABs

CMTM, DP-Toolbox, MatlabMPI, MATmarks, MPITB/PVMTB, MultiMATLAB, Parallel Toolbox for MATLAB

DistributePP, MATLAB Parallelization Toolkit, MULTI Toolbox, Paralize, Parmatlab, Plab, PMI

Dlab, Paramat, PLAPACK, Matpar, MATLAB*P, Netsolve

CONLAB Compiler, FALCON, MATCH, Menhir, Otter, ParAL, RTExpress

Page 25: MATLAB*P: Architecture

Parallel MATLABs

CMTM (Cornell), DP-Toolbox (Rostock, Germany), MatlabMPI (Lincoln Labs), MATmarks (UIUC), MPITB/PVMTB (Granada, Spain), MultiMATLAB (Cornell), Parallel Toolbox for MATLAB (Wake Forest)

DistributePP (WUStL), MATLAB Parallelization Toolkit (Linkopings, Sweden), MULTI Toolbox (Purdue), Paralize (Chalmers, Sweden), Parmatlab (Northeastern), Plab (TU Denmark), PMI (Lucent)

Dlab (UIUC), Paramat (Alpha Data UK), PLAPACK (UT Austin), Matpar (JPL), MATLAB*P (MIT), Netsolve (UTK)

CONLAB Compiler (Umea, Sweden), FALCON (UIUC), MATCH (Accelchip), Menhir (IRISA, France), Otter (OSU), ParAL (Sydney, Australia), RTExpress (ISI)

Page 26: MATLAB*P: Architecture

CMTMDP-ToolboxMatlabMPIMATmarksMPITB/PVMTBMultiMATLABParallel Toolbox for MATLAB

DistributePPMATLAB Parallelization ToolkitMULTI ToolboxParalizeParmatlabPlab

PMI

DlabParamatPLAPACKMatparMATLAB*PNetsolve

CONLAB CompilerFALCONMATCHMenhirOtterParALRTExpress

Parallel MATLABs

Page 27: MATLAB*P: Architecture

CMTMDP-ToolboxMatlabMPIMATmarksMPITB/PVMTBMultiMATLABParallel Toolbox for MATLAB

DistributePPMATLAB Parallelization ToolkitMULTI ToolboxParalizeParmatlabPlab

PMI

DlabParamatPLAPACKMatparMATLAB*PNetsolve

CONLAB CompilerFALCONMATCHMenhirOtterParALRTExpress

Parallel MATLABs

Page 28: MATLAB*P: Architecture

Embarrassingly Parallel

• Multiple MATLABs needed

• Scatter/Gather at the beginning and end

• No lateral communication

Page 29: MATLAB*P: Architecture

Message Passing

• Multiple MATLABs required

• MATLABs talk to each other in a general MPI fashion

• Superset of embarrassingly parallel

Page 30: MATLAB*P: Architecture

Backend support

• Requires 1 MATLAB

• MATLAB is used as front end to a parallel computation engine

Page 31: MATLAB*P: Architecture

Compiler

Page 32: MATLAB*P: Architecture

Compiler (2)

• Requires zero to one MATLAB

• Compiles MATLAB scripts into native code and links them with parallel numerical libraries

Page 33: MATLAB*P: Architecture

MATLAB*P

• Project started by P. Husbands, who is now at NERSC

• Goal: To provide a user-friendly tool for parallel computing

• Uses MATLAB as a front end to a parallel linear algebra server

• Encompasses the backend support, message passing and embarrassingly parallel approaches

Page 34: MATLAB*P: Architecture

MATLAB*P (2)

• Server leverages popular parallel numerical libraries:
– ScaLAPACK
– FFTW
– PARPACK
– …

Page 35: MATLAB*P: Architecture

Matlab*P

A = rand(4000*p, 4000*p);            % distributed random matrix
x = randn(4000*p, 1);                % distributed starting vector
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11    % iterate until x stops changing
    y = x;
    x = A*x;                         % matrix-vector product on the backend
    x = x / norm(x);                 % renormalize
end;

Page 36: MATLAB*P: Architecture

Structure of the System

[Architecture diagram, showing: MATLAB, Client, Proxy, Client Manager, Server Manager, Package Manager, Matrix Manager (Matrix 1, Matrix 2, Matrix 3), packages PBlas, FFTW, Scalapack, PBase, and Server #0 through Server #4.]

Page 37: MATLAB*P: Architecture

Structure of the system (2)

• Client (C++)
– Attaches to MATLAB through the MEX interface
– Relays commands to the server and processes the returns

• Server (C++/MPI)
– Manipulates/processes data (matrices)
– Performs the actual parallel computation by calling the appropriate library routines

Page 38: MATLAB*P: Architecture

Functionalities provided

• PBLAS, ScaLAPACK
– solve, eig, svd, chol, matmul, +, -, …

• FFTW
– 1D and 2D FFT

• PARPACK
– eigs, svds

• Support provided for s, d, c, z (single, double, complex, double complex)
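
A usage sketch (my assumptions: these routines are reached through the usual MATLAB syntax, with ‘solve’ mapped to the backslash operator, as the eig example later in the deck suggests):

    A = randn(2048*p, 2048*p);   % distributed dense matrix
    b = randn(2048*p, 1);        % distributed vector
    x = A \ b;                   % solve, via ScaLAPACK
    E = eig(A);                  % eigenvalues, via ScaLAPACK
    f = fft(b);                  % 1D FFT, via FFTW
    s = svds(A, 5);              % a few singular values, via PARPACK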

Page 39: MATLAB*P: Architecture

Focus

• Requires minimal learning on user’s part

• Reuse of existing scripts

• Mimics MATLAB behaviour

• Data stay on the backend until explicitly retrieved by the user

• Extendable backend

Page 40: MATLAB*P: Architecture

Comparison

MATLAB                                      MATLAB*P
Frontend to LINPACK, EISPACK                Frontend to PBLAS, ScaLAPACK, …
Focus is on usability, not performance      Focus is on usability, not performance
Extendable through ‘Toolboxes’              Extendable through ‘Packages’ and ‘Toolboxes’

Page 41: MATLAB*P: Architecture

What is *p?

• MATLAB*P creates a new MATLAB class called dlayout

• By providing overloaded functions for this new class, parallelism is achieved transparently

• *p is dlayout(1)
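
A minimal sketch of the idea (simplified, not the actual MATLAB*P source): multiplying a dimension by *p produces a dlayout object, so MATLAB dispatches randn, eig, etc. to the overloaded @dlayout methods instead of the built-in ones.

    p = dlayout(1);   % this is what *p is
    n = 1024 * p;     % @dlayout/mtimes returns another dlayout, carrying the size 1024
    A = randn(n, n);  % dispatches to @dlayout/randn, which asks the parallel server to
                      % create the matrix and returns a handle rather than a local array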

Page 42: MATLAB*P: Architecture

Example

• A = randn(1024*p, 1024*p);   % distributed random matrix
• E = eig(A);                  % eigenvalues computed in parallel on the backend
• e = pp2matlab(E);            % bring the result back into ordinary MATLAB
• plot(e, '*');                % plot locally

Page 43: MATLAB*P: Architecture
Page 44: MATLAB*P: Architecture

Example 2

• H = hilb(n) % e.g. H = hilb(8000*p)
– J = 1:n;
– J = J(ones(n,1),:);
– I = J';
– E = ones(n,n);
– H = E./(I+J-1);

Page 45: MATLAB*P: Architecture

Built-in MATLAB functions

• Hilb is a MATLAB function that gets parallelized automatically

• Another example of a MATLAB function that works is cgs

• We do not know how many of them work

Page 46: MATLAB*P: Architecture

Where is the data?

• One often-asked question in parallel computing is: where are the data residing?

• MATLAB*P supports 3 data distributions:
– A = randn(m*p, n) – row distribution
– B = randn(m, n*p) – column distribution
– C = randn(m*p, n*p) – 2D block cyclic distribution
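
A small usage sketch (my assumption: the same operations and retrieval work for all three distributions, as in the earlier eig example):

    A = randn(1000*p, 1000);     % row distribution
    B = randn(1000, 1000*p);     % column distribution
    C = randn(1000*p, 1000*p);   % 2D block cyclic distribution
    D = C * C;                   % matmul runs on the backend (PBLAS)
    d = pp2matlab(D);            % data stay on the backend until explicitly retrieved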

Page 47: MATLAB*P: Architecture

Flow of a MATLAB*P call

• User types A = randn(256*p) in MATLAB

• The overloaded randn in dlayout class is called

• The overloaded method calls the MATLAB*P client with the arguments ‘ppbase_addDense’, 256, 3 (3 = 2D block cyclic distribution)

• Client relays the command to server

Page 48: MATLAB*P: Architecture

Flow of a MATLAB*P call (2)

• The server receives the command and passes it on to the PackageManager

• The PackageManager finds the appropriate function in the PPBase package and makes the call

• A 2D block cyclic distributed randn matrix is created

• The server passes the matrix handle to the client, which passes it to MATLAB. The handle is stored in A.
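
A sketch of what the overloaded randn might look like (hypothetical and simplified: ‘ppclient’ stands in for the MEX entry point into the client, and ‘ddense’ for the class that wraps the returned handle – neither name is given in the slides):

    function A = randn(varargin)              % @dlayout/randn
      n = double(varargin{1});                % e.g. 256, recovered from randn(256*p)
      h = ppclient('ppbase_addDense', n, 3);  % relay to the server: dense n-by-n, 2D block cyclic
      A = ddense(h);                          % hypothetical wrapper around the server-side handle
    end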

Page 49: MATLAB*P: Architecture

MultiMATLAB mode

• Similar to MultiMATLAB from Cornell

• Starts up a MATLAB process on each node, and runs MATLAB scripts on them in parallel

• An interesting feature is that the data can come from the ‘regular’ mode of MATLAB*P

Page 50: MATLAB*P: Architecture

Example 3

% FFT2 in 4 lines

• A = randn(10000, 10000*p);   % column-distributed matrix
• B = mm('fft', A);            % each MATLAB process FFTs its local columns
• C = B.';                     % distributed transpose in ‘regular’ mode
• D = mm('fft', C);            % FFT again, now along the other dimension
• F = D.';                     % transpose back

Page 51: MATLAB*P: Architecture

Types of parallel problems

• Collection of small problems
– MultiMATLAB mode

• Large problems
– ‘Regular mode’ – ScaLAPACK, FFTW, …

• We even allow a mixture of the two modes to give additional flexibility – FFT2

Page 52: MATLAB*P: Architecture

Visualization package

• A term project done by a group of students in a parallel computing class at MIT

• Provides the equivalent of mesh, surf and spy for distributed matrices

• Visualization of very large matrices is now possible in MATLAB
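
A quick usage sketch (assuming the package keeps the familiar MATLAB function names, as the bullet above suggests), reusing the earlier hilb example:

    H = hilb(8000*p);   % distributed Hilbert matrix from Example 2
    mesh(H);            % mesh plot produced from the distributed data
    surf(H);            % likewise for surf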

Page 53: MATLAB*P: Architecture
Page 54: MATLAB*P: Architecture

Why parallel MATLAB?

• Why not Maple/Mathematica?
– MATLAB*P is not really tied to MATLAB – the server does not understand a single line of MATLAB
– Interfaces could be written for other software ‘easily’
– We chose MATLAB because it is widely used and easy to use – we want to inherit this property
– Parallelizing Maple/Mathematica is a much different problem than parallelizing MATLAB, and it would probably have a smaller audience

Page 55: MATLAB*P: Architecture

Why is this interesting?

• Overhead is incurred!
– The reduction in programming time compensates for the overhead
– Makes large numerical experiments easy

Page 56: MATLAB*P: Architecture

More

• Processor maps – ScaLAPACK magic?
• Dynamic addition of processors – Singapore
• MATLAB*P on a grid – UCSB, Parry
• Octave in mm mode (mo mode) – UCSB, Parry
• Octave as frontend – Harvard
• Pipelining – ??
• Fault Tolerance – UCSB