CS 179: GPU Programming


Lab 7 Recitation: The MPI/CUDA Wave Equation Solver

MPI/CUDA – Wave Equation Big idea: Divide our data array between n processes!

MPI/CUDA – Wave Equation Problem if we’re at the boundary of a process!

$y_{x,t+1} = 2\,y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^{2}\left(y_{x+1,t} - 2\,y_{x,t} + y_{x-1,t}\right)$

[Stencil diagram: the value at time t+1 depends on values at times t and t-1 at positions x-1, x, x+1.]

Where do we get the neighboring value, e.g. $y_{x+1,t}$ at the right edge? (It’s outside our process!)

Wave Equation – Simple Solution: After every time-step, each process gives its leftmost and rightmost pieces of “current” data to its neighbor processes!

[Diagram: data array split across Proc0 to Proc4, with edge values passed between neighbors]

Wave Equation – Simple Solution

Pieces of data to communicate:

[Diagram: the boundary values each of Proc0 to Proc4 sends to its neighbors]

Wave Equation – Simple Solution: Can do this with MPI_Irecv, MPI_Isend, MPI_Wait (see the sketch below):

Suppose the process has rank r:
- If we’re not the rightmost process: send data to process r+1, and receive data from process r+1
- If we’re not the leftmost process: send data to process r-1, and receive data from process r-1
- Wait on all requests
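
A minimal sketch of this exchange, assuming the local array keeps one ghost cell at each end; the names exchange_halo, y_curr, and n_local are purely illustrative, not the lab’s actual interface:

#include <mpi.h>

/* Exchange one boundary value of the "current" data with each neighbor.
 * Assumed layout: y_curr holds n_local real cells at indices 1..n_local,
 * with ghost cells at indices 0 and n_local+1. */
void exchange_halo(float *y_curr, int n_local, int rank, int size) {
    MPI_Request reqs[4];
    int nreqs = 0;

    if (rank < size - 1) {  /* not the rightmost process */
        /* send our rightmost real value to the right neighbor */
        MPI_Isend(&y_curr[n_local], 1, MPI_FLOAT, rank + 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
        /* receive the right neighbor's leftmost real value into our right ghost cell */
        MPI_Irecv(&y_curr[n_local + 1], 1, MPI_FLOAT, rank + 1, 1,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
    }
    if (rank > 0) {         /* not the leftmost process */
        /* send our leftmost real value to the left neighbor */
        MPI_Isend(&y_curr[1], 1, MPI_FLOAT, rank - 1, 1,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
        /* receive the left neighbor's rightmost real value into our left ghost cell */
        MPI_Irecv(&y_curr[0], 1, MPI_FLOAT, rank - 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
    }
    /* wait on all outstanding requests before using the ghost cells */
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
}

Using one tag for data moving right and another for data moving left keeps each Isend paired with the matching Irecv on the neighboring rank.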

Wave Equation – Simple Solution: Boundary conditions (see the sketch below):
- Use MPI_Comm_rank and MPI_Comm_size
- The rank 0 process sets the leftmost condition
- The rank (size-1) process sets the rightmost condition
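
In code, that might look like the following sketch, reusing the ghost-cell layout from the sketch above; left_boundary() is a hypothetical stand-in for whatever forcing the lab actually specifies, and the right end is simply held at zero here as a placeholder:

#include <mpi.h>

/* Hypothetical source term driving the left edge; the real forcing
 * function comes from the lab specification. */
static float left_boundary(float t) { return 0.0f; }

/* Apply the physical boundary conditions. Only rank 0 owns the left
 * edge and only rank (size-1) owns the right edge. */
void apply_boundaries(float *y_curr, int n_local, float t) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        y_curr[1] = left_boundary(t);   /* leftmost real cell (index 0 is a ghost) */
    if (rank == size - 1)
        y_curr[n_local] = 0.0f;         /* rightmost real cell held fixed */
}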

Simple Solution – Problems Communication can be expensive!

It’s expensive to communicate every timestep just to send a single value!

Better solution: Send some m values every m timesteps!

Possible Implementation Initial setup: (Assume 3 processes)

[Diagram: initial array split across Proc0, Proc1, Proc2]

Possible Implementation Give each array “redundant regions” (Assume communication interval = 3)

[Diagram: each of Proc0, Proc1, Proc2 with redundant regions added on both sides]

Possible Implementation: Every 3 timesteps, send some of your data to neighboring processes!

Possible Implementation Send “current” data (current at time of communication)

[Diagram: “current” data copied into the neighbors’ redundant regions (Proc0, Proc1, Proc2)]

Possible Implementation Then send “old” data

[Diagram: “old” data copied into the neighbors’ redundant regions (Proc0, Proc1, Proc2)]
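
A hedged sketch of this interval exchange, assuming each local array is laid out as [m-wide left redundant region | n_core core cells | m-wide right redundant region]; the names refresh_redundant, n_core, and m are illustrative only:

#include <mpi.h>

/* Refresh the redundant regions every m timesteps.
 * Assumed layout: [ m left redundant | n_core core cells | m right redundant ],
 * so the core occupies indices m .. m + n_core - 1. */
void refresh_redundant(float *y, int n_core, int m, int rank, int size) {
    MPI_Request reqs[4];
    int nreqs = 0;

    if (rank < size - 1) {
        /* our rightmost m core values -> right neighbor's left redundant region */
        MPI_Isend(&y[n_core], m, MPI_FLOAT, rank + 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
        /* right neighbor's leftmost m core values -> our right redundant region */
        MPI_Irecv(&y[m + n_core], m, MPI_FLOAT, rank + 1, 1,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
    }
    if (rank > 0) {
        /* our leftmost m core values -> left neighbor's right redundant region */
        MPI_Isend(&y[m], m, MPI_FLOAT, rank - 1, 1,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
        /* left neighbor's rightmost m core values -> our left redundant region */
        MPI_Irecv(&y[0], m, MPI_FLOAT, rank - 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
    }
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
}

/* Per interval, refresh the "current" buffer first and then the "old" one, e.g.
 *   refresh_redundant(y_curr, n_core, m, rank, size);
 *   refresh_redundant(y_old,  n_core, m, rank, size);                         */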

Then… Do our calculation as normal (as long as we’re not at the very ends of our array) over our entire array, including the redundant regions!

$y_{x,t+1} = 2\,y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^{2}\left(y_{x+1,t} - 2\,y_{x,t} + y_{x-1,t}\right)$
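
A minimal CUDA kernel for this update might look like the sketch below, where courant_sq stands for the $(c\,\Delta t/\Delta x)^2$ factor and n_total is the full local array length including redundant regions (both names are assumptions, not the lab’s interface):

/* One timestep of the wave update over the whole local array,
 * redundant regions included; only the two end cells are skipped
 * because they lack an x-1 or x+1 neighbor. */
__global__ void wave_step(const float *y_old, const float *y_curr,
                          float *y_new, int n_total, float courant_sq) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x > 0 && x < n_total - 1) {
        y_new[x] = 2.0f * y_curr[x] - y_old[x]
                 + courant_sq * (y_curr[x + 1] - 2.0f * y_curr[x] + y_curr[x - 1]);
    }
}

A launch such as wave_step<<<(n_total + 255) / 256, 256>>>(...) covers the whole array with one thread per cell, followed by the usual rotation of the old/current/new buffers.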

What about corruption? Suppose we’ve just copied our data… (assume a non-boundary process)

Legend: . = valid, ? = garbage, ~ = doesn’t matter

(Recall that there exist only 3 storage spaces – the gray areas shown are nonexistent at our current timestep.)

What about corruption? Calculate new data…

Value unknown!

What about corruption? Time t+1:

Current -> old, new -> current (and space for old is overwritten by new…)
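
That rotation is just a pointer swap; a small sketch with assumed buffer names:

/* Rotate the three buffers at the end of a timestep: current becomes old,
 * new becomes current, and the former old buffer is reused to hold the
 * next step's new values (which is exactly where garbage can sneak in). */
void rotate_buffers(float **y_old, float **y_curr, float **y_new) {
    float *tmp = *y_old;
    *y_old  = *y_curr;
    *y_curr = *y_new;
    *y_new  = tmp;
}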

What about corruption? More garbage data!

“Garbage in, garbage out!”

What about corruption? Time t+2…

What about corruption? Even more garbage!

What about corruption? Time t+3…

Core data region - corruption imminent!?

What about corruption? Saved!

Data exchange occurs after communication interval has passed!

“It’s okay to play with garbage… just don’t get sick”

Boundary Conditions Applied only at the leftmost and rightmost process!

Boundary corruption? Examine left-most process:

We never copy to it, so its left redundant region is garbage!

(B = boundary condition set)

Boundary corruption? Calculation brings garbage into non-redundant region!

Boundary corruption? …but boundary condition is set at every interval!

Other details: To run programs with MPI, use the “mpirun” command, e.g. mpirun -np (number of processes) (your program and arguments)

CMS machines: Add this to your .bashrc file: alias mpirun=/cs/courses/cs179/openmpi-1.6.4/bin/mpirun

Common bugs (and likely causes) Lock-up (it seems like nothing’s happening):

Often an MPI issue – locks up on MPI_Wait because some request wasn’t fulfilled

Check that all sends have corresponding receives

Your wave looks weird: Likely cause 1: Garbage data is being passed between processes. Likely cause 2: Redundant regions aren’t being refreshed and/or are contaminating non-redundant regions.

Your wave is flat-zero: The left boundary condition isn’t being initialized and/or isn’t propagating; same likely causes as the previous case.


Common bugs (and likely causes) General debugging tips:

Run MPI with just 1 or 2 processes. Set the kernel to write a constant value (see the sketch below).
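
For the “write a constant value” trick, a throwaway kernel along these lines (the constant is arbitrary) makes it easy to see whether data is actually moving between processes:

/* Debugging aid: overwrite the local array with a known constant so
 * any region that gets (or fails to get) transferred via MPI is
 * immediately visible in the output. */
__global__ void debug_fill(float *y, int n_total, float value) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < n_total)
        y[x] = value;
}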
