linear algebra on gpus

24
Linear Algebra on GPUs Linear Algebra on GPUs Jens Krüger Technische Universität München

Upload: gili

Post on 12-Feb-2016

48 views

Category:

Documents


0 download

DESCRIPTION

Linear Algebra on GPUs. Jens Krüger Technische Universität München. Linear algebra?. Why are we interested in Linear Algebra? It is THE machinery to solve PDEs PDEs are at the core of many graphics applications Physics based simulation, Animation, Mesh fairing …. LA on GPUs?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Linear Algebra on GPUs

Linear Algebra on GPUsLinear Algebra on GPUs

Jens Krüger Technische Universität München

Page 2: Linear Algebra on GPUs

Linear algebra?

1. Why are we interested in Linear Algebra?

It is THE machinery to solve PDEs1. PDEs are at the core of many graphics applications

Physics based simulation, Animation, Mesh fairing …

Page 3: Linear Algebra on GPUs

LA on GPUs?

2. … and why put LA on GPU?

A perfect couple…GPUs are fast stream processors,and many LA operations are “streamable”

…which goes hand in handThe solution is already on the GPU and ready for display

Page 4: Linear Algebra on GPUs

Getting started …

GPU as workhorse for numerical computations

Computergraphics

applications

Visual simulationVisual computing

Education and Training

Basic linearalgebra operators

General linearalgebra package

Programmable GPUsHighbandwidth

Parallelcomputing

Page 5: Linear Algebra on GPUs

Getting started …

GPU as workhorse for numerical computations

Computergraphics

applications

Visual simulationVisual computing

Education and Training

Basic linearalgebra operators

General linearalgebra package

Programmable GPUsHighbandwidth

Parallelcomputing

Page 6: Linear Algebra on GPUs

• Per-pixel vs. per-vertex operations– 6 gigapixels/second vs. 0.7 gigavertices/second– Efficient texture memory cache– Texture read-write access

– 2D Textures are even better– 2D RGBA textures really rock

– Textures best we can do

Internal affairs

1 N1

N

Vector representation

Page 7: Linear Algebra on GPUs

Representation (cont.)

Dense Matrix representation– Treat a dense matrix as a set of column vectors– Again, store these vectors as 2D textures

Matrix

N

N

i N Vectors

......N

1 i N

N 2D-Textures

... ...1 i N

Page 8: Linear Algebra on GPUs

N

i

Representation (cont.)

Banded Sparse Matrix representation– Treat a banded matrix as a set of diagonal vectors– Combine opposing vectors to save space

N

Matrix 2 Vectors

N

1 2

2 2D-Textures1 2

N-i

Page 9: Linear Algebra on GPUs

Operations 1

• Vector-Vector Operations– Reduced to 2D texture operations– Coded in pixel shaders

• Example: Vector1 + Vector2 Vector3

return tex0 + tex1Pass through

Vector 1 Vector 2

TexUnit 0 TexUnit 1

Vertex Shader Pixel Shader

Vector 3

Render TextureStatic quad

Page 10: Linear Algebra on GPUs

Operations 2 (reduce)

Reduce operation for scalar products

Reduce m x n regionin fragment shader

2 passnd

......

st1 pass

...

original Texture

...

Page 11: Linear Algebra on GPUs

The “single float” on GPUs

Some operations generate single float values e.g. reduce

Read-back to main-mem is slow Keep single floats on the GPU as 1x1 textures

...

Page 12: Linear Algebra on GPUs

Operations (cont.)

Matrix-Vector Operations– Split it up into Vector – Vector operations

Matrix

N

N

i N Vectors

......N

1 i N

N 2D-Textures

... ...1 i N

N

N

Matrixi 2 Vectors

N

1 2

2 2D-Textures1 2

N-i

Page 13: Linear Algebra on GPUs

Operations

In depth example: Vector / Banded-Matrix MultiplicationA xb

=

Page 14: Linear Algebra on GPUs

Example (cont.)

Vector / Banded-Matrix Multiplication

A xb

=

A b

Page 15: Linear Algebra on GPUs

Pass 2Pass 2

Example (cont.)

Compute the result in 2 Passes:

AA

bb

=

b‘b‘

Pass 1Pass 1

xx

Page 16: Linear Algebra on GPUs

Building a Framework

Presented so far:

• Representations on the GPU for– Single float values– Vectors– Matrices

• Dense• Banded• Random sparse (see SIGGRAPH ‘03)

• Operations on these representations– Add, multiply, reduce, …– Upload, download, clear, clone, …

Page 17: Linear Algebra on GPUs

Framework Classes (UML)

+getDevice

clClass

+getSize+clear

clVector

clPackedMatrix

+getSize+matrixVectorOp

clAbstractMatrix+getRenderSurface+getTexture+releaseTexture+releaseRenderSurface

clMemMan

clUnpackedVector+unpack+repack

clPackedVector

+reduce+multiply+copy+add+sub+vectorOp

clFragmentVector

+setRow+getRow

clFragmentMatrix

+setData+setDataSorted

clVertexMatrixclUnpackedMatrix

+setData+getData+add+mul+sub+div+invert

clFloat

Page 18: Linear Algebra on GPUs

Framework Example: CG

Encapsulate into Classes for more complex algorithms– Example use: Conjugate Gradient Method, complete source:

void clCGSolver::solveInit() {Matrix->matrixVectorOp(CL_SUB,X,B,R); // R = A*x-bR->multiply(-1); // R = -RR->clone(P); // P = R R->reduceAdd(R, Rho); // rho = sum(R*R);

}void clCGSolver::solveIteration() {

Matrix->matrixVectorOp(CL_NULL,P,NULL,Q);// Q = Ap;P->reduceAdd(Q,Temp); // temp = sum(P*Q);Rho->div(Temp,Alpha); // alpha = rho/temp;X->addVector(P,X,1,Alpha); // X = X + alpha*PR->subtractVector(Q,R,1,Alpha); // R = R - alpha*QR->reduceAdd(R,NewRho); // newrho = sum(R*R);NewRho->divZ(Rho,Beta); // beta = newrho/rhoR->addVector(P,P,1,Beta); // P = R+beta*P;clFloat *temp; temp=NewRho;NewRho=Rho; Rho=temp; // swap rho and newrho pointers

}void clCGSolver::solve(int maxI) {

solveInit();for (int i = 0;i< maxI;i++) solveIteration();

}int clCGSolver::solve(float rhoTresh, int maxI) {

solveInit(); Rho->clone(NewRho);for (int i = 0;i< maxI && NewRho.getData() > rhoTresh;i++) solveIteration();return i;

}

Page 19: Linear Algebra on GPUs

Example 12D Waves (explicit)

• Finite difference discretization:

• You could write a custom shader for this filter

• Think about this as a matrix-vector operation

t 1 t t t t t t 1i, j i 1, j i, j 1 i 1, j i, j 1 i, j i, jx x x x x (2 4 ) x x

2 2 22

2 2 2

x x xct u v

tx 3tx 4tx 5tx 6tx 7

tx 9

11tx1

2tx1

3tx1

4tx1

5tx1

6tx1

7tx1

8tx1

9tx

tx 1tx 2

tx 7

tx 3tx 4tx 5tx 6tx 7

tx 9

11tx1

2tx1

3tx1

4tx1

5tx1

6tx1

7tx1

8tx1

9tx

tx 1

tx 3tx 4tx 5tx 6tx 7

tx 9

11tx1

2tx1

3tx1

4tx1

5tx1

6tx1

7tx1

8tx1

9tx

tx 1tx 2

tx 7

2

2

2

2

2

2

2

2

2

=-

11tx1

2tx1

3tx1

4tx1

5tx1

6tx1

7tx1

8tx1

9tx

11tx1

2tx1

3tx1

4tx1

5tx1

6tx1

7tx1

8tx1

9tx

11tx1

2tx1

3tx1

4tx1

5tx1

6tx1

7t-x1

8tx1

9tx

Page 20: Linear Algebra on GPUs

2D Waves (cont.)

clMatrix->matrixVectorOp(CL_SUB,clCurrent,clLast,clNext); // next = matrix*current-last;

clLast->copyVector(clCurrent); // save for next iteration

clCurrent->copyVector(clNext); // save for next iteration

cluNext->unpack(clNext); // unpack for rendering

renderHF(cluNext->m_pVectorTexture); // render as heightfield

for (i=sY;i<sX*sY;i++) data[i] = ß; // setup diagonal-sY

matrix->getRow(sX*(sY-1))->setData(data);

for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? ß : 0; // setup diagonal-1

matrix->getRow(sX*sY-1)->setData(data);

for (i=0;i<sX*sY;i++) data[i] = 2-4ß; // setup diagonal

matrix->getRow(sX*sY)->setData(data);

for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? ß : 0; // setup diagonal+1

matrix->getRow(sX*sY+1)->setData(data);

for (i=0;i<sX*(sY-1);i++) data[i] = ß; // setup diagonal+sY

matrix->getRow(sX*(sY+1))->setData(data);

One Time Matrix Initialization:

Per Frame Iteration

2

2

2

2

2

2

2

2

2

Page 21: Linear Algebra on GPUs

Key Idea– Use different time discretization (e.g. Crank Nicholson)– Results in system of linear equations– Iterative solution using CG

Example 22D Waves (implicit)

=

11

tx

12tx1

3tx1

4tx1

5tx1

6tx1

7tx1

8tx1

9tx

tc3tc4

tc5tc 6tc 7

tc 9

tc1

tc2

tc 7

Page 22: Linear Algebra on GPUs

2D Waves (cont.)

cluRHS->computeRHS(cluLast, cluCurrent); // generate c(i,j,t)

clRHS->repack(cluRHS); // encode into RGBA

iSteps = pCGSolver->solve(iMaxSteps); // solve using CG

cluLast->copyVector(cluCurrent); // save for next iteration

clNext->unpack(cluCurrent); // unpack for rendering

renderHF(cluCurrent->m_pVectorTexture); // render as heightfield

for (i=sY;i<sX*sY;i++) data[i] = -alpha; // setup diagonal-sY

matrix->getRow(sX*(sY-1))->setData(data);

for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? - alpha : 0; // setup diagonal-1

matrix->getRow(sX*sY-1)->setData(data);

for (i=0;i<sX*sY;i++) data[i] = 4*alpha+1 // setup diagonal

matrix->getRow(sX*sY)->setData(data);

for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? -alpha:0; // setup diagonal+1

matrix->getRow(sX*sY+1)->setData(data);

for (i=0;i<sX*(sY-1);i++) data[i] = -alpha // setup diagonal+sY

matrix->getRow(sX*(sY+1))->setData(data);

One Time Matrix Initialization:

Per Frame Iteration

Page 23: Linear Algebra on GPUs

Demos

Page 24: Linear Algebra on GPUs

The End

Thank you!

Questions?

For more infos, browse to:http://wwwcg.in.tum.de/GPU

http://www.gpgpu.org