lecture 13: dense linear algebra ii - cs.cornell.edubindel/class/cs5220-s10/slides/lec13.pdf ·...
Lecture 13: Dense Linear Algebra II
David Bindel
8 Mar 2010

Logistics
- Tell me your project idea today (if you haven’t already)!
- HW 2 extension to Friday
  - Meant to provide more flexibility, not more work!
  - See comments at start of last time about expectation
- HW 2 common issues
  - Segfault in binning probably means particle out of range
  - Particles too close together means either an interaction skipped or a time step too short

Review: Parallel matmul
- Basic operation: $C = C + AB$
- Computation: $2n^3$ flops
- Goal: $2n^3/p$ flops per processor, minimal communication

1D layout
[Figure: the matrices C, A, and B]
- Block MATLAB notation: A(:,j) means the jth block column
- Processor j owns A(:,j), B(:,j), C(:,j)
- C(:,j) depends on all of A, but only on B(:,j)
- How do we communicate pieces of A?

1D layout on bus (no broadcast)
- Everyone computes local contributions first
- P0 sends A(:,0) to each processor j in turn; processor j receives, computes A(:,0)B(0,j)
- P1 sends A(:,1) to each processor j in turn; processor j receives, computes A(:,1)B(1,j)
- P2 sends A(:,2) to each processor j in turn; processor j receives, computes A(:,2)B(2,j)

1D layout on bus (no broadcast)
[Figure: C, A, and B, with labels Self, A(:,0), A(:,1), A(:,2)]

1D layout on bus (no broadcast)

    C(:,myproc) += A(:,myproc)*B(myproc,myproc)
    for i = 0:p-1
      for j = 0:p-1
        if (i == j) continue;
        if (myproc == i) isend A(:,i) to processor j
        if (myproc == j)
          receive A(:,i) from i
          C(:,myproc) += A(:,i)*B(i,myproc)
        end
      end
    end

Performance model?

1D layout on bus (no broadcast)
No overlapping communications, so in a simple $\alpha$-$\beta$ model:
- $p(p-1)$ messages
- Each message involves $n^2/p$ data
- Communication cost: $p(p-1)\alpha + (p-1)n^2\beta$

1D layout on ring
- Every process j can send data to j+1 simultaneously
- Pass slices of A around the ring until everyone sees the whole matrix ($p-1$ phases).

1D layout on ring

    tmp = A(myproc)
    C(myproc) += tmp*B(myproc,myproc)
    for j = 1 to p-1
      sendrecv tmp to myproc+1 mod p,
               from myproc-1 mod p
      C(myproc) += tmp*B(myproc-j mod p, myproc)

Performance model?

1D layout on ring
In a simple $\alpha$-$\beta$ model, at each processor:
- $p-1$ message sends (and simultaneous receives)
- Each message involves $n^2/p$ data
- Communication cost: $(p-1)\alpha + (1-1/p)n^2\beta$
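The ring algorithm and its message counts can be checked with a small serial simulation. This is a minimal sketch, not the lecture's code: the names `matmul` and `ring_matmul` are made up for illustration, matrices are plain lists of rows, the `p` "processes" are simulated in a loop, and `p` is assumed to divide `n`.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def ring_matmul(A, B, p):
    """Simulate C = A*B on a ring of p processes with a 1D block-column layout.

    Returns (C, messages, words), where messages and words are per-process
    send counts. Assumes p divides n (an assumption of this sketch).
    """
    n = len(A)
    w = n // p  # width of one block column
    cols = [range(j * w, (j + 1) * w) for j in range(p)]
    # Process j owns block column j of A, B, and C.
    tmp = [[[A[i][c] for c in cols[j]] for i in range(n)] for j in range(p)]
    Bcol = [[[B[i][c] for c in cols[j]] for i in range(n)] for j in range(p)]
    C = [[[0] * w for _ in range(n)] for _ in range(p)]
    messages = words = 0
    for phase in range(p):
        if phase > 0:
            # sendrecv: every process passes tmp to its right neighbour at once.
            tmp = [tmp[(j - 1) % p] for j in range(p)]
            messages += 1
            words += n * w
        for j in range(p):
            src = (j - phase) % p  # original owner of the slice process j holds
            B_blk = [Bcol[j][i] for i in cols[src]]  # block B(src, j)
            update = matmul(tmp[j], B_blk)
            for i in range(n):
                for c in range(w):
                    C[j][i][c] += update[i][c]
    # Gather the block columns into one matrix so the result can be checked.
    C_full = [[C[j][i][c] for j in range(p) for c in range(w)] for i in range(n)]
    return C_full, messages, words
```

Each simulated process sends `p - 1` messages of `n*n/p` words, which is where the $(p-1)\alpha + (1-1/p)n^2\beta$ cost comes from.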

Outer product algorithm
Serial: Recall outer product organization:

    for k = 0:s-1
      C += A(:,k)*B(k,:);
    end

Parallel: Assume $p = s^2$ processors, block $s \times s$ matrices. For a $2 \times 2$ example:

$$\begin{bmatrix} C_{00} & C_{01} \\ C_{10} & C_{11} \end{bmatrix} = \begin{bmatrix} A_{00}B_{00} & A_{00}B_{01} \\ A_{10}B_{00} & A_{10}B_{01} \end{bmatrix} + \begin{bmatrix} A_{01}B_{10} & A_{01}B_{11} \\ A_{11}B_{10} & A_{11}B_{11} \end{bmatrix}$$

- Processor for each (i,j) $\Rightarrow$ parallel work for each k!
- Note everyone in row i uses A(i,k) at once, and everyone in column j uses B(k,j) at once.

Parallel outer product (SUMMA)

    for k = 0:s-1
      for each i in parallel
        broadcast A(i,k) to row
      for each j in parallel
        broadcast B(k,j) to col
      On processor (i,j), C(i,j) += A(i,k)*B(k,j);
    end

If we have a tree along each row/column, then:
- $\log(s)$ messages per broadcast
- $\alpha + \beta n^2/s^2$ per message
- $2\log(s)\,(\alpha s + \beta n^2/s)$ total communication
- Compare to 1D ring: $(p-1)\alpha + (1-1/p)n^2\beta$

Note: Same ideas work with block size $b < n/s$
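The SUMMA loop can be mimicked serially to see that the broadcast-then-multiply steps really accumulate $C = AB$. This is a sketch under stated assumptions: `split_blocks`, `join_blocks`, and `summa` are hypothetical helper names, the $s \times s$ processor grid is simulated with nested lists, the broadcasts are modeled as plain copies, and `s` is assumed to divide `n`.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def split_blocks(M, s):
    """Cut an n x n matrix into an s x s grid of square blocks (s divides n)."""
    w = len(M) // s
    return [[[row[j * w:(j + 1) * w] for row in M[i * w:(i + 1) * w]]
             for j in range(s)] for i in range(s)]


def join_blocks(blocks):
    """Inverse of split_blocks: stitch a grid of blocks back into one matrix."""
    return [sum((blk[r] for blk in block_row), [])
            for block_row in blocks for r in range(len(block_row[0]))]


def summa(A, B, s):
    """Simulate SUMMA on an s x s grid; processor (i,j) owns block (i,j)."""
    w = len(A) // s
    Ab, Bb = split_blocks(A, s), split_blocks(B, s)
    Cb = [[[[0] * w for _ in range(w)] for _ in range(s)] for _ in range(s)]
    for k in range(s):
        # Row broadcast: owner (i,k) sends A(i,k) to every processor in row i.
        a_recv = [[Ab[i][k] for _ in range(s)] for i in range(s)]
        # Column broadcast: owner (k,j) sends B(k,j) to every processor in column j.
        b_recv = [[Bb[k][j] for j in range(s)] for _ in range(s)]
        for i in range(s):
            for j in range(s):
                update = matmul(a_recv[i][j], b_recv[i][j])
                for r in range(w):
                    for c in range(w):
                        Cb[i][j][r][c] += update[r][c]
    return join_blocks(Cb)
```

In a real distributed code the two list comprehensions marked as broadcasts would be collective operations along grid rows and columns; here they only show which block each processor receives at step `k`.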

Cannon’s algorithm

$$\begin{bmatrix} C_{00} & C_{01} \\ C_{10} & C_{11} \end{bmatrix} = \begin{bmatrix} A_{00}B_{00} & A_{01}B_{11} \\ A_{11}B_{10} & A_{10}B_{01} \end{bmatrix} + \begin{bmatrix} A_{01}B_{10} & A_{00}B_{01} \\ A_{10}B_{00} & A_{11}B_{11} \end{bmatrix}$$

Idea: Reindex products in block matrix multiply

$$C(i,j) = \sum_{k=0}^{s-1} A(i,k)\,B(k,j) = \sum_{k=0}^{s-1} A(i,\; k+i+j \bmod s)\; B(k+i+j \bmod s,\; j)$$

For a fixed $k$, a given block of A (or B) is needed for the contribution to exactly one C(i,j).

Cannon’s algorithm

    % After this skew, position (i,j) holds A(i,i+j) (indices mod s)
    for i = 0 to s-1
      cycle A(i,:) left by i

    % After this skew, position (i,j) holds B(i+j,j) (indices mod s)
    for j = 0 to s-1
      cycle B(:,j) up by j

    for k = 0 to s-1
      in parallel;
        C(i,j) = C(i,j) + A(i,j)*B(i,j);
      cycle A(i,:) left by 1
      cycle B(:,j) up by 1
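The skew-then-shift pattern is easy to get wrong by one index, so a serial simulation is a useful sanity check. The sketch below is illustrative only: `split_blocks`, `join_blocks`, and `cannon` are hypothetical names, the cyclic shifts are modeled by re-indexing nested lists rather than by real messages, and `s` is assumed to divide `n`.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def split_blocks(M, s):
    """Cut an n x n matrix into an s x s grid of square blocks (s divides n)."""
    w = len(M) // s
    return [[[row[j * w:(j + 1) * w] for row in M[i * w:(i + 1) * w]]
             for j in range(s)] for i in range(s)]


def join_blocks(blocks):
    """Inverse of split_blocks: stitch a grid of blocks back into one matrix."""
    return [sum((blk[r] for blk in block_row), [])
            for block_row in blocks for r in range(len(block_row[0]))]


def cannon(A, B, s):
    """Simulate Cannon's algorithm on an s x s torus of processors."""
    w = len(A) // s
    Ab, Bb = split_blocks(A, s), split_blocks(B, s)
    # Initial skew: processor (i,j) ends up holding A(i,i+j) and B(i+j,j).
    a = [[Ab[i][(i + j) % s] for j in range(s)] for i in range(s)]
    b = [[Bb[(i + j) % s][j] for j in range(s)] for i in range(s)]
    Cb = [[[[0] * w for _ in range(w)] for _ in range(s)] for _ in range(s)]
    for _ in range(s):
        # Every processor multiplies the two blocks it currently holds.
        for i in range(s):
            for j in range(s):
                update = matmul(a[i][j], b[i][j])
                for r in range(w):
                    for c in range(w):
                        Cb[i][j][r][c] += update[r][c]
        # Cycle each row of A left by one and each column of B up by one.
        a = [[a[i][(j + 1) % s] for j in range(s)] for i in range(s)]
        b = [[b[(i + 1) % s][j] for j in range(s)] for i in range(s)]
    return join_blocks(Cb)
```

After `k` shifts, processor (i,j) holds A(i,i+j+k) and B(i+j+k,j), which is exactly the reindexed sum for C(i,j).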

Cost of Cannon
- Assume 2D torus topology
- Initial cyclic shifts: $\le s$ messages each ($\le 2s$ total)
- For each phase: 2 messages each ($2s$ total)
- Each message is size $n^2/s^2$
- Communication cost: $4s(\alpha + \beta n^2/s^2) = 4(\alpha s + \beta n^2/s)$
- This communication cost is optimal!

... but SUMMA is simpler, more flexible, almost as good
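To compare the four communication models side by side, the formulas from the slides can be written as small functions. This is a sketch with assumptions worth flagging: the function names are invented here, the tree-broadcast logarithm is taken as base 2, `p` is assumed to be a perfect square for the 2D algorithms, and the machine parameters in the demo are hypothetical values chosen only for illustration.

```python
import math


def cost_bus_1d(n, p, alpha, beta):
    """1D layout on a bus: p(p-1) serialized messages of n^2/p words."""
    return p * (p - 1) * alpha + (p - 1) * n * n * beta


def cost_ring_1d(n, p, alpha, beta):
    """1D layout on a ring: p-1 phases, one n^2/p-word message per phase."""
    return (p - 1) * alpha + (1 - 1 / p) * n * n * beta


def cost_summa(n, p, alpha, beta):
    """SUMMA with tree broadcasts; log base 2 is an assumption of this sketch."""
    s = math.isqrt(p)  # assumes p is a perfect square
    return 2 * math.log2(s) * (alpha * s + beta * n * n / s)


def cost_cannon(n, p, alpha, beta):
    """Cannon on a 2D torus: 4s messages of n^2/s^2 words."""
    s = math.isqrt(p)  # assumes p is a perfect square
    return 4 * (alpha * s + beta * n * n / s)


if __name__ == "__main__":
    # Hypothetical latency and inverse bandwidth, for illustration only.
    n, alpha, beta = 4096, 1e-6, 1e-9
    for p in (4, 16, 64, 256):
        print(p,
              cost_bus_1d(n, p, alpha, beta),
              cost_ring_1d(n, p, alpha, beta),
              cost_summa(n, p, alpha, beta),
              cost_cannon(n, p, alpha, beta))
```

In this model the bus cost is exactly `p` times the ring cost, and SUMMA differs from Cannon only by the factor $\log(s)/2$.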

Speedup and efficiency
Recall

$$\mathrm{Speedup} := t_{\mathrm{serial}} / t_{\mathrm{parallel}}$$

$$\mathrm{Efficiency} := \mathrm{Speedup} / p$$

Assuming no overlap of communication and computation, efficiencies are:
- 1D layout: $\left(1 + O\left(\frac{p}{n}\right)\right)^{-1}$
- SUMMA: $\left(1 + O\left(\frac{\sqrt{p}\,\log p}{n}\right)\right)^{-1}$
- Cannon: $\left(1 + O\left(\frac{\sqrt{p}}{n}\right)\right)^{-1}$
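One way to see where these bounds come from is to plug the communication costs into the definition of efficiency. The derivation below is a sketch: the per-flop time $\gamma$ is a symbol introduced here (it does not appear in the slides), and $\alpha$, $\beta$, $\gamma$ are treated as constants.

$$
\begin{aligned}
t_{\mathrm{serial}} &= 2n^3\gamma, \qquad
t_{\mathrm{parallel}} = \frac{2n^3\gamma}{p} + t_{\mathrm{comm}}, \\
\mathrm{Efficiency} &= \frac{t_{\mathrm{serial}}}{p\,t_{\mathrm{parallel}}}
  = \left(1 + \frac{p\,t_{\mathrm{comm}}}{2n^3\gamma}\right)^{-1}.
\end{aligned}
$$

For the 1D ring, $t_{\mathrm{comm}} = (p-1)\alpha + (1-1/p)n^2\beta$, so

$$
\frac{p\,t_{\mathrm{comm}}}{2n^3\gamma}
\le \frac{\alpha}{2\gamma}\frac{p^2}{n^3} + \frac{\beta}{2\gamma}\frac{p}{n}
= O\!\left(\frac{p}{n}\right) \quad \text{when } p \le n.
$$

For Cannon, with $p = s^2$ and $t_{\mathrm{comm}} = 4(\alpha s + \beta n^2/s)$,

$$
\frac{p\,t_{\mathrm{comm}}}{2n^3\gamma}
= \frac{2\alpha}{\gamma}\frac{p^{3/2}}{n^3} + \frac{2\beta}{\gamma}\frac{\sqrt{p}}{n}
= O\!\left(\frac{\sqrt{p}}{n}\right) \quad \text{when } \sqrt{p} \le n.
$$

SUMMA's cost is Cannon's multiplied by $\log(s)/2 = \log(p)/4$, which gives the extra $\log p$ factor.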

Reminder: Why matrix multiply?
[Figure: LAPACK structure, with LAPACK layered on BLAS]
Build fast serial linear algebra (LAPACK) on top of BLAS 3.

Reminder: Why matrix multiply?
[Figure: ScaLAPACK structure, with layers labeled ScaLAPACK, PBLAS, BLACS, LAPACK, BLAS, and MPI]
ScaLAPACK builds additional layers on the same idea.
Reminder: Evolution of LU
On board...

Blocked GEPP
1. Find pivot
2. Swap pivot row
3. Update within block
4. Delayed update (at end of block)

Big idea
- Delayed update strategy lets us do LU fast
  - Could have also delayed application of pivots
- Same idea with other one-sided factorizations (QR)
- Can get decent multi-core speedup with parallel BLAS! ... assuming n sufficiently large.

There are still some issues left over (block size? pivoting?)...
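The four blocked GEPP steps (find pivot, swap pivot row, update within block, delayed update) can be sketched in a few lines. This is an illustrative, unoptimized version and not the LAPACK routine: `blocked_gepp` is a hypothetical name, the matrix is a list of rows of floats, pivots are applied to whole rows immediately rather than delayed, and the matrix is assumed nonsingular.

```python
def blocked_gepp(A, b):
    """In-place blocked LU with partial pivoting, block size b.

    On return A holds unit-lower L (below the diagonal) and U (on and above
    it), and the returned perm satisfies original_A[perm[i]] == (L*U)[i].
    """
    n = len(A)
    perm = list(range(n))
    for k0 in range(0, n, b):
        k1 = min(k0 + b, n)
        # Factor the block column A[k0:n, k0:k1] one column at a time.
        for k in range(k0, k1):
            # Find pivot: largest magnitude in column k at or below row k.
            piv = max(range(k, n), key=lambda i: abs(A[i][k]))
            # Swap pivot row (whole rows, so the swap is applied everywhere).
            A[k], A[piv] = A[piv], A[k]
            perm[k], perm[piv] = perm[piv], perm[k]
            # Update within block: form multipliers, touch block columns only.
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, k1):
                    A[i][j] -= A[i][k] * A[k][j]
        # Delayed update, part 1: block row of U via a unit-lower
        # triangular solve with the diagonal block of L.
        for j in range(k1, n):
            for i in range(k0, k1):
                for m in range(k0, i):
                    A[i][j] -= A[i][m] * A[m][j]
        # Delayed update, part 2: trailing submatrix gets one matrix-matrix
        # product subtracted (this is where BLAS 3 would do the work).
        for i in range(k1, n):
            for j in range(k1, n):
                A[i][j] -= sum(A[i][m] * A[m][j] for m in range(k0, k1))
    return perm
```

The point of the delay is that the trailing update in part 2 is a single matrix multiply per block column, instead of `b` separate rank-1 updates.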

Explicit parallelization of GE
What to do:
- Decompose into work chunks
- Assign work to threads in a balanced way
- Orchestrate the communication and synchronization
- Map which processors execute which threads

Possible matrix layouts
1D column blocked: bad load balance

    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2

Possible matrix layouts
1D column cyclic: hard to use BLAS2/3

    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2

Possible matrix layouts
1D column block cyclic: block column factorization a bottleneck

    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1
    0 0 1 1 2 2 0 0 1 1

Possible matrix layouts
Block skewed: indexing gets messy

    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    0 0 0 1 1 1 2 2 2
    2 2 2 0 0 0 1 1 1
    2 2 2 0 0 0 1 1 1
    2 2 2 0 0 0 1 1 1
    1 1 1 2 2 2 0 0 0
    1 1 1 2 2 2 0 0 0
    1 1 1 2 2 2 0 0 0

Possible matrix layouts
2D block cyclic:

    0 0 1 1 0 0 1 1
    0 0 1 1 0 0 1 1
    2 2 3 3 2 2 3 3
    2 2 3 3 2 2 3 3
    0 0 1 1 0 0 1 1
    0 0 1 1 0 0 1 1
    2 2 3 3 2 2 3 3
    2 2 3 3 2 2 3 3

Possible matrix layouts
- 1D column blocked: bad load balance
- 1D column cyclic: hard to use BLAS2/3
- 1D column block cyclic: factoring column is a bottleneck
- Block skewed (a la Cannon): just complicated
- 2D row/column block: bad load balance
- 2D row/column block cyclic: win!
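Each layout above is just a rule mapping a matrix entry (i, j) to the processor that owns it. The sketch below writes those rules out; the function names and parameter names (`b` for block size, `pr` and `pc` for the processor grid dimensions) are chosen here for illustration, and the row-major numbering of the 2D grid is an assumption that happens to reproduce the grids shown on the slides.

```python
def col_blocked(i, j, n, p):
    """1D column blocked: contiguous chunks of n/p columns (p divides n)."""
    return j // (n // p)


def col_cyclic(i, j, p):
    """1D column cyclic: columns dealt out one at a time."""
    return j % p


def col_block_cyclic(i, j, b, p):
    """1D column block cyclic: blocks of b columns dealt out cyclically."""
    return (j // b) % p


def block_skewed(i, j, b, p):
    """Block skewed: column-block owner shifts by one per block row."""
    return (j // b - i // b) % p


def block_cyclic_2d(i, j, b, pr, pc):
    """2D block cyclic on a pr x pc grid, processors numbered row-major."""
    return ((i // b) % pr) * pc + (j // b) % pc


def show(owner, n):
    """Render an n x n owner map as text, one matrix row per line."""
    return "\n".join(" ".join(str(owner(i, j)) for j in range(n)) for i in range(n))


if __name__ == "__main__":
    # Reproduce the 8 x 8 2D block cyclic picture: 2 x 2 blocks on a 2 x 2 grid.
    print(show(lambda i, j: block_cyclic_2d(i, j, 2, 2, 2), 8))
```

Writing the owner map as a function makes the load-balance argument concrete: as elimination proceeds and only the trailing rows and columns stay active, the cyclic rules keep every processor involved, while the blocked rules leave the early owners idle.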

Distributed GEPP
1. Find pivot (column broadcast)
2. Swap pivot row within block column + broadcast pivot
3. Update within block column
4. At end of block, broadcast swap info along rows
5. Apply all row swaps to other columns
6. Broadcast block $L_{II}$ right
7. Update remainder of block row
8. Broadcast rest of block row down
9. Broadcast rest of block col right
10. Update of trailing submatrix

Cost of ScaLAPACK GEPP
Communication costs:
- Lower bound: $O(n^2/\sqrt{P})$ words, $O(\sqrt{P})$ messages
- ScaLAPACK:
  - $O(n^2 \log P/\sqrt{P})$ words sent
  - $O(n \log P)$ messages
  - Problem: reduction to find pivot in each column
- Recent research on stable variants without partial pivoting
Onward!
Next up: Sparse linear algebra and iterative solvers!