CS420 lecture six: Loops
Time Analysis of loops
Often easy, e.g. bubble sort:
for i in 1..(n-1)
  for j in 1..(n-i)
    if (A[j] > A[j+1]) swap(A, j, j+1)
1. The loop body takes constant time.
2. The loop body is executed
$$\sum_{i=1}^{n-1} (n-i) = \frac{n(n-1)}{2} = O(n^2)$$
times.
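A minimal C sketch of the bubble sort above, instrumented to count how often the loop body runs so the count can be checked against n(n-1)/2. Indices are 0-based here; the function name `bubble_sort` and the counter `body` are illustrative additions, not from the slides.

```c
#include <stddef.h>

/* Bubble sort following the pseudocode (0-based indices).
   Returns the number of times the loop body (compare, maybe swap) executes. */
static unsigned long bubble_sort(int *A, size_t n) {
    unsigned long body = 0;
    for (size_t i = 1; i + 1 <= n; i++)          /* i in 1..(n-1) */
        for (size_t j = 0; j < n - i; j++) {     /* n-i iterations */
            body++;
            if (A[j] > A[j + 1]) {               /* swap(A, j, j+1) */
                int t = A[j];
                A[j] = A[j + 1];
                A[j + 1] = t;
            }
        }
    return body;
}
```

For n = 5 the body runs 5·4/2 = 10 times, independent of the input order.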
Convex hull
Given a set of points in 2D ((x,y) coordinates), find the smallest convex polygon surrounding them all.
The problem reduces to finding which line segments connecting pairs of points of the set form the boundary of the hull.
Convex hull: first attempt
Let L be a line segment connecting two points in the set. For L to be an edge of the convex hull, it is sufficient that all other points lie on the same side of L's extension to a full line.
How do you find out whether all other points are on the same side?
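One common answer (an assumption, since the slides do not spell out the test) is the sign of a 2D cross product, which tells on which side of the directed line a→b a point c lies. A sketch in C; `Point`, `side` and `same_side` are hypothetical names:

```c
typedef struct { double x, y; } Point;

/* Cross product (b - a) x (c - a): positive if c is left of the directed
   line a->b, negative if right, zero if the three points are collinear. */
static double side(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Nonzero if p and q lie on the same side of the line through a and b
   (or at least one of them lies on the line). */
static int same_side(Point a, Point b, Point p, Point q) {
    return side(a, b, p) * side(a, b, q) >= 0;
}
```

Each call is a constant number of multiplications, which is why the check can be treated as O(1).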
for i = 1 to n
  for j = i+1 to n
    for k = 1 to n
      if (k != i && k != j) check(pi, pj, pk)
    (pi-pj is a hull edge if every check puts pk on the same side)
check is O(1) and is called O(n^3) times, so this algorithm is O(n^3).
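A sketch of the O(n^3) brute-force algorithm in C, assuming no three points are collinear and using the cross-product sign as the O(1) check; `brute_force_hull` and its edge-array output format are illustrative choices, not the lecture's code:

```c
#include <stddef.h>

typedef struct { double x, y; } Point;

/* > 0 if c is left of a->b, < 0 if right, 0 if collinear. */
static double side(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Segment p[i]-p[j] is a hull edge if every other point lies on one side
   of the line through it. Assumes no three points are collinear.
   Writes index pairs into edges (room for n pairs is enough) and
   returns the number of hull edges found. */
static size_t brute_force_hull(const Point *p, size_t n, size_t edges[][2]) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            int pos = 0, neg = 0;
            for (size_t k = 0; k < n; k++) {
                if (k == i || k == j) continue;
                double s = side(p[i], p[j], p[k]);   /* the O(1) check */
                if (s > 0) pos = 1;
                else if (s < 0) neg = 1;
            }
            if (!(pos && neg)) {                     /* all on one side */
                edges[count][0] = i;
                edges[count][1] = j;
                count++;
            }
        }
    return count;
}
```

For the four corners of a square plus one interior point, only the four sides of the square pass the test.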
The question that drives us...
Is there a better algorithm?
• Find the lowest point P1.
• Sort the remaining points by the angle they form with P1 and the horizontal, resulting in a sequence P2…Pn.
• Start with P1-P2 in the current hull.
• for i from 3 to n:
  – add Pi to the current hull
  – for j from i-1 downto 3 (over the points still in the current hull, most recent first): eliminate Pj if P1 and Pi are on different sides of the line Pj-P(j-1); if Pj stays, break.
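A C sketch of the algorithm above (it is essentially Graham's scan), assuming no three points are collinear. The current hull is an array used as a stack, so eliminating Pj means popping the top. The names, the `qsort` comparator with a global `pivot`, and the tie-breaking rules are our own choices:

```c
#include <stdlib.h>

typedef struct { double x, y; } Point;

static Point pivot;  /* P1, read by the qsort comparator */

static double side(Point a, Point b, Point c) {   /* > 0: c left of a->b */
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

static double dist2(Point a, Point b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Counterclockwise angle order around the pivot; ties by distance. */
static int by_angle(const void *pa, const void *pb) {
    Point a = *(const Point *)pa, b = *(const Point *)pb;
    double s = side(pivot, a, b);
    if (s > 0) return -1;              /* b is counterclockwise of a */
    if (s < 0) return 1;
    double da = dist2(pivot, a), db = dist2(pivot, b);
    return (da > db) - (da < db);
}

/* Reorders p in place; writes the hull, counterclockwise from P1,
   into hull (room for n points) and returns its size. */
static size_t convex_hull(Point *p, size_t n, Point *hull) {
    if (n < 3) {
        for (size_t i = 0; i < n; i++) hull[i] = p[i];
        return n;
    }
    size_t low = 0;                                   /* find lowest: O(n) */
    for (size_t i = 1; i < n; i++)
        if (p[i].y < p[low].y || (p[i].y == p[low].y && p[i].x < p[low].x))
            low = i;
    Point t = p[0]; p[0] = p[low]; p[low] = t;
    pivot = p[0];
    qsort(p + 1, n - 1, sizeof(Point), by_angle);     /* sort: O(n lg n) */
    size_t h = 0;
    hull[h++] = p[0];                                 /* start with P1-P2 */
    hull[h++] = p[1];
    for (size_t i = 2; i < n; i++) {
        /* Pi right of P(j-1)->Pj while P1 is left of it: different sides,
           so eliminate Pj. h >= 3 keeps P2 (j stops at 3). */
        while (h >= 3 && side(hull[h - 2], hull[h - 1], p[i]) < 0)
            h--;
        hull[h++] = p[i];                             /* add Pi */
    }
    return h;
}
```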
Complexity?
• find lowest: O(n)
• sort: O(n lg n)
• nested add/eliminate loop (outer: i from 3 to n; inner: j from i-1 downto 3): O(?)
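nested add/eliminate loop
• O(n)! Why?
• n-2 points are considered in the i loop. Each j-loop iteration either eliminates a point, which will then never be checked again, or stops the j loop. At most n-3 points (P3…P(n-1)) can be eliminated in total, and each of the n-2 j loops stops at most once, so all j loops together perform at most (n-3) + (n-2) = O(n) iterations.
• Convex hull algorithm complexity: O(n) + O(n lg n) + O(n) = O(n lg n)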
Is there a better algorithm?
• No, but the argument is harder (lower bound arguments usually are).
• It can be shown that sorting can be reduced to convex hull (reduced: translated, in O(n) time, such that when the convex hull problem is solved the original sorting problem is solved). We have shown that sorting is Ω(n lg n), so convex hull is Ω(n lg n) as well: an asymptotically faster hull algorithm would give an asymptotically faster sort.
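Reduction: x ↦ (x, x²). For example, sort({3, 1, 2}) becomes convex hull({(3,9), (2,4), (1,1)}).
All points lie on the parabola y = x², so every point is a hull vertex, and walking the hull counterclockwise from the leftmost point, (1,1), visits them in sorted order: 1, 2, 3.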
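The reduction can be tried out in C. This sketch lifts each value to (x, x²) and then, purely for brevity, finds the hull with gift wrapping (Jarvis march), a different hull algorithm swapped in here rather than taken from the slides. Gift wrapping is O(n·h), and here h = n, so this is not a fast sort; it only demonstrates that the hull order is the sorted order. It assumes the input values are distinct; `sort_via_hull` is a hypothetical name.

```c
#include <stdlib.h>

typedef struct { long long x, y; } Point;

/* > 0 if a->b->c turns counterclockwise, < 0 if clockwise. */
static long long cross(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Sorts n distinct values into out via the convex hull of (x, x^2).
   Returns 0 on success, -1 if allocation fails. */
static int sort_via_hull(const long long *xs, size_t n, long long *out) {
    if (n == 0) return 0;
    Point *pts = malloc(n * sizeof *pts);
    if (!pts) return -1;
    size_t start = 0;
    for (size_t i = 0; i < n; i++) {
        pts[i].x = xs[i];
        pts[i].y = xs[i] * xs[i];                 /* the reduction x -> (x, x^2) */
        if (pts[i].x < pts[start].x) start = i;   /* leftmost point */
    }
    size_t p = start, k = 0;
    do {                                          /* walk hull counterclockwise */
        out[k++] = pts[p].x;
        size_t q = (p + 1) % n;
        for (size_t r = 0; r < n; r++)
            if (cross(pts[p], pts[q], pts[r]) < 0)   /* r is clockwise of p->q */
                q = r;
        p = q;
    } while (p != start);
    free(pts);
    return 0;
}
```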
Sub-O Optimizations
• Suppose you have written an asymptotically optimal program and still want to speed it up by constant factors, which big-O does not capture.
• Use a profiler to identify which parts of your code are the hotspots of your program.
• 10/90 rule of thumb: 90% of the time is spent in 10% of the code, the hotspots.
  – Usually some of the innermost loops.
  – Only improve the hotspots. Leave the rest clear and simple.
Data reorganization
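• Create a sentinel to simplify loop control. Linear search without a sentinel tests two conditions per iteration:
found = false; i = 0;
while (i < n && !found)
  if (x[i] == T) found = true; else i++;
• Sentinel: a value placed at the boundary. Storing the target T in x[n] (the array needs one spare slot) guarantees the loop stops, so only one test per iteration remains:
x[n] = T; i = 0;
while (x[i] != T) i++;
found = (i < n);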
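The sentinel version in C, as a minimal sketch assuming the caller's array has one spare slot at index n; the name `sentinel_search` and the -1 "not found" convention are illustrative:

```c
#include <stddef.h>

/* Linear search with a sentinel. x must have room for n+1 elements;
   x[n] is overwritten with the target. Returns the index or -1. */
static long sentinel_search(int *x, size_t n, int target) {
    x[n] = target;                   /* sentinel: the loop must stop here */
    size_t i = 0;
    while (x[i] != target) i++;      /* one test per iteration */
    return i < n ? (long)i : -1;     /* found only if before the sentinel */
}
```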
Loop unrolling
• Loop unrolling is textually repeating the loop body so that the loop control is executed fewer times.
  – E.g., a median filter operator on an image executes a 3x3 inner loop for each resulting pixel; this can be fully unrolled.
  – Some compilers (e.g. CUDA) allow unroll k pragmas.
  – In a linked list, if the last element points at itself, visiting the elements can be partially unrolled.
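A generic C sketch of partial unrolling (not one of the slide's examples): the body of an array-sum loop is repeated four times, and a cleanup loop handles the n mod 4 leftover elements. The unroll factor 4 and the name `sum_unrolled4` are arbitrary choices:

```c
#include <stddef.h>

/* Sum with the loop body repeated four times: the loop test and
   increment run about n/4 times instead of n times. */
static long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)               /* leftover elements */
        s += a[i];
    return s;
}
```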
Loop peeling
• When the body of a loop tests whether it is on a boundary and has a special case for that boundary, it is often advantageous to have separate code for the boundary iterations, avoiding the conditional in the loop body.
• E.g., a median filter, whose border pixels do not have a full 3x3 neighborhood.
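A 1D three-point median filter sketches the idea in C. The slides refer to the 2D 3x3 filter; the 1D version and the boundary policy of copying the input unchanged are simplifying assumptions. The first and last iterations are peeled off, so the interior loop needs no boundary test:

```c
#include <stddef.h>

static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }

/* Median of three values. */
static int med3(int a, int b, int c) {
    return max_i(min_i(a, b), min_i(max_i(a, b), c));
}

static void median3_filter(const int *in, int *out, size_t n) {
    if (n == 0) return;
    out[0] = in[0];                        /* peeled first iteration */
    if (n == 1) return;
    for (size_t i = 1; i + 1 < n; i++)     /* interior: no boundary test */
        out[i] = med3(in[i - 1], in[i], in[i + 1]);
    out[n - 1] = in[n - 1];                /* peeled last iteration */
}
```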
Loop unrolling and trivial assignments
fibonacci(n)
  a = b = c = 1;   // what happens if the loop gets unrolled once?
  for i = 3 to n { c = a+b; a = b; b = c }
  return c;
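Unrolled once, a and b take turns holding the newest value, so the trivial assignments a = b; b = c disappear:
fibonacci(n)
  if n <= 2 return 1;   // guard: without it, n = 1 would return 2
  a = b = 1;
  for i = 1 to (n/2 - 1) { a = a+b; b = a+b }
  if odd(n) b = a+b;
  return b;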
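The unrolled version in C, as a sketch using `unsigned long long` (which overflows beyond F(93)); `fib_unrolled` is a hypothetical name, and the comments record the loop invariant:

```c
/* Fibonacci with F(1) = F(2) = 1, loop unrolled once. */
static unsigned long long fib_unrolled(unsigned n) {
    if (n <= 2) return 1;                /* guard for small n */
    unsigned long long a = 1, b = 1;     /* a = F(1), b = F(2) */
    for (unsigned i = 1; i <= n / 2 - 1; i++) {
        a = a + b;                       /* a = F(2i+1) */
        b = a + b;                       /* b = F(2i+2) */
    }
    if (n % 2 == 1) b = a + b;           /* one more step for odd n */
    return b;
}
```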
Memory hierarchy (cache) issues
• Processors are an order of magnitude faster than memories.
  – Both have been speeding up exponentially for ~30 years, but at different rates, so their ratio has been growing exponentially as well.
  – Caches keep recently used data (temporal locality) and fetch whole cache lines (spatial locality).
cache issues
• memory wall
• getting over it: cache
• cache line
• cache replacement policy: LRU
• cache and memory layout of the 1D representation of 2D arrays in C
  – row access
  – column access
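Row versus column access to an n x n matrix stored row-major in one 1D array, as a C sketch (function names are illustrative). Both functions compute the same sum; only the access stride differs, and that is what matters to the cache:

```c
#include <stddef.h>

/* Element (i, j) of the row-major matrix lives at A[i*n + j]. */
static double sum_by_rows(const double *A, size_t n) {
    double s = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += A[i * n + j];   /* stride 1: neighbors share a cache line */
    return s;
}

static double sum_by_cols(const double *A, size_t n) {
    double s = 0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += A[i * n + j];   /* stride n: may touch a new line each time */
    return s;
}
```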
Data or loop reordering to improve cache performance
Matrix multiply:
for i = 1 to n
  for j = 1 to n
    C[i,j] = 0
    for k = 1 to n
      C[i,j] += A[i,k]*B[k,j]
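The inner k loop accesses B in column order. If the arrays are stored in row-major order (as in C), consecutive B[k,j] are n elements apart, so each access touches a different cache line and most of each fetched line goes unused: cache misses and unnecessary reads!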
While one row of A is read, all of B is read.
If the cache cannot hold all of B and uses the Least Recently Used (LRU) replacement policy, all reads of B will cause a cache miss: each line of B is evicted before it is needed again.
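One common loop reordering, shown as a C sketch of our own choosing (the slides do not give it): interchanging the j and k loops (i-k-j order) makes the inner loop read B and update C along rows with stride 1. Matrices are n x n, row-major, 0-based; the function names are illustrative.

```c
#include <stddef.h>

/* i-j-k order of the slide: the inner loop walks down a column of B. */
static void matmul_ijk(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];   /* stride n in B */
            C[i * n + j] = sum;
        }
}

/* Same product, j and k interchanged: the inner loop walks row k of B
   and row i of C with stride 1. */
static void matmul_ikj(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n * n; i++) C[i] = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double a = A[i * n + k];                  /* reused n times */
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];     /* stride 1 in B, C */
        }
}
```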
Tiling for improved cache behavior
Instead of reading a whole row of A and computing n inner products of that whole row with whole columns of B, we can read a block of A and compute smaller inner products with sub-columns of B.
(Remember blocked matrix multiply in Strassen)
These partial products are then added up.
Conventional matrix multiply
Computing row i of C uses every element of B once and all of row A[i] n times. A[i] may fit in the cache; B will probably not!
Tiled matrix multiply
Reuse of tile of B
• A k x k tile of A (which can fit in the cache) block-multiplies with a k x k tile of B (which can also fit in the cache), and thus reuses the B tile k times, potentially providing better cache use.
• We can parameterize our program with k and experiment.
• Data and loop reordering matrix multiply: assignment 2
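A tiled matrix multiply in C, parameterized by the tile size `t` (the slides' k, renamed here to avoid clashing with the loop index k). This is a sketch: the i-k-j order inside each tile pair and the clipping of edge tiles when n is not a multiple of t are our own choices.

```c
#include <stddef.h>

/* C = A * B, n x n row-major, t x t tiles (t > 0). */
static void matmul_tiled(const double *A, const double *B, double *C,
                         size_t n, size_t t) {
    for (size_t i = 0; i < n * n; i++) C[i] = 0;
    for (size_t ii = 0; ii < n; ii += t)              /* tile origins */
        for (size_t kk = 0; kk < n; kk += t)
            for (size_t jj = 0; jj < n; jj += t) {
                size_t ie = ii + t < n ? ii + t : n;  /* clip edge tiles */
                size_t ke = kk + t < n ? kk + t : n;
                size_t je = jj + t < n ? jj + t : n;
                /* tile A[ii..ie, kk..ke] times tile B[kk..ke, jj..je],
                   partial products added into tile C[ii..ie, jj..je] */
                for (size_t i = ii; i < ie; i++)
                    for (size_t k = kk; k < ke; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < je; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Timing this for several values of `t` gives the experiment the slide suggests.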
Experiments you can do
• Transpose B for better cache line behavior.
• Tile the loop as in the example.
• In array accesses A[i*N+j], avoid the multiply by doing pointer increments and dereferences.
• You will have a number of versions of your code. Make a 2D table of results, then make observations about your results. In a follow-up discussion, exchange your experiences.
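A sketch combining two of these experiments in C: B is transposed into a caller-supplied scratch buffer `Bt` (a hypothetical parameter), and the inner product walks two pointers instead of computing `i*N+j` indices. Whether this is faster depends on the compiler and the machine, which is what the table of results is for.

```c
#include <stddef.h>

/* C = A * B, n x n row-major. Bt (n*n doubles) receives the transpose
   of B, so row j of Bt is column j of B and both operands of each
   inner product are read with stride 1. */
static void matmul_transposed(const double *A, const double *B, double *C,
                              double *Bt, size_t n) {
    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            Bt[j * n + k] = B[k * n + j];
    double *c = C;                                  /* walks C row-major */
    for (size_t i = 0; i < n; i++) {
        const double *arow = A + i * n;
        const double *bcol = Bt;                    /* row j of Bt */
        for (size_t j = 0; j < n; j++, bcol += n) {
            const double *pa = arow, *pb = bcol, *end = arow + n;
            double sum = 0;
            while (pa != end)
                sum += *pa++ * *pb++;               /* pointer increments */
            *c++ = sum;                             /* C[i*n + j] */
        }
    }
}
```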
Tiling
• Loops become nested loops:
  – outer loop visits tile origins
  – inner loops visit the tile points
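A minimal C sketch of that loop structure on a simple loop nest, a matrix transpose (chosen here for illustration, not taken from the slides): the outer two loops step through tile origins (ii, jj), and the inner two visit the points of one t x t tile, clipped at the matrix edge. `transpose_tiled` is a hypothetical name.

```c
#include <stddef.h>

/* out[j][i] = in[i][j] for an n x n row-major matrix, t x t tiles. */
static void transpose_tiled(const int *in, int *out, size_t n, size_t t) {
    for (size_t ii = 0; ii < n; ii += t)                   /* tile origins */
        for (size_t jj = 0; jj < n; jj += t)
            for (size_t i = ii; i < ii + t && i < n; i++)  /* tile points */
                for (size_t j = jj; j < jj + t && j < n; j++)
                    out[j * n + i] = in[i * n + j];
}
```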
Can every loop be tiled???
Tile this:
for i = 1 to n
  for j = 1 to n
    if (i == 1 && j == 1) A[i,j] = 1
    elif j == 1 A[i,j] = A[i-1,n]
    else A[i,j] = A[i,j-1]
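A small C experiment, assuming n = 4 and 2 x 2 tiles (the macro `N` and the function names are illustrative, and indices are 0-based): running the loop in its original order and in a naively tiled order from the same all-zero start gives different results, because the tiled order reads A[i-1,n] before that element has been computed.

```c
#define N 4   /* assumed matrix size for this demo */

/* The slide's loop: each iteration reads the value written by the
   iteration just before it in row-major order. */
static void fill_original(int A[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            if (i == 0 && j == 0) A[i][j] = 1;
            else if (j == 0)      A[i][j] = A[i - 1][N - 1];
            else                  A[i][j] = A[i][j - 1];
        }
}

/* Same body, naively tiled with t x t tiles: a different iteration order. */
static void fill_tiled(int A[N][N], int t) {
    for (int ii = 0; ii < N; ii += t)
        for (int jj = 0; jj < N; jj += t)
            for (int i = ii; i < ii + t && i < N; i++)
                for (int j = jj; j < jj + t && j < N; j++) {
                    if (i == 0 && j == 0) A[i][j] = 1;
                    else if (j == 0)      A[i][j] = A[i - 1][N - 1];
                    else                  A[i][j] = A[i][j - 1];
                }
}
```

Starting from all zeros, the original order sets every element to 1, while the tiled order leaves A[2,1] (1-based) at 0.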
Tiling (cont.)
• Tiling is loop reordering. The reordered loop must obey the data dependences in the original loop. Let iteration i',j',k' occur before iteration i,j,k:
  – true dependence: i,j,k uses a value that i',j',k' produced
  – anti dependence: i,j,k redefines a value that i',j',k' used
  – output dependence: i,j,k redefines a value that i',j',k' defined
In all these cases the reordered loop must obey the original ordering of the two iterations.