more loop unrolling and vectorization - computer …jad5ju/cs4501/more loop unrolling.pdf · l1:...
More Loop Unrolling and Vectorization
Loop Unrolling Review
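
        li r0 <- 0           ; r0 = i = 0
        syscall IO.in_int    ; r1 = n, read from input
        li r2 <- 0           ; r2 = sum = 0
        li r3 <- 1           ; r3 = 1, the increment
    L1: ble r1 r0 L2         ; exit when n <= i
        add r2 <- r2 r0      ; sum += i
        add r0 <- r0 r3      ; i += 1
        jmp L1
    L2: mov r1 <- r2
        syscall IO.out_int   ; print sum
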
Goal: unroll this loop without duplicating the ble test.
◦ The unrolled loop runs for a multiple of the unrolling factor.
◦ r0, r1, and the number of iterations determine whether there are extra iterations.
Data-Flow Analysis for Affine Expressions
Similar to constant propagation.
Direction: forward
Values (for each variable):
◦ Unknown ($\top$)
◦ Affine expression ($c_0 + c_1 x_1 + c_2 x_2 + \cdots$)
◦ Not an affine expression ($\bot$)
Meet operator:
◦ Let $v[x]$ be the data-flow value for variable $x$.
◦ Usual rules for $\top$: $\top \wedge a = a$.
◦ If $v_1[x] = v_2[x]$: $(v_1 \wedge v_2)[x] = v_1[x]$.
◦ Otherwise: $(v_1 \wedge v_2)[x] = \bot$.
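The meet rules above can be sketched in a few lines. This is an illustrative encoding, not the course's implementation: an affine expression is assumed to be a dict from variable name to coefficient (with `"1"` for the constant term and zero terms dropped), and `affine`, `meet_value`, and `meet` are hypothetical helper names.

```python
# Sketch of the meet operator for the affine-expression lattice.
# Assumption: an affine expression c0 + c1*x1 + c2*x2 + ... is a dict
# {"1": c0, "x1": c1, "x2": c2, ...} with zero coefficients dropped,
# so two equal expressions always compare equal as dicts.
TOP = "TOP"  # unknown
BOT = "BOT"  # not an affine expression


def affine(const=0, **coeffs):
    """Build a normalized affine expression (zero terms removed)."""
    terms = {"1": const, **coeffs}
    return {name: c for name, c in terms.items() if c != 0}


def meet_value(a, b):
    """Meet of the values that two paths give one variable."""
    if a == TOP:
        return b  # TOP is the identity of meet
    if b == TOP:
        return a
    if a == b:
        return a  # identical affine expressions (or BOT and BOT) survive
    return BOT    # any disagreement, or a BOT input, gives BOT


def meet(v1, v2):
    """Meet of two data-flow values, each a map from variable to lattice value."""
    names = v1.keys() | v2.keys()
    return {x: meet_value(v1.get(x, TOP), v2.get(x, TOP)) for x in names}


if __name__ == "__main__":
    entry = {"r0": affine(0), "r3": affine(1)}
    back_edge = {"r0": affine(1), "r3": affine(1)}
    print(meet(entry, back_edge))  # r0 becomes BOT, r3 stays the constant 1
```

Normalizing away zero coefficients is what lets plain dict equality implement the "$v_1[x] = v_2[x]$" test.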
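Transfer functions for each statement $s$:

| Statement | Transfer Function |
|---|---|
| `la x <- c` | $f_s(v)[x] = c$ |
| `li x <- c` | $f_s(v)[x] = c$ |
| `ld x <- y[c]` | $f_s(v)[x] = v[y[c]]$ |
| `mov x <- y` | $f_s(v)[x] = v[y]$ |
| `add x <- y z` | $f_s(v)[x] = v[y] + v[z]$ |
| `mul x <- y z` | $f_s(v)[x] = v[y] \cdot v[z]$ if $v[y] = c$ or $v[z] = c$ (one operand is a constant); otherwise $\bot$ |
| `div x <- y z` | $f_s(v)[x] = v[y] / v[z]$ if $v[z] = c$ and $c \neq 0$; otherwise $\bot$ |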
For `add`, `mul`, and `div`: if $v[y] = \bot$ or $v[z] = \bot$, then $f_s(v)[x] = \bot$.
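A minimal sketch of these transfer functions, assuming the same dict encoding of affine expressions (`"1"` for the constant term, zero terms dropped) and hypothetical statement tuples such as `("mul", "c", "a", "b")`. Two simplifications are assumptions of this sketch: memory is not modelled, so `ld` (like any unknown statement) conservatively yields $\bot$, and `div` uses exact rational coefficients, ignoring integer truncation just as the table's rule does.

```python
from fractions import Fraction

# Sketch of the statement transfer functions for the affine analysis.
# Assumptions: affine expressions are dicts {"1": c0, "x": c1, ...} with zero
# terms dropped; statements are hypothetical tuples (op, dest, operand, ...);
# div uses exact rational coefficients, i.e. it ignores integer truncation.
BOT = "BOT"


def norm(e):
    return {k: c for k, c in e.items() if c != 0}


def const_of(e):
    """The constant value of e if e is a constant expression, else None."""
    if e != BOT and set(e) <= {"1"}:
        return e.get("1", 0)
    return None


def add_exprs(ey, ez):
    return norm({k: ey.get(k, 0) + ez.get(k, 0) for k in ey.keys() | ez.keys()})


def scale(e, c):
    return norm({k: c * a for k, a in e.items()})


def transfer(stmt, v):
    """Apply one statement's transfer function f_s to v (variable -> value)."""
    op, x, *args = stmt
    out = dict(v)
    if op in ("li", "la"):
        out[x] = norm({"1": args[0]})
    elif op == "mov":
        out[x] = v[args[0]]
    elif op in ("add", "mul", "div"):
        ey, ez = v[args[0]], v[args[1]]
        if ey == BOT or ez == BOT:
            out[x] = BOT  # a BOT operand makes the result BOT
        elif op == "add":
            out[x] = add_exprs(ey, ez)
        elif op == "mul":
            cy, cz = const_of(ey), const_of(ez)
            if cy is not None:
                out[x] = scale(ez, cy)
            elif cz is not None:
                out[x] = scale(ey, cz)
            else:
                out[x] = BOT  # product of two non-constants is not affine
        else:  # div: only by a nonzero constant
            cz = const_of(ez)
            out[x] = BOT if cz in (None, 0) else scale(ey, Fraction(1) / cz)
    else:
        out[x] = BOT  # ld, syscall results, etc. are treated as BOT here
    return out
```

For example, with `a` the constant 2 and `b` the symbol `n`, `mul c <- a b` gives `{"n": 2}`, while `mul c <- b b` gives `BOT`.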
Loop Example

        li r0 <- 0
        syscall IO.in_int
        li r2 <- 0
        li r3 <- 1
    L1: ble r1 r0 L2
        add r2 <- r2 r0
        add r0 <- r0 r3
        jmp L1
    L2: mov r1 <- r2
        syscall IO.out_int

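Loop Example (CFG)
◦ B1: `li r0 <- 0`, `syscall IO.in_int`, `li r2 <- 0`, `li r3 <- 1`
◦ B2: `L1: ble r1 r0 L2`
◦ B3: `L2: mov r1 <- r2`, `syscall IO.out_int`
◦ B4: `add r2 <- r2 r0`, `add r0 <- r0 r3`, `jmp L1`
◦ Edges: B1 → B2; B2 → B4 when $r1 \not\le r0$; B2 → B3 when $r1 \le r0$; B4 → B2.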
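Loop Example (DFS Tree)
Depth-first numbering of the blocks:
◦ 1: entry block (`li r0 <- 0`, `syscall IO.in_int`, `li r2 <- 0`, `li r3 <- 1`)
◦ 2: `L1: ble r1 r0 L2`
◦ 3: `L2: mov r1 <- r2`, `syscall IO.out_int` (edge taken when $r1 \le r0$)
◦ 4: loop body `add r2 <- r2 r0`, `add r0 <- r0 r3`, `jmp L1` (edge taken when $r1 \not\le r0$)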
Loop Example (Loop Detection)
Using the DFS numbering (1 = entry, 2 = `L1: ble r1 r0 L2`, 3 = exit block `L2`, 4 = loop body):
◦ Back edge: 4 → 2 (the body's `jmp L1` returns to the header)
◦ $loop = \{2, 4\}$
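The loop-detection step can be sketched as follows. This sketch swaps in a dominator-based test rather than the DFS tree shown on the slides: an edge $t \to h$ is treated as a back edge when $h$ dominates $t$, which on this reducible CFG finds the same edge 4 → 2. The natural loop is then the header plus every block that reaches the tail without passing through the header. `SUCC` and all function names are illustrative.

```python
# Sketch: find back edges and natural loops for the example CFG.
# This swaps in a dominator-based test (t -> h is a back edge when h
# dominates t); on this reducible CFG it finds the same edge as the DFS tree.
# Block numbers follow the DFS numbering: 1 entry, 2 header, 3 exit, 4 body.
SUCC = {1: [2], 2: [3, 4], 3: [], 4: [2]}


def predecessors(succ):
    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    return pred


def dominators(succ, entry):
    """Iterative dominator sets: dom(n) = {n} | intersection of dom(p) over preds p."""
    pred = predecessors(succ)
    nodes = set(succ)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            common = set.intersection(*(dom[p] for p in pred[n])) if pred[n] else set()
            new = {n} | common
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def back_edges(succ, dom):
    return [(t, h) for t, targets in succ.items() for h in targets if h in dom[t]]


def natural_loop(succ, tail, head):
    """Header plus every node that reaches the tail without passing the header."""
    pred = predecessors(succ)
    loop, stack = {head}, []
    if tail not in loop:
        loop.add(tail)
        stack.append(tail)
    while stack:
        for p in pred[stack.pop()]:
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop


DOM = dominators(SUCC, 1)
for tail, head in back_edges(SUCC, DOM):
    print(f"back edge {tail} -> {head}, loop = {sorted(natural_loop(SUCC, tail, head))}")
```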
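Loop Example (Data-Flow Analysis)
Transfer functions of the header B2 (`L1: ble r1 r0 L2`, which assigns no register) and the body B4 (`add r2 <- r2 r0`, `add r0 <- r0 r3`, `jmp L1`). Because r2 is updated before r0, $f_{B4}$ uses the old $v[r0]$ for r2.

| Var | $f_{B2}(v)$ | $f_{B4}(v)$ |
|---|---|---|
| r0 | $v[r0]$ | $v[r0] + v[r3]$ |
| r1 | $v[r1]$ | $v[r1]$ |
| r2 | $v[r2]$ | $v[r2] + v[r0]$ |
| r3 | $v[r3]$ | $v[r3]$ |

B2 → B4 when $r1 \not\le r0$; B2 exits the loop when $r1 \le r0$.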
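The $f_{B4}$ column can be derived mechanically by composing the block's statement transfer functions, each statement seeing the results of the ones before it. This sketch assumes the dict encoding of affine expressions and uses the symbols `"v[r0]"`, ... to stand for the values entering the block; the names `block_transfer` and `F_B4` are illustrative.

```python
# Sketch: derive a block's transfer function by composing its statements.
# Assumptions: affine expressions are dicts {name: coeff} with zero terms
# dropped, and the symbols "v[r0]", ... stand for the values entering the block.
REGS = ("r0", "r1", "r2", "r3")
# B4 = add r2 <- r2 r0 ; add r0 <- r0 r3 ; jmp L1 (jmp writes no register)
B4 = [("add", "r2", "r2", "r0"), ("add", "r0", "r0", "r3")]


def add_exprs(e1, e2):
    total = {k: e1.get(k, 0) + e2.get(k, 0) for k in e1.keys() | e2.keys()}
    return {k: c for k, c in total.items() if c != 0}


def block_transfer(block, v):
    """f_B = f_sn o ... o f_s1: each statement sees the results of earlier ones."""
    v = dict(v)
    for op, x, y, z in block:
        if op != "add":
            raise NotImplementedError(op)  # only add appears in this example
        v[x] = add_exprs(v[y], v[z])
    return v


# Feeding in symbolic inputs yields f_B4(v) itself.
SYMBOLIC = {r: {f"v[{r}]": 1} for r in REGS}
F_B4 = block_transfer(B4, SYMBOLIC)
for r in REGS:
    print(r, F_B4[r])
```

An empty statement list, like B2's, gives back the identity, which matches the $f_{B2}$ column.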
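Loop Example (Data-Flow Analysis), pass 1
IN[B2] is the meet of OUT[B1] (after `li r0 <- 0`, `syscall IO.in_int`, `li r2 <- 0`, `li r3 <- 1`) and OUT[B4], which starts at $\top$.

| Var | IN[B2] | OUT[B4] |
|---|---|---|
| r0 | $0 \wedge \top = 0$ | $1$ |
| r1 | $\bot \wedge \top = \bot$ | $\bot$ |
| r2 | $0 \wedge \top = 0$ | $0$ |
| r3 | $1 \wedge \top = 1$ | $1$ |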
Loop Example (Data-Flow Analysis), pass 2

| Var | IN[B2] | OUT[B4] |
|---|---|---|
| r0 | $0 \wedge 1 = \bot$ | $\bot$ |
| r1 | $\bot \wedge \bot = \bot$ | $\bot$ |
| r2 | $0 \wedge 0 = 0$ | $\bot$ |
| r3 | $1 \wedge 1 = 1$ | $1$ |
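Loop Example (Data-Flow Analysis), pass 3

| Var | IN[B2] | OUT[B4] |
|---|---|---|
| r0 | $0 \wedge \bot = \bot$ | $\bot$ |
| r1 | $\bot \wedge \bot = \bot$ | $\bot$ |
| r2 | $0 \wedge \bot = \bot$ | $\bot$ |
| r3 | $1 \wedge 1 = 1$ | $1$ |

Total failure! r0 (the loop counter) and r2 (the sum) both end up $\bot$, so this analysis says nothing about how many times the loop runs.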
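The three passes can be reproduced with a short round-robin iteration over the loop header. As a simplifying assumption, plain integers stand in for affine expressions, since every value reaching B2 in this example is either a constant or $\bot$; `OUT_B1`, `f_B4`, and `analyze` are illustrative names.

```python
# Sketch: the round-robin iteration from the slides, for the loop header B2.
# Simplification (an assumption): every value reaching B2 is a constant or
# BOT, so plain integers stand in for affine expressions here.
TOP, BOT = "TOP", "BOT"
REGS = ("r0", "r1", "r2", "r3")
OUT_B1 = {"r0": 0, "r1": BOT, "r2": 0, "r3": 1}  # r1 comes from IO.in_int


def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT


def add(a, b):
    if BOT in (a, b):
        return BOT
    if TOP in (a, b):
        return TOP
    return a + b


def f_B4(v):
    v = dict(v)
    v["r2"] = add(v["r2"], v["r0"])  # add r2 <- r2 r0
    v["r0"] = add(v["r0"], v["r3"])  # add r0 <- r0 r3
    return v


def analyze(max_passes=10):
    """Return (IN[B2], OUT[B4]) for each pass until OUT[B4] stops changing."""
    out_b4 = {r: TOP for r in REGS}
    passes = []
    for _ in range(max_passes):
        in_b2 = {r: meet(OUT_B1[r], out_b4[r]) for r in REGS}
        new_out = f_B4(in_b2)  # B2 (ble) assigns no register
        passes.append((in_b2, new_out))
        if new_out == out_b4:
            break
        out_b4 = new_out
    return passes


for n, (in_b2, out_b4) in enumerate(analyze(), start=1):
    print(f"pass {n}: IN[B2]={in_b2} OUT[B4]={out_b4}")
```

The run stabilizes after three passes with r0 and r2 at `BOT`, matching the tables.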
Iterated Transfer Functions
Track data-flow values as functions of the number of iterations, starting from the loop-entry values $v_0$ (here $v_0[r3] = 1$).
◦ After 1 iteration: $f_{B4}^{1}(v_0)[r0] = v_0[r0] + v_0[r3] = v_0[r0] + 1$
◦ After 2 iterations: $f_{B4}^{2}(v_0)[r0] = (v_0[r0] + 1) + 1 = v_0[r0] + 2$
◦ After $i$ iterations: $f_{B4}^{i}(v_0)[r0] = v_0[r0] + i$
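The pattern can be checked by applying $f_{B4}$ repeatedly to symbolic entry values. This sketch assumes the dict encoding of affine expressions, with `"1"` as the constant term and $v_0[r3]$ fixed to the constant 1; `iterate` and `V0` are illustrative names. It also shows why r2 will need different treatment: its constant term runs 0, 1, 3, 6, i.e. $i(i-1)/2$, which is not affine in $i$.

```python
# Sketch: apply f_B4 repeatedly to symbolic entry values v0 and watch the
# pattern. Assumptions: affine expressions are dicts {name: coeff} with "1"
# as the constant term; v0[r3] is the constant 1.
def add_exprs(e1, e2):
    total = {k: e1.get(k, 0) + e2.get(k, 0) for k in e1.keys() | e2.keys()}
    return {k: c for k, c in total.items() if c != 0}


def f_B4(v):
    v = dict(v)
    v["r2"] = add_exprs(v["r2"], v["r0"])  # add r2 <- r2 r0
    v["r0"] = add_exprs(v["r0"], v["r3"])  # add r0 <- r0 r3
    return v


def iterate(f, v0, i):
    """f^i(v0): apply f to v0 i times."""
    v = v0
    for _ in range(i):
        v = f(v)
    return v


V0 = {"r0": {"v0[r0]": 1}, "r1": {"v0[r1]": 1}, "r2": {"v0[r2]": 1}, "r3": {"1": 1}}

for i in range(1, 5):
    state = iterate(f_B4, V0, i)
    print(i, "r0:", state["r0"], "r2:", state["r2"])
```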
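Handling Iteration
◦ Symbolic constants: if $f(v)[x] = v[x]$, then $f^{i}(v_0)[x] = v_0[x]$.
◦ Basic induction variables: if $f(v)[x] = c + v[x]$, then $f^{i}(v_0)[x] = c\,i + v_0[x]$.
◦ Induction variables (where $y_1, y_2, \ldots$ are basic induction variables or symbolic constants and $x \not\equiv y_k$ for every $k$): if $f(v)[x] = c_0 + c_1 v[y_1] + \cdots$, then $f^{i}(v_0)[x] = c_0 + c_1 f^{i}(v_0)[y_1] + \cdots$.
◦ Any variable that fits none of these rules gets $\bot$.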
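A sketch that applies the three rules literally. The encoding is an assumption of this sketch: `f[x]` is an affine map over the values entering the loop body (`"1"` is the constant term), and a closed form is a polynomial whose monomials are sorted tuples of factors such as `("i", "v0[r3]")`. Following the loop example's table, a basic induction variable may step by a symbolic constant as well as by a literal constant. `closed_forms` is a hypothetical name.

```python
# Sketch of the three closed-form rules for f^i(v0), applied literally.
# Assumptions (illustrative encoding, not from the slides):
#   f[x] is an affine map {name: coeff} over the values entering the loop body,
#   with "1" as the constant term; a closed form is a polynomial
#   {monomial: coeff}, a monomial being a sorted tuple of factors such as
#   ("v0[r0]",) or ("i", "v0[r3]"), and () the constant term.
BOT = "BOT"


def closed_forms(f):
    # Symbolic constants: f(v)[x] = v[x].
    symbolic = {x for x, e in f.items() if e == {x: 1}}

    # Basic induction variables: f(v)[x] = v[x] + step, where the step is a
    # constant or (as in the loop example) a symbolic constant.
    basic = {}
    for x, e in f.items():
        rest = {k: c for k, c in e.items() if k != x}
        if e.get(x) == 1 and len(rest) == 1:
            (k, c), = rest.items()
            if k == "1" or k in symbolic:
                basic[x] = (k, c)

    result = {x: {(f"v0[{x}]",): 1} for x in symbolic}  # f^i(v0)[x] = v0[x]
    for x, (k, c) in basic.items():                      # c*i + v0[x]
        step = ("i",) if k == "1" else tuple(sorted(("i", f"v0[{k}]")))
        result[x] = {(f"v0[{x}]",): 1, step: c}

    # Induction variables: c0 + c1*f^i(v0)[y1] + ... with each y_k a basic
    # induction variable or symbolic constant and x not among the y_k.
    for x, e in f.items():
        if x in result:
            continue
        ys = [k for k in e if k != "1"]
        if x in ys or not all(y in symbolic or y in basic for y in ys):
            result[x] = BOT  # none of the rules applies
            continue
        poly = {(): e["1"]} if "1" in e else {}
        for y in ys:
            for mono, c in result[y].items():
                poly[mono] = poly.get(mono, 0) + e[y] * c
        result[x] = {m: c for m, c in poly.items() if c != 0}
    return result


# f_B4 from the loop example: r0 += r3, r2 += r0, r1 and r3 unchanged.
F_B4 = {"r0": {"r0": 1, "r3": 1}, "r1": {"r1": 1},
        "r2": {"r2": 1, "r0": 1}, "r3": {"r3": 1}}
for var, form in closed_forms(F_B4).items():
    print(var, form)
```

For the loop example this yields $v_0[r0] + v_0[r3]\,i$ for r0 and $\bot$ for r2, since r2 appears on both sides of its own update.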
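Loop Example (Data-Flow Analysis)
Closed forms after $i$ iterations of B4:

| Var | $f_{B4}(v)$ | $f_{B4}^{i}(v_0)$ |
|---|---|---|
| r0 | $v[r0] + v[r3]$ | $v_0[r0] + v_0[r3]\,i$ |
| r1 | $v[r1]$ | $v_0[r1]$ |
| r2 | $v[r2] + v[r0]$ | $\bot$ |
| r3 | $v[r3]$ | $v_0[r3]$ |

r1 and r3 are symbolic constants, and r0 is a basic induction variable whose step $v[r3]$ is a symbolic constant. r2 accumulates r0 on every iteration, so it fits none of the rules (its value grows quadratically in $i$) and gets $\bot$.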
Finding the Number of Iterations
Use $f^{i}$ to compute the values on the back edge.
We want the largest $i_{max}$ for which the loop still runs, i.e. $f^{i_{max}}(v_0)[r1] \not\le f^{i_{max}}(v_0)[r0]$.
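Substituting the closed forms (and assuming $v_0[r3] > 0$):
$$v_0[r1] \not\le v_0[r0] + v_0[r3]\, i_{max} \quad\Longleftrightarrow\quad \frac{v_0[r1] - v_0[r0]}{v_0[r3]} > i_{max}$$
With $v_0[r0] = 0$ and $v_0[r3] = 1$, the largest integer $i_{max}$ is $v_0[r1] - 1$, so $v_0[r1] = i_{max} + 1$: the body runs $v_0[r1]$ times (for $v_0[r1] > 0$).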
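The derivation can be turned into a small trip-count function and checked against direct simulation of the `ble` loop. This is a sketch assuming integer values and a positive step $v_0[r3]$; the function names are illustrative.

```python
# Sketch: the trip count of `L1: ble r1 r0 L2` from the closed form
# r0 = v0[r0] + v0[r3]*i, checked against direct simulation.
# Assumption: v0[r3] > 0, i.e. r0 increases on every iteration.


def trip_count(r0_init, r1_init, r3_init):
    """Body executions: i = 0..i_max where i_max is the largest i with r1 > r0 + r3*i."""
    if r1_init <= r0_init:
        return 0  # the ble exits immediately
    # r1 > r0 + r3*i  <=>  i < (r1 - r0) / r3  <=>  i <= (r1 - r0 - 1) // r3 (integers)
    i_max = (r1_init - r0_init - 1) // r3_init
    return i_max + 1


def simulate(r0, r1, r3):
    count = 0
    while not r1 <= r0:  # ble r1 r0 L2 falls through while r1 > r0
        r0 += r3
        count += 1
    return count


print(trip_count(0, 10, 1))  # the example's case v0[r0] = 0, v0[r3] = 1: prints 10
```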
Loop Unrolling
Now we know that the initial value of r1 sets the number of iterations.
◦ Check it against the loop unrolling factor to handle the extra iterations.
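Loop Unrolling
Unrolling factor 3: r5 <- r1 mod r4 counts the extra iterations, which are handled before the unrolled loop.

        li r0 <- 0
        syscall IO.in_int
        li r2 <- 0
        li r3 <- 1
        ble r1 r0 L2        ; guard: no iterations when r1 <= 0
        li r4 <- 3          ; unrolling factor
        div r5 <- r1 r4
        mul r5 <- r5 r4
        sub r5 <- r1 r5     ; r5 <- r1 mod r4
        bz r5 L1            ; no extra iterations
        add r2 <- r2 r0     ; handle extra iteration 1
        add r0 <- r0 r3
        beq r5 r0 L1        ; only one extra iteration
        add r2 <- r2 r0     ; handle extra iteration 2
        add r0 <- r0 r3
    L1: beq r1 r0 L2        ; r1 - r0 is now a multiple of 3
        add r2 <- r2 r0     ; unrolled body, copy 1
        add r0 <- r0 r3
        add r2 <- r2 r0     ; copy 2
        add r0 <- r0 r3
        add r2 <- r2 r0     ; copy 3
        add r0 <- r0 r3
        jmp L1
    L2: mov r1 <- r2
        syscall IO.out_int
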
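A line-by-line Python mirror of the unrolled loop (factor 3) makes it easy to check against the original loop for many inputs. The explicit guard for $n \le 0$ is an assumption of this sketch: without it, the remainder logic would run extra iterations for negative input that the original `ble` loop never runs. `n` plays the role of r1; the function names are illustrative.

```python
# Sketch: a line-by-line Python mirror of the unrolled Cool ASM (factor 3),
# checked against the original loop. n plays the role of r1 throughout.
FACTOR = 3  # li r4 <- 3


def original(n):
    r0, r2, r3 = 0, 0, 1
    while not n <= r0:  # L1: ble r1 r0 L2
        r2 += r0
        r0 += r3
    return r2


def unrolled(n):
    r0, r2, r3 = 0, 0, 1
    if n <= r0:  # guard (assumption of this sketch): no iterations
        return r2
    r5 = n - (n // FACTOR) * FACTOR  # div, mul, sub: r5 = n mod 3 (n > 0 here)
    if r5 != 0:  # bz r5 L1
        r2 += r0  # extra iteration 1
        r0 += r3
        if r5 != r0:  # beq r5 r0 L1
            r2 += r0  # extra iteration 2
            r0 += r3
    while n != r0:  # L1: beq r1 r0 L2; n - r0 is now a multiple of 3
        r2 += r0  # copy 1 of the body
        r0 += r3
        r2 += r0  # copy 2
        r0 += r3
        r2 += r0  # copy 3
        r0 += r3
    return r2


print([unrolled(n) for n in range(8)])  # [0, 0, 1, 3, 6, 10, 15, 21]
```

The extra iterations run first, so the unrolled body only ever sees a remaining count that is a multiple of the factor and can use an exact `beq` test.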
Auto-Vectorization
Automatic Vectorization
Similar to loop unrolling:
◦ Consecutive iterations with independent arithmetic.
◦ Perform the arithmetic for several iterations together in a vector.
◦ Usually implemented over arrays.

    let x : List <- getlist() in
    while not isvoid(x) loop {
        x.incrBy(2);
        x <- x.next();
    } pool

First, inline the method calls x.incrBy(2) and x.next().
Then unroll the loop (by a factor of 2 in the code that follows).
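A sketch of the List loop after both steps, written in Python for checking. `Node` is a hypothetical stand-in for the Cool List object: `value` corresponds to the field at offset 3 and `next` to the field at offset 4 in the assembly. Unlike a plain two-copy body, this version also handles odd-length lists by stopping after the first copy when there is no second node.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch: the List loop after inlining incrBy(2) and next() and unrolling by 2.
# Assumptions: Node is a hypothetical stand-in for the Cool List object, with
# `value` the field at offset 3 and `next` the field at offset 4 in the ASM.


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def build(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(x):
    out = []
    while x is not None:
        out.append(x.value)
        x = x.next
    return out


def incr_all_unrolled(x, k=2):
    while x is not None:   # while not isvoid(x)
        x.value += k       # inlined x.incrBy(2)
        y = x.next         # inlined x <- x.next()
        if y is None:      # odd length: no second node on this trip
            break
        y.value += k       # second copy of the body
        x = y.next


lst = build([1, 2, 3])
incr_all_unrolled(lst)
print(to_list(lst))  # [3, 4, 5]
```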
Automatic Vectorization (Cool ASM)
The List loop after inlining and unrolling by 2:

        li t1 <- 2
    L1: bz r0 L2
        ld t2 <- r0[3]      ; x.incrBy(2)
        add t3 <- t2 t1
        st r0[3] <- t3
        ld t4 <- r0[4]      ; x <- x.next()
        ld t5 <- t4[3]      ; x.incrBy(2); assumes an even-length list, so t4 is never void
        add t6 <- t5 t1
        st t4[3] <- t6
        ld r0 <- t4[4]      ; x <- x.next()
        jmp L1

Code reordering: the loads come first, then both adds, then both stores.

        li t1 <- 2
    L1: bz r0 L2
        ld t4 <- r0[4]
        ld t2 <- r0[3]
        ld t5 <- t4[3]
        add t3 <- t2 t1
        add t6 <- t5 t1
        st r0[3] <- t3
        st t4[3] <- t6
        ld r0 <- t4[4]
        jmp L1

1. Group arithmetic together.
2. Pack temporaries in vector registers.
3. Replace add with vector-add.
4. Unpack vector result.
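The four steps can be mimicked in Python to check that packing, a single lane-wise add, and unpacking give the same result as the two scalar adds. Assumptions of this sketch: a hypothetical 2-lane vector is modelled as a tuple, `vadd` is lane-wise addition, `Node` stands in for the Cool List object, and, like the assembly, the list has an even number of nodes.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of steps 1-4 on the unrolled List loop. Assumptions: a hypothetical
# 2-lane vector is modelled as a tuple, vadd is lane-wise addition, and, like
# the assembly, the list has an even number of nodes.


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def vadd(a, b):
    """Lane-wise add of two equal-length vectors."""
    return tuple(x + y for x, y in zip(a, b))


def incr_all_vectorized(x, k=2):
    vr1 = (k, k)                   # li vr10 <- 2 ; li vr11 <- 2
    while x is not None:           # L1: bz r0 L2
        t4 = x.next                # ld t4 <- r0[4]
        vr0 = (x.value, t4.value)  # step 2, pack: ld vr00 / ld vr01
        vr0 = vadd(vr0, vr1)       # step 3: vadd vr0 <- vr0 vr1
        x.value, t4.value = vr0    # step 4, unpack: st r0[3] / st t4[3]
        x = t4.next                # ld r0 <- t4[4]


head = Node(1, Node(2, Node(3, Node(4))))
incr_all_vectorized(head)
values = []
node = head
while node is not None:
    values.append(node.value)
    node = node.next
print(values)  # [3, 4, 5, 6]
```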
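Automatic Vectorization (Cool ASM)

        li vr10 <- 2        ; vrNk is lane k of vector register vrN
        li vr11 <- 2        ; vr1 = (2, 2)
    L1: bz r0 L2
        ld t4 <- r0[4]
        ld vr00 <- r0[3]    ; pack into lane 0 of vr0
        ld vr01 <- t4[3]    ; pack into lane 1 of vr0
        vadd vr0 <- vr0 vr1
        st r0[3] <- vr00    ; unpack lane 0
        st t4[3] <- vr01    ; unpack lane 1
        ld r0 <- t4[4]
        jmp L1
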
A Simple Interprocedural Analysis
Idea: treat method calls as control flow.
If the method instance is known:
◦ Add a CFG edge from the call to the top of the method body.
◦ Add a CFG edge from the end of the method to the statement after the call.
◦ Similar to inlining, but without the code bloat.
Extension: "clone" the method's CFG nodes for each invocation.
This analysis has difficulty with recursion: for a recursive method, cloning one copy of the CFG per invocation never terminates.
Interprocedural Example

    f() : Int {{
        t1 <- g(0);
        t2 <- g(1);
        t1 + t2;
    }}

    g(x : Int) : Int {
        x + 1
    }

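Interprocedural Example (shared CFG for g)
With call and return edges, t1 <- g(0) and t2 <- g(1) both flow into the single CFG for g(x : Int) : Int { x + 1 }, and the end of g flows back to the statement after each call, before t1 + t2.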
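At the entry of g the two call sites meet: $IN[g][x] = 0 \wedge 1 = \bot$, so x + 1, and therefore t1 and t2, are $\bot$ as well.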
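Interprocedural Example (cloned g)
With the cloning extension, each call gets its own copy of g's CFG: t1 <- g(0) flows into one clone of g(x : Int) : Int { x + 1 }, t2 <- g(1) into a second clone, and then t1 + t2 is evaluated.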
$IN[g][x] = 0$ in the first clone and $IN[g][x] = 1$ in the second, so t1 = 1, t2 = 2, and t1 + t2 is the constant 3.
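A small sketch contrasting the two versions of the analysis on this example, using a constant lattice (`TOP`, integers, `BOT`) in place of full affine expressions. The names `g_body` and `analyze_f` are illustrative: with a shared CFG, the meet at g's entry loses both arguments, while with one clone per call site each result stays constant.

```python
# Sketch contrasting the shared-CFG analysis with the cloned one for
# f() = g(0) + g(1), g(x) = x + 1, using a constant lattice (TOP, ints, BOT).
TOP, BOT = "TOP", "BOT"


def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT


def add(a, b):
    if BOT in (a, b):
        return BOT
    if TOP in (a, b):
        return TOP
    return a + b


def g_body(x):
    return add(x, 1)  # g(x : Int) : Int { x + 1 }


def analyze_f(clone):
    """Return (t1, t2, t1 + t2) as computed by the analysis."""
    args = [0, 1]  # t1 <- g(0); t2 <- g(1)
    if clone:
        results = [g_body(a) for a in args]  # one clone of g per call site
    else:
        in_x = TOP
        for a in args:
            in_x = meet(in_x, a)  # IN[g][x] = 0 ∧ 1
        ret = g_body(in_x)
        results = [ret] * len(args)  # g's exit flows back to every call site
    t1, t2 = results
    return t1, t2, add(t1, t2)


print(analyze_f(clone=False))  # ('BOT', 'BOT', 'BOT')
print(analyze_f(clone=True))   # (1, 2, 3)
```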