Kevin WalshCS 3410, Spring 2010
Computer ScienceCornell University
RISC Pipeline
See: P&H Chapter 4.6
2
A Processor
alu
PC
imm
memory
memory
din dout
addr
target
offset cmpcontrol
=?
new pc
registerfile
inst
extend
+4 +4
3
Write-BackMemory
InstructionFetch Execute
p
InstructionDecode
registerfile
control
A Processor
alu
imm
memory
din dout
addr
inst
PC
memory
computejump/branch
targets
new pc
+4
extend
4
Basic Pipeline
Five stage “RISC” load-store architecture1. Instruction fetch (IF)
– get instruction from memory, increment PC2. Instruction Decode (ID)
– translate opcode into control signals and read registers3. Execute (EX)
– perform ALU operation, compute jump/branch targets4. Memory (MEM)
– access memory if needed5. Writeback (WB)
– update register file
Slides thanks to Sally McKee & Kavita Bala
5
Pipelined Implementation
Break instructions across multiple clock cycles (five, in this case)
Design a separate stage for the execution performed during each clock cycle
Add pipeline registers to isolate signals between different stages
6
Write-BackMemory
InstructionFetch Execute
InstructionDecode
extend
registerfile
control
Pipelined Processor
alu
memory
din dout
addrPC
memory
newpc
inst
IF/ID ID/EX EX/MEM MEM/WB
imm
BA
ctrl
ctrl
ctrl
BD D
M
computejump/branch
targets
+4
7
IF
Stage 1: Instruction Fetch
Fetch a new instruction every cycle• Current PC is index to instruction memory• Increment the PC at end of cycle (assume no branches for now)
Write values of interest to pipeline register (IF/ID)• Instruction bits (for later decoding)• PC+4 (for later computing branch targets)
8
IF
PC
instructionmemory
newpc
inst
addr mc
00 = read word
1
IF/ID
WE 1
Rest
of p
ipel
ine
+4
PC+4
pcsel
pcregpcrel
pcabs
9
ID
Stage 2: Instruction Decode
On every cycle:• Read IF/ID pipeline register to get instruction bits• Decode instruction, generate control signals• Read from register file
Write values of interest to pipeline register (ID/EX)• Control information, Rd index, immediates, offsets, …• Contents of Ra, Rb• PC+4 (for computing branch targets later)
10
ID
ctrl
ID/EX
Rest
of p
ipel
ine
PC+4
inst
IF/ID
PC+4
Stag
e 1:
Inst
ructi
on F
etch
registerfile
WERd
Ra Rb
DB
A
BA
extend imm
decode
result
dest
11
EX
Stage 3: Execute
On every cycle:• Read ID/EX pipeline register to get values and control bits• Perform ALU operation• Compute targets (PC+4+offset, etc.) in case this is a branch• Decide if jump/branch should be taken
Write values of interest to pipeline register (EX/MEM)• Control information, Rd index, …• Result of ALU operation• Value in case this is a memory store instruction
12
Stag
e 2:
Inst
ructi
on D
ecod
e
pcrel
pcabs
EX
ctrl
EX/MEM
Rest
of p
ipel
ine
BD
ctrl
ID/EX
PC+4
BA
alu
+
||
branch?
imm
pcsel
pcreg
13
MEM
Stage 4: Memory
On every cycle:• Read EX/MEM pipeline register to get values and control bits• Perform memory load/store if needed
– address is ALU result
Write values of interest to pipeline register (MEM/WB)• Control information, Rd index, …• Result of memory operation• Pass result of ALU operation
14
MEM
ctrl
MEM/WB
Rest
of p
ipel
ine
Stag
e 3:
Exe
cute
MD
ctrl
EX/MEM
BD
memory
din dout
addr
mc
15
WB
Stage 5: Write-back
On every cycle:• Read MEM/WB pipeline register to get values and control bits• Select value and write to register file
16
WB
Stag
e 4:
Mem
ory
ctrl
MEM/WB
MD
result
dest
17IF/ID
+4
ID/EX EX/MEM MEM/WB
mem
din dout
addrinst
PC+4
OP
BA
Rd
BD
MD
PC+4
imm
OP
Rd
OP
Rd
PC
instmem
Rd
Ra Rb
DB
A
18
Example
add r3, r1, r2; nand r6, r4, r5; lw r4, 20(r2); add r5, r2, r5; sw r7, 12(r3);
19
0:add1:nand2:lw3:add4:sw
r0r1r2r3r4r5r6r7
0369
12187
4122
IF/ID
+4
ID/EX EX/MEM MEM/WB
mem
din dout
addrinst
PC+4
OP
BA
Rd
BD
MD
PC+4
imm
OP
Rd
OP
Rd
PC
instmem
77
add r3, r1, r2nand r6, r4, r5 add r3, r1, r2lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2)sw r7, 12(r3) add r5, r2, r5 sw r7, 12(r3)
Rd
Ra Rb
DB
A
20
Time Graphs
1 2 3 4 5 6 7 8 9
add
nand
lw
add
sw
Clock cycle
Latency:Throughput:Concurrency:
CPI =
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
21
Pipelining Recap
Powerful technique for masking latencies• Logically, instructions execute one at a time• Physically, instructions execute in parallel
– Instruction level parallelism
Abstraction promotes decoupling• Interface (ISA) vs. implementation (Pipeline)