CPE 442 pipeline.1 Intro to Computer Architecture
CpE 242 Computer Architecture and Engineering
Designing a Pipeline Processor
Outline of Today’s Lecture
° Recap and Introduction (5 minutes)
° Introduction to the Concept of Pipelined Processor (15 minutes)
° Pipelined Datapath and Pipelined Control (25 minutes)
° How to Avoid Race Condition in a Pipeline Design? (5 minutes)
° Pipeline Example: Instructions Interaction (15 minutes)
° Summary (5 minutes)
A Single Cycle Processor
[Figure: single cycle datapath. Labeled blocks: Instruction Fetch Unit, Main Control, ALU Control, 32 32-bit Registers (Rw, Ra, Rb; busA, busB, busW), Extender, ALU, Data Memory. Control signals: RegDst, RegWr, ExtOp, ALUSrc, ALUop, ALUctr, MemWr, MemtoReg, Branch, Jump.]
Drawbacks of this Single Cycle Processor
° Long cycle time:
• Cycle time must be long enough for the load instruction:
- PC’s Clock-to-Q +
- Instruction Memory Access Time +
- Register File Access Time +
- ALU Delay (address calculation) +
- Data Memory Access Time +
- Register File Setup Time +
- Clock Skew
° Cycle time is much longer than needed for all other instructions. Examples:
• R-type instructions do not require data memory access
• Jump requires neither an ALU operation nor a data memory access
Overview of a Multiple Cycle Implementation
° The root of the single cycle processor’s problems:
• The cycle time has to be long enough for the slowest instruction
° Solution:
• Break the instruction into smaller steps
• Execute each step (instead of the entire instruction) in one cycle
- Cycle time: time it takes to execute the longest step
- Keep all the steps similar in length
• This is the essence of the multiple cycle processor
° The advantages of the multiple cycle processor:
• Cycle time is much shorter
• Different instructions take different numbers of cycles to complete
- Load takes five cycles
- Jump only takes three cycles
• Allows a functional unit to be used more than once per instruction
Multiple Cycle Processor
° MCP: A functional unit can be used more than once per instruction
[Figure: multiple cycle datapath. Labeled blocks: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw; busA, busB, busW), Extend, ALU, ALU Control, Target. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg.]
Timing Diagram of a Load Instruction
[Figure: timing diagram of a load in the single cycle processor. Each signal changes from its old value to its new value after the labeled delay: PC (Clk-to-Q); Rs, Rt, Rd, Op, Func (Instruction Memory Access Time); ALUctr, RegWr, ExtOp, ALUSrc (Delay through Control Logic); busA (Register File Access Time); busB (Delay through Extender & Mux); Address (ALU Delay); busW (Data Memory Access Time); then Register File Write Time. Phases marked: Instruction Fetch, Instr Decode / Reg. Fetch, Address, Data Memory, Reg Wr.]
The Five Stages of Load
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Read the data from the Data Memory
° Wr: Write the data back to the register file
        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load:   Ifetch   Reg/Dec  Exec     Mem      Wr
Key Ideas Behind Pipelining
° Grading the final exam for a class of 100 students:
• Five problems, five people grading the exam
• Each person grades ONLY one problem
• Each grader passes the exam to the next person as soon as their part is finished
• Assume each problem takes 12 min to grade
- Each individual exam still takes 1 hour to grade
- But with 5 people, all exams can be graded almost five times faster
° The load instruction has 5 stages:
• Five independent functional units, one working on each stage
- Each functional unit is used only once
• The 2nd load can start as soon as the 1st finishes its Ifetch stage
• Each load still takes five cycles to complete
• The throughput, however, is much higher
Key Ideas Behind Pipelining
° Let n be number of tasks or exams (or instructions)
° Let k be number of stages for each task
° Let T be the time per stage
° Time per task = T · k
° Total time for n tasks, non-pipelined = T · k · n
° Total time for n tasks, pipelined = T · k + T · (n - 1)
° Speedup = pipelined performance / non-pipelined performance
= total time non-pipelined / total time pipelined
= (k · n) / (k + n - 1) ≈ k when n >> k
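As a quick numeric check of the formulas above, here is a minimal Python sketch. The function names are made up for illustration; the numbers in the example come from the exam-grading analogy (n = 100 exams, k = 5 problems, T = 12 minutes per problem).

```python
def total_time_unpipelined(n, k, t):
    """n tasks, each passing through k stages of t time units, one task at a time."""
    return t * k * n


def total_time_pipelined(n, k, t):
    """The first task needs k stage-times; every later task finishes one stage-time after the previous one."""
    return t * k + t * (n - 1)


def speedup(n, k, t=1):
    """Ratio of non-pipelined to pipelined total time; approaches k as n grows."""
    return total_time_unpipelined(n, k, t) / total_time_pipelined(n, k, t)


if __name__ == "__main__":
    # Exam-grading example: 100 exams, 5 problems, 12 minutes per problem.
    print(total_time_unpipelined(100, 5, 12))  # 6000 minutes
    print(total_time_pipelined(100, 5, 12))    # 1248 minutes
    print(round(speedup(100, 5, 12), 2))       # 4.81
```

With only 100 exams the speedup is about 4.8 rather than exactly 5; the gap closes as n grows relative to k.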
[Figure: input tasks enter a k-stage pipeline (Stage 1, Stage 2, ..., Stage k); a buffer is labeled in the diagram.]
Pipelining the Load Instruction
° The five independent functional units in the pipeline datapath are:
• Instruction Memory for the Ifetch stage
• Register File’s Read ports (busA and busB) for the Reg/Dec stage
• ALU for the Exec stage
• Data Memory for the Mem stage
• Register File’s Write port (busW) for the Wr stage
° One instruction enters the pipeline every cycle
• One instruction comes out of the pipeline (complete) every cycle
• The “Effective” Cycles per Instruction (CPI) is 1
Clock:   Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw:  Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw:           Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw:                    Ifetch   Reg/Dec  Exec     Mem      Wr
The Four Stages of R-type
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: ALU operates on the two register operands
° Wr: Write the ALU output back to the register file
         Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type:  Ifetch   Reg/Dec  Exec     Wr
Pipelining the R-type and Load Instruction
° We have a problem:
• Two instructions try to write to the register file at the same time!
Clock:   Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type:  Ifetch   Reg/Dec  Exec     Wr
R-type:           Ifetch   Reg/Dec  Exec     Wr
Load:                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:                             Ifetch   Reg/Dec  Exec     Wr
R-type:                                      Ifetch   Reg/Dec  Exec     Wr
Oops! We have a problem!
Important Observation
° Each functional unit can only be used once per instruction
° Each functional unit must be used at the same stage for all instructions:
• Load uses Register File’s Write Port during its 5th stage
• R-type uses Register File’s Write Port during its 4th stage
Stage:   1        2        3        4        5
Load:    Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:  Ifetch   Reg/Dec  Exec     Wr
Solution 1: Insert “Bubble” into the Pipeline
° Insert a “bubble” into the pipeline to prevent 2 writes in the same cycle
• The control logic can be complex
° No instruction is completed during Cycle 5:
• The “Effective” CPI for load is 2
[Figure: pipeline timing diagram over Cycles 1 to 9 for a Load among R-type instructions, with a pipeline bubble inserted so that no two instructions write the register file in the same cycle.]
Solution 2: Delay R-type’s Write by One Cycle
° Delay R-type’s register write by one cycle:
• Now R-type instructions also use Reg File’s write port at Stage 5
• Mem stage is a NOOP stage: nothing is being done
Stage:   1        2        3        4        5
R-type:  Ifetch   Reg/Dec  Exec     Mem      Wr

Clock:   Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type:  Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:           Ifetch   Reg/Dec  Exec     Mem      Wr
Load:                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:                             Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:                                      Ifetch   Reg/Dec  Exec     Mem      Wr
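The write-port conflict, and the effect of delaying the R-type write by one cycle, can be checked with a short Python sketch. It assumes one instruction is fetched per cycle starting in Cycle 1; the names `STAGES`, `PADDED`, and `write_port_conflicts` are made up for illustration.

```python
from collections import defaultdict

# Stage sequences from the lecture, before any fix is applied.
STAGES = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}

# Solution 2: R-type passes through an idle Mem stage so that it writes in stage 5.
PADDED = {**STAGES, "R-type": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]}


def write_port_conflicts(program, stages):
    """Return {cycle: [instruction indices]} for every cycle in which more than
    one instruction is in its Wr stage. Instruction i is fetched in cycle i + 1."""
    writers = defaultdict(list)
    for i, kind in enumerate(program):
        wr_cycle = (i + 1) + stages[kind].index("Wr")
        writers[wr_cycle].append(i)
    return {cycle: idx for cycle, idx in writers.items() if len(idx) > 1}


program = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_port_conflicts(program, STAGES))  # {7: [2, 3]}
print(write_port_conflicts(program, PADDED))  # {}
```

In the unmodified pipeline the Load (index 2) and the R-type right after it (index 3) both reach Wr in Cycle 7; once every instruction writes in its fifth stage, the conflict disappears.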
The Four Stages of Store
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Write the data into the Data Memory
        Cycle 1  Cycle 2  Cycle 3  Cycle 4
Store:  Ifetch   Reg/Dec  Exec     Mem      (Wr: not used by Store)
The Four Stages of Beq
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: ALU compares the two register operands
• Adder calculates the branch target address
° Mem: If the two registers compared in the Exec stage are equal,
• Write the branch target address into the PC
        Cycle 1  Cycle 2  Cycle 3  Cycle 4
Beq:    Ifetch   Reg/Dec  Exec     Mem      (Wr: not used by Beq)
A Pipelined Datapath
[Figure: pipelined datapath. The stages Ifetch, Reg/Dec, Exec, Mem, and Wr are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. Blocks: PC, IUnit, RFile (Ra, Rb, Rw, Di), ExecUnit, Data Mem (RA, WA, Di, Do). Control signals: ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr.]
The Instruction Fetch Stage
[Figure: pipelined datapath with the Ifetch stage marked “You are here!”. IF/ID receives lw $1, 100($2); PC = 14.]
° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
A Detail View of the Instruction Unit
° Location 10: lw $1, 0x100($2)
[Figure: instruction unit detail, Ifetch stage marked “You are here!”. Labels: PC (value 10, next value 14), Adder with constant “4”, Instruction Memory (Address in, Instruction out), IF/ID register receiving lw $1, 100($2), Clk.]
The Decode / Register Fetch Stage
[Figure: pipelined datapath with the Reg/Dec stage marked “You are here!”. ID/Ex receives the contents of Reg. 2 and the immediate 0x100.]
° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
Load’s Address Calculation Stage
[Figure: pipelined datapath with the Exec stage marked “You are here!”. Control signals: ExtOp=1, ALUSrc=1, ALUOp=Add, RegDst=0. Ex/Mem receives Load’s Address.]
° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
A Detail View of the Execution Unit
[Figure: execution unit detail, Exec stage marked “You are here!”. Labels: ID/Ex register, busA, busB, imm16, Extender (ExtOp=1), Mux (ALUSrc=1), ALU Control (ALUOp=Add, ALUctr), ALU (Zero, ALUout), Adder with “<< 2” and PC+4 producing Target, Ex/Mem register receiving Load’s Memory Address.]
Load’s Memory Access Stage
[Figure: pipelined datapath with the Mem stage marked “You are here!”. Control signals: MemWr=0, Branch=0. Mem/Wr receives Load’s Data.]
° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
Load’s Write Back Stage
[Figure: pipelined datapath with the Wr stage active (“You are somewhere out there!”). Control signals: RegWr=1, MemtoReg=1.]
° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
How About Control Signals?
[Figure: pipelined datapath with a load in the Exec stage. Control signals shown: ExtOp=1, ALUSrc=1, ALUOp=Add, RegDst=0. Ex/Mem receives Load’s Address.]
° Key Observation: Control Signals at Stage N = Func (Instr. at Stage N)
• N = Exec, Mem, or Wr
° Example: Control Signals at Exec Stage = Func(Load’s Exec)
Pipeline Control
° The Main Control generates the control signals during Reg/Dec
• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
• Control signals for Mem (MemWr, Branch) are used 2 cycles later
• Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
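A minimal Python sketch of this idea: control signals are decoded once in Reg/Dec and then travel through the pipeline registers alongside their instruction. The grouping of signals by stage follows the slide, and the `lw` values follow the load walkthrough earlier in the lecture; the `nop` row and all function names are assumptions made for illustration.

```python
# Control signals grouped by the stage that consumes them.
EXEC_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WR_SIGNALS = ("MemtoReg", "RegWr")

# Decode table. The nop row is a hypothetical idle instruction ("x" = don't care).
DECODE = {
    "lw": {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "Add", "RegDst": 0,
           "MemWr": 0, "Branch": 0, "MemtoReg": 1, "RegWr": 1},
    "nop": {"ExtOp": "x", "ALUSrc": "x", "ALUOp": "x", "RegDst": "x",
            "MemWr": 0, "Branch": 0, "MemtoReg": "x", "RegWr": 0},
}


def pick(signals, names):
    """Keep only the named signals."""
    return {name: signals[name] for name in names}


def reset_state():
    """All pipeline registers start out holding the idle instruction's signals."""
    nop = DECODE["nop"]
    return {
        "ID/Ex": dict(nop),
        "Ex/Mem": pick(nop, MEM_SIGNALS + WR_SIGNALS),
        "Mem/Wr": pick(nop, WR_SIGNALS),
    }


def clock(state, opcode_in_reg_dec):
    """One clock edge: Main Control's output enters ID/Ex, and each later
    register keeps only the signals that the remaining stages still need."""
    return {
        "ID/Ex": dict(DECODE[opcode_in_reg_dec]),
        "Ex/Mem": pick(state["ID/Ex"], MEM_SIGNALS + WR_SIGNALS),
        "Mem/Wr": pick(state["Ex/Mem"], WR_SIGNALS),
    }


state = reset_state()
for opcode in ["lw", "nop", "nop"]:
    state = clock(state, opcode)
    print("Exec:", pick(state["ID/Ex"], EXEC_SIGNALS),
          "Mem:", pick(state["Ex/Mem"], MEM_SIGNALS),
          "Wr:", state["Mem/Wr"])
```

After the first clock edge the load's Exec signals sit in ID/Ex; after the third, its MemtoReg=1 and RegWr=1 have reached Mem/Wr, which matches the "3 cycles later" bullet.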
[Figure: Main Control, in the Reg/Dec stage, generates ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, and RegWr. ExtOp, ALUSrc, ALUOp, and RegDst are consumed in Exec; MemWr and Branch are carried through ID/Ex and Ex/Mem to Mem; MemtoReg and RegWr are carried through ID/Ex, Ex/Mem, and Mem/Wr to Wr.]
Beginning of the Wr’s Stage: A Real World Problem
° At the beginning of the Wr stage, we have a problem if:
• RegAdr’s (Rd or Rt) Clk-to-Q > RegWr’s Clk-to-Q
° Similarly, at the beginning of the Mem stage, we have a problem if:
• WrAdr’s Clk-to-Q > MemWr’s Clk-to-Q
° We have a race condition between Address and Write Enable!
[Figure: the Ex/Mem register drives WrAdr, Data, and MemWr into the Data Memory; the Mem/Wr register drives RegAdr, Data, and RegWr into the Reg File. Two timing diagrams against Clk compare RegAdr’s Clk-to-Q with RegWr’s Clk-to-Q, and WrAdr’s Clk-to-Q with MemWr’s Clk-to-Q.]
The Pipeline Problem
° The multiple cycle design prevents the race condition between Addr and WrEn:
• Make sure Address is stable by the end of Cycle N
• Assert WrEn during Cycle N + 1
° This approach can NOT be used in the pipeline design because:
• Must be able to write the register file every cycle
• Must be able to write the data memory every cycle
Clock (one column per cycle)
Store:   Ifetch   Reg/Dec  Exec     Mem      Wr
Store:            Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:                    Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:                             Ifetch   Reg/Dec  Exec     Mem      Wr
Synchronize Register File & Synchronize Memory
° Solution: AND the Write Enable signal with the Clock
• This is the ONLY place where gating the clock is used
• MUST consult a circuit expert to ensure there is no timing violation:
- Example: Clock High Time > Write Access Delay
[Figure: a Reg File or Memory whose write enable is gated with Clk, shown with a timing diagram. Signal labels: I_Addr, I_Data, I_WrEn, C_WrEn, Clk, Address, Data, WrEn.]
Synchronize Memory and Register File:
• Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge
• Write occurs at the cycle following the clock edge that captures the signals
A More Extensive Pipelining Example
Clock:                     Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
0: Load                    Ifetch   Reg/Dec  Exec     Mem      Wr
4: R-type                           Ifetch   Reg/Dec  Exec     Mem      Wr
8: Store                                     Ifetch   Reg/Dec  Exec     Mem      Wr
12: Beq (target is 1000)                              Ifetch   Reg/Dec  Exec     Mem      Wr
° End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch
° End of Cycle 5: Load’s Wr, R-type’s Mem, Store’s Exec, Beq’s Reg
° End of Cycle 6: R-type’s Wr, Store’s Mem, Beq’s Exec
° End of Cycle 7: Store’s Wr, Beq’s Mem
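The per-cycle snapshots above follow mechanically from each instruction's fetch cycle. Here is a minimal Python sketch, assuming every instruction passes through all five stages with no stalls; `stage_at`, `snapshot`, and `PROGRAM` are names chosen for illustration.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

# (label, fetch cycle) for the four instructions in the example.
PROGRAM = [("0: Load", 1), ("4: R-type", 2), ("8: Store", 3), ("12: Beq", 4)]


def stage_at(fetch_cycle, cycle):
    """Stage occupied during `cycle` by an instruction fetched in `fetch_cycle`,
    or None if it has not entered the pipeline yet or has already left it."""
    index = cycle - fetch_cycle
    return STAGES[index] if 0 <= index < len(STAGES) else None


def snapshot(cycle, program=PROGRAM):
    """Map each in-flight instruction to the stage it occupies in `cycle`."""
    stages = {label: stage_at(start, cycle) for label, start in program}
    return {label: stage for label, stage in stages.items() if stage is not None}


for cycle in range(4, 8):
    print(f"Cycle {cycle}: {snapshot(cycle)}")
```

For Cycle 4 this yields Load in Mem, R-type in Exec, Store in Reg/Dec, and Beq in Ifetch, in agreement with the first bullet.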
Pipelining Example: End of Cycle 4
° 0: Load’s Mem, 4: R-type’s Exec, 8: Store’s Reg, 12: Beq’s Ifetch
[Figure: pipelined datapath at the end of Cycle 4. PC = 16. IF/ID: Beq instruction; ID/Ex: Store’s busA & B; Ex/Mem: R-type’s Result; Mem/Wr: Load’s Dout. Control signals: ExtOp=x, ALUSrc=0, ALUOp=R-type, RegDst=1 (Exec); MemWr=0, Branch=0 (Mem); MemtoReg=x, RegWr=0 (Wr).]
Pipelining Example: End of Cycle 5
° 0: Load’s Wr, 4: R-type’s Mem, 8: Store’s Exec, 12: Beq’s Reg, 16: R-type’s Ifetch
[Figure: pipelined datapath at the end of Cycle 5. PC = 20. IF/ID: Instruction @ 16; ID/Ex: Beq’s busA & B; Ex/Mem: Store’s Address; Mem/Wr: R-type’s Result. Control signals: ExtOp=1, ALUSrc=1, ALUOp=Add, RegDst=x (Exec); MemWr=0, Branch=0 (Mem); MemtoReg=1, RegWr=1 (Wr).]
Pipelining Example: End of Cycle 6
° 4: R-type’s Wr, 8: Store’s Mem, 12: Beq’s Exec, 16: R-type’s Reg, 20: R-type’s Ifetch
[Figure: pipelined datapath at the end of Cycle 6. PC = 24. IF/ID: Instruction @ 20; ID/Ex: R-type’s busA & B; Ex/Mem: Beq’s Results; Mem/Wr: nothing for Store. Control signals: ExtOp=1, ALUSrc=0, ALUOp=Sub, RegDst=x (Exec); MemWr=1, Branch=0 (Mem); MemtoReg=0, RegWr=1 (Wr).]
Pipelining Example: End of Cycle 7
° 8: Store’s Wr, 12: Beq’s Mem, 16: R-type’s Exec, 20: R-type’s Reg, 24: R-type’s Ifetch
[Figure: pipelined datapath at the end of Cycle 7. PC = 1000. IF/ID: Instruction @ 24; ID/Ex: R-type’s busA & B; Ex/Mem: R-type’s Results; Mem/Wr: nothing for Beq. Control signals: ExtOp=x, ALUSrc=0, ALUOp=R-type, RegDst=1 (Exec); MemWr=0, Branch=1 (Mem); MemtoReg=x, RegWr=0 (Wr).]
The Delayed Branch Phenomenon
° Although Beq is fetched during Cycle 4:
• Target address is NOT written into the PC until the end of Cycle 7
• Branch’s target is NOT fetched until Cycle 8
• 3-instruction delay before the branch takes effect
° This is referred to as a Branch Hazard:
• Clever design techniques can reduce the delay to ONE instruction
Clk:                       Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9  Cycle 10 Cycle 11
12: Beq (target is 1000)   Ifetch   Reg/Dec  Exec     Mem      Wr
16: R-type                          Ifetch   Reg/Dec  Exec     Mem      Wr
20: R-type                                   Ifetch   Reg/Dec  Exec     Mem      Wr
24: R-type                                            Ifetch   Reg/Dec  Exec     Mem      Wr
1000: Target of Br                                             Ifetch   Reg/Dec  Exec     Mem      Wr
The Delayed Load Phenomenon
° Although Load is fetched during Cycle 1:
• The data is NOT written into the Reg File until the end of Cycle 5
• We cannot read this value from the Reg File until Cycle 6
• 3-instruction delay before the load takes effect
° This is referred to as a Data Hazard:
• Clever design techniques can reduce the delay to ONE instruction
Clock:     Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
I0: Load   Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 1              Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 2                       Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 3                                Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 4                                         Ifetch   Reg/Dec  Exec     Mem      Wr
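Both 3-instruction delays can be derived from the stage positions alone. The Python sketch below assumes the five-stage pipeline described in this lecture, with no forwarding or early branch resolution, where the PC is written at the end of Beq's Mem stage and the register file at the end of Load's Wr stage. The function names are hypothetical.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def cycle_of(fetch_cycle, stage):
    """Cycle in which an instruction fetched in `fetch_cycle` occupies `stage`."""
    return fetch_cycle + STAGES.index(stage)


def branch_delay(beq_fetch_cycle):
    """Instructions fetched after Beq before the branch target can be fetched."""
    pc_written = cycle_of(beq_fetch_cycle, "Mem")  # PC updated at the end of Mem
    target_fetch = pc_written + 1                  # target fetched in the next cycle
    return target_fetch - beq_fetch_cycle - 1


def load_delay(load_fetch_cycle):
    """Instructions after Load that cannot yet read the loaded value in Reg/Dec."""
    written = cycle_of(load_fetch_cycle, "Wr")     # Reg File written at the end of Wr
    first_read = written + 1                       # earliest Reg/Dec that sees the value
    first_fetch = first_read - STAGES.index("Reg/Dec")
    return first_fetch - load_fetch_cycle - 1


print(branch_delay(4))  # 3
print(load_delay(1))    # 3
```

With Beq fetched in Cycle 4 the target is fetched in Cycle 8, and with Load fetched in Cycle 1 the first instruction able to read the new value is the one fetched in Cycle 5 (Plus 4).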
Summary
° Disadvantages of the Single Cycle Processor
• Long cycle time
• Cycle time is too long for all instructions except the Load
° Multiple Clock Cycle Processor:
• Divide the instructions into smaller steps
• Execute each step (instead of the entire instruction) in one cycle
° Pipeline Processor:
• Natural enhancement of the multiple clock cycle processor
• Each functional unit can only be used once per instruction
• If an instruction is going to use a functional unit:
- it must use it at the same stage as all other instructions
• Pipeline Control:
- Each stage’s control signal depends ONLY on the instruction that is currently in that stage
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: timing comparison of the three implementations for Load, Store, and R-type. Single Cycle Implementation: Load in Cycle 1 and Store in Cycle 2, with part of the Store cycle marked "Waste". Multiple Cycle Implementation (Cycles 1 to 10): Load takes Ifetch, Reg, Exec, Mem, Wr; Store takes Ifetch, Reg, Exec, Mem; R-type then begins with Ifetch. Pipeline Implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped.]
Where to get more information?
° Everything you need to know about pipeline computers:
• Peter Kogge, “The Architecture of Pipeline Computers,” McGraw Hill Book Company, 1981.
° Some classic references on RISC pipelines:
• Manolis Katevenis, “Reduced Instruction Set Computer Architectures for VLSI,” PhD Thesis, UC Berkeley, 1984.
° Other references:
• David A. Patterson, “Reduced Instruction Set Computers,” Communications of the ACM, January 1985.
• Shing Kong, “Performance, Resources, and Complexity,” PhD Thesis, UC Berkeley, 1989.