lec21-quite good but same thing
8/6/2019 Lec21-Quite Good but Same Thing
http://slidepdf.com/reader/full/lec21-quite-good-but-same-thing 1/49
Erik Jonsson School of Engineering and Computer Science
The University of Texas at Dallas
The Pipelined MIPS Processor
• We complete our study of computer architecture by investigating an approach providing even higher performance for the MIPS CPU.
• We first saw how the MIPS CPU performance could be improved by converting the so-called single-cycle CPU to a multi-cycle design.
– In the multi-cycle approach, instead of using a single clock cycle for the whole instruction, the clock is accelerated, and instructions execute in phases over several clock cycles.
– Each instruction phase takes one clock cycle.
– This means that as each instruction executes, only one section of the CPU will be active per clock cycle -- the one executing that phase of the instruction.
• This suggests that perhaps we might redesign the CPU slightly so that every section can operate independently on an instruction at the same time.
© N. B. Dodge 09/09, Lecture #21: The Pipeline MIPS Processor
The “Laundry Example”
• As an introduction to the concept of pipelining, Patterson and Hennessy use the example of doing one’s laundry.
• Most people have – or have access to – a washer and dryer.
• Assume that you need to wash several washer loads of clothing.
• Would anyone divide the clothing into washer loads and then wash, dry, fold, and put away the first load before starting the second?
• No, if you were washing clothes, you would finish washing the first load, move it to the dryer, and start washing the second.
• If there were more loads to wash, you would begin to fold and put away finished clothing while the later loads were washing and drying.
• We can see this schematically on the next slide.
Graphical Example of the Laundry Cycle
[Figure: the laundry cycle, shown schematically.]
The “Pipeline” Processor
• Patterson and Hennessy applied this “simultaneous wash-dry-fold-put away concept” to the single-cycle computer model.
• The idea is to work on several instructions simultaneously, so that the number of clock cycles per instruction could be dramatically decreased and instruction throughput correspondingly increased.
• In the single-cycle implementation, each instruction takes one clock cycle, but the clock must be as slow as the slowest instruction.
• In the multi-cycle implementation, the clock runs faster, but each instruction takes several clock cycles.
• What if, each time the clock ticked, we could process an instruction in each section of the multicycle processor? Then we could process several instructions simultaneously, completing an instruction every clock cycle.
Pipeline Architecture
• A pipelined computer executes instructions concurrently.
• Hardware units are organized into stages:
– Execution in each stage takes exactly 1 clock period.
– Each stage passes its partial results to the next stage.
• Unfortunately, as noted earlier, speed = complexity + cost.
• The pipeline approach brings additional expense plus its own set of problems and complications, called hazards, which we will also study.
Sequential Versus Pipelined Execution
[Figure: two timelines over clock cycles 0 to 10 for the sequence lw $t0, 16($a3); lw $t1, 32($a3); lw $t2, 48($a3). Each instruction passes through Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, and Register Write (5 clock cycles). In the sequential timeline each lw starts only after the previous one has finished; in the pipelined timeline each lw starts one clock cycle after the previous one.]
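The two timelines can be redrawn with a few lines of Python. This is a minimal sketch, assuming every instruction uses all five stages and each stage takes one clock cycle (true for lw). The names STAGES, start_cycles, and render are illustrative and do not come from the lecture.

```python
# Stage abbreviations for: Instruction Fetch, Register Fetch, ALU Process,
# Memory R/W or ALU Out, Register Write.
STAGES = ["IF", "RF", "ALU", "MEM", "WB"]

def start_cycles(count, pipelined):
    """Clock cycle in which each of `count` instructions enters IF."""
    step = 1 if pipelined else len(STAGES)
    return [i * step for i in range(count)]

def render(instructions, pipelined):
    """Draw one text row per instruction, one 4-character column per clock cycle."""
    rows = []
    starts = start_cycles(len(instructions), pipelined)
    for text, start in zip(instructions, starts):
        cells = [""] * start + STAGES          # empty cells before the instruction starts
        rows.append(f"{text:<18}" + "".join(f"{c:<4}" for c in cells).rstrip())
    return "\n".join(rows)

loads = ["lw $t0, 16($a3)", "lw $t1, 32($a3)", "lw $t2, 48($a3)"]
print("Sequential:")
print(render(loads, pipelined=False))
print("Pipelined:")
print(render(loads, pipelined=True))
```

In the sequential drawing the third lw does not start until cycle 10; in the pipelined drawing it starts in cycle 2.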
Speed Advantage of the Pipeline
• The multicycle, serial processor that we studied last lecture can execute $n$ instructions in $ns$ clock periods, or $ET_S = ns$, where $s$ is the number of clock cycles per instruction.
• A pipelined processor with $s$ stages can execute $n$ instructions in $ET_P = s + (n - 1)$ clock periods.
• The ideal pipeline speedup depends on the number of stages, and can be greater for more stages (hence Intel’s choice of a 20-stage pipeline for the current P-IV).
• Thus the speed advantage of pipeline over multicycle can be defined as:
$$\text{Speedup} = \frac{ET_S}{ET_P} = \frac{ns}{s + (n - 1)}$$
which approaches $s$ when $n$ is much larger than $s$.
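As a quick numerical check of the speedup expression, here is a small Python sketch. It assumes, as the formulas do, that every instruction takes s clock cycles on the multicycle machine and that the pipeline never stalls. The function names are illustrative.

```python
def multicycle_time(n, s):
    """ET_S: clock periods for n instructions executed one after another."""
    return n * s

def pipelined_time(n, s):
    """ET_P: s periods to fill the pipe, then one more per remaining instruction."""
    return s + (n - 1)

def speedup(n, s):
    """Ratio ET_S / ET_P for n instructions on an s-stage pipeline."""
    return multicycle_time(n, s) / pipelined_time(n, s)

for n in (1, 3, 10, 1000):
    print(n, round(speedup(n, s=5), 2))
```

With s = 5 there is no gain for a single instruction (speedup 1.0), about 2.14 for three instructions, and the ratio approaches 5 as n grows.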
Pipeline Stages
[Figure: the stages of one instruction (IF, ID/RF, ALU, MEM, WB) laid out against clock cycles 0 to 5.]
• The MIPS R2000 pipeline processor is divided into five processing stages:
1. Instruction fetch (IF)
2. Instruction decode (ID) and register fetch (RF)
3. ALU instruction execution (ALU): ALU processing, branch condition evaluation, memory address computation, etc. This is also referred to as execution (EX).
4. Memory access (MEM)
5. Write back (WB) to register file
Overlapped Pipeline Execution
[Figure: successive instructions (Instruction 1, Instruction 2, ...) listed in instruction execution order against clock cycles 0 to 7. Each passes through IF, ID/RF, ALU, MEM, and WB, overlapping the instructions before and after it.]
Single-Cycle Datapath
[Figure: single-cycle MIPS datapath showing the PC, instruction memory, register block, sign extend, ALU, data memory, and the control unit (driven by instruction bits 26-31) with its control lines: Reg. Dest., Branch, Mem. Read, Mem. To Reg., ALU Op., Mem. Write, ALU Srce., and Reg. Write.]
• Lines indicate need for storage between stages if processor is converted to pipeline.
Single-Cycle Datapath with Pipeline Registers
[Figure: the datapath divided by the inter-stage registers IF/ID, ID/EX, EX/MEM, and MEM/WB, each drawn with a master side and a slave side. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]
• Inter-stage registers are master-slave D flip-flops; the master can be receiving new data from the previous stage of the instruction while the slave flip-flop is providing data to the next stage.
• Note: Control lines and logic not shown for clarity.
Instruction Process Through Pipeline (1)
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB; first stage highlighted.]
• Stage 1: Instruction loaded into IF/ID.
Instruction Process Through Pipeline (2)
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB; second stage highlighted.]
• Stage 2: Instruction decoded, register data accessed, immediates sign-extended.
Instruction Process Through Pipeline (3)
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB; third stage highlighted.]
• Stage 3: Instruction executed / branch address computed.
Instruction Process Through Pipeline (4)
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB; fourth stage highlighted.]
• Stage 4: Memory access (load or store), branch taken/not taken; ALU results take the bypass to the MEM/WB register.
Instruction Process Through Pipeline (5)
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB; fifth stage highlighted.]
• Stage 5: Result write-back to dest. register.
Adding Control
• Control information must be carried along as a part of the instruction, since this information is required at different stages of the pipeline.
• This can be done by adding more inter-stage storage.
• The result is very large inter-stage registers. For example, the storage capacity required between the instruction decode and ALU execution stages (ID/EX register) is more than 120 bits.
• The full pipeline design is shown on the next slide.
Full Pipeline Design
[Figure: full pipelined datapath. The Control/Decode unit reads the instruction bits from IF/ID and generates the control signals Register Write, ALU Srce, ALU Op, Reg. Dst., Branch, Memory Write, Memory Read, and Memory/ALU Result, which travel through the ID/EX, EX/MEM, and MEM/WB registers. Bits 0-15 feed the sign extend; bits 16-20 and 11-15 select the destination register. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]
The Pipeline in Action
• The following instruction sequence from the P&H text illustrates the pipeline in action.
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
• Note that registers are identified by number rather than the letter id’s, since that is the way they appear in the processor. As a reminder, $1=$at, $8-14=$t0-t6, $2-3=$v0-v1, $4-7=$a0-a3, etc.
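The stage-by-stage snapshots on the following slides can be generated mechanically. Below is a minimal Python sketch, assuming no hazards and one new instruction fetched per clock cycle. The names occupancy and trace are illustrative, and cycle 1 is taken to be the cycle in which lw is in IF.

```python
STAGES = ["IF", "ID/RF", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

def occupancy(program, cycle):
    """Map each stage to the instruction it holds in the given cycle (1-based)."""
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth          # instruction i enters IF in cycle i + 1
        snapshot[stage] = program[index] if 0 <= index < len(program) else "Idle"
    return snapshot

def trace(program):
    """One snapshot per clock cycle until the last instruction leaves WB."""
    last_cycle = len(program) + len(STAGES) - 1
    return [occupancy(program, c) for c in range(1, last_cycle + 1)]

for cycle, snapshot in enumerate(trace(PROGRAM), start=1):
    print(cycle, "  ".join(f"{stage}: {instr}" for stage, instr in snapshot.items()))
```

The five instructions need 5 + 4 = 9 clock cycles in total; the pipeline is completely full only in cycle 5, when add is in IF and lw is in WB.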
Pipeline snapshot (full datapath figure): IF: Idle; ID/RF: Idle; EX: Idle; MEM: Idle; WB: Idle
Pipeline snapshot (full datapath figure): IF: lw $10, 20($1); ID/RF: Idle; EX: Idle; MEM: Idle; WB: Idle
Pipeline snapshot (full datapath figure): IF: sub $11, $2, $3; ID/RF: lw $10, 20($1); EX: Idle; MEM: Idle; WB: Idle
Pipeline snapshot (full datapath figure): IF: and $12, $4, $5; ID/RF: sub $11, $2, $3; EX: lw $10, 20($1) (ALU operation: add); MEM: Idle; WB: Idle
Pipeline snapshot (full datapath figure): IF: or $13, $6, $7; ID/RF: and $12, $4, $5; EX: sub $11, $2, $3 (ALU operation: sub); MEM: lw $10, 20($1); WB: Idle
Pipeline snapshot (full datapath figure): IF: add $14, $8, $9; ID/RF: or $13, $6, $7; EX: and $12, $4, $5 (ALU operation: and); MEM: sub $11, $2, $3; WB: lw $10, 20($1)
Pipeline snapshot (full datapath figure): IF: Idle; ID/RF: add $14, $8, $9; EX: or $13, $6, $7 (ALU operation: or); MEM: and $12, $4, $5; WB: sub $11, $2, $3
Pipeline snapshot (full datapath figure): IF: Idle; ID/RF: Idle; EX: add $14, $8, $9 (ALU operation: add); MEM: or $13, $6, $7; WB: and $12, $4, $5
Pipeline snapshot (full datapath figure): IF: Idle; ID/RF: Idle; EX: Idle; MEM: add $14, $8, $9; WB: or $13, $6, $7
Pipeline snapshot (full datapath figure): IF: Idle; ID/RF: Idle; EX: Idle; MEM: Idle; WB: add $14, $8, $9
Pipeline snapshot (full datapath figure): IF: Idle; ID/RF: Idle; EX: Idle; MEM: Idle; WB: Idle
Pipeline Processor Operation Summary
• Pipelining replaces the “single-cycle” processor with a multi-stage processor, each stage completing one part of each instruction.
• A new instruction is started every clock cycle.
• Inter-process registers store instruction information (data, write register, branch conditions) between cycles, carrying it between the pipeline stages.
• When the pipeline is filled with instructions, an instruction completes every clock cycle.
Exercise 1
• On the diagram on the next page, identify the following:
1. Highlight all the control lines that must be active during a load word instruction.
2. As in our exercise in Lecture 20, identify the decoder locations.
3. The ID/EX Register interface stores the most bits of any of the pipeline section interfaces. Approximately how many bits is that, according to the diagram?
Print out a copy of this diagram and bring to class.
[Figure: full pipeline design with control lines, as on the earlier "Full Pipeline Design" slide, without stage annotations.]
Hazards
• Hazards occur because data required for executing the current instruction is not yet available.
• An instruction in the “register fetch” cycle may need data from a register whose value will be changed by an instruction “downstream” but still in process in the pipeline (in the ALU, memory/memory bypass or writeback cycle).
• Thus an “upstream” instruction could access a register and get incorrect data because the register data has not yet been updated by a downstream instruction.
Hazards (2)
• There are two types of hazards, data hazards and control hazards.
• Both occur because an instruction in the ID/RF stage of the MIPS pipeline needs register data that will be changed by an instruction still in the ALU, MEM/Bypass, or WB stage.
• Data hazards occur when an instruction needs register contents for an arithmetic/logical/memory instruction.
• Control hazards occur when a branch instruction is in process and the register data that decides the branch is not yet available, in the same sort of scenario.
Data Hazard in the Pipeline
[Figure: pipelined timeline over clock cycles 0 to 10 for the five instructions below. Each takes 5 clock cycles (Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, Register Write) and starts one cycle after the one before it.]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
• In the instruction sequence above, the last four instructions require data from $2, which is changed in the first instruction.
• The $2 data will not be rewritten until cycle 4, so the and and or instructions will fetch incorrect data from $2.
• Even the add may not get the correct information (sw is okay).
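The reasoning above can be checked with a short Python sketch. It assumes the timing in the figure: instruction i is fetched in cycle i, reads its registers one cycle later, and writes its result four cycles after being fetched, with no forwarding. The tuple layout and the names classify and find_dependencies are illustrative.

```python
# (mnemonic, destination register or None, source registers)
PROGRAM = [
    ("sub", "$2",  ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or",  "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw",  None,  ["$15", "$2"]),
]

def classify(producer_index, consumer_index):
    """Compare the consumer's register-fetch cycle with the producer's write cycle."""
    write_cycle = producer_index + 4   # Reg. Write is the 5th stage
    read_cycle = consumer_index + 1    # Reg. Fetch is the 2nd stage
    if read_cycle < write_cycle:
        return "stale read"
    if read_cycle == write_cycle:
        return "same cycle"
    return "ok"

def find_dependencies(program):
    """List (consumer, register, producer, status) for every read-after-write pair."""
    results = []
    for j, (name, _, sources) in enumerate(program):
        for reg in dict.fromkeys(sources):        # each source once, in order
            for i in range(j - 1, -1, -1):        # nearest earlier writer of reg
                if program[i][1] == reg:
                    results.append((name, reg, program[i][0], classify(i, j)))
                    break
    return results

for row in find_dependencies(PROGRAM):
    print(row)
```

The sketch reports a stale read for and and or, a same-cycle read and write for add (the "may not get the correct information" case), and no problem for sw.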
Control Hazards in the Pipeline
[Figure: pipelined timeline over clock cycles 0 to 10 for the five instructions below. Each takes 5 clock cycles (Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, Register Write) and starts one cycle after the one before it.]
sub $2, $1, $3
blt $2, $8, wait
bgt $2, $7, go
add $14, $2, $2
sw $15, 100($2)
• Here the problem is changed, with two branch instructions added.
• Neither branch instruction may be executed correctly, once again because the correct $2 data is not yet available.
• This wrong data could cause an incorrect branch.
Forwarding as a Solution to Data Hazards
[Figure: two overlapped instructions against clock cycles 0 to 5, with the ALU output of instruction 1 routed to instruction 2.]
• One solution to the problem of data hazards is forwarding.
• Forwarding uses the fact that although instruction 2 needs register data two clock cycles before instruction 1 enters the WB stage, that data is already available as the output of the ALU.
• If a mechanism were available, instruction 1 could forward the required register data after its ALU cycle to instruction 2, replacing the stale value fetched in instruction 2's ID/RF cycle.
Forwarding Unit in the Pipeline
[Figure: the ID/EX, EX/MEM, and MEM/WB sections of the pipeline with a forwarding unit. Multiplexers at the two ALU inputs are steered by the Forward A and Forward B signals; the forwarding unit receives the Rs and Rt fields together with the EX/MEM Register Rd and MEM/WB Register Rd fields. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]
Forwarding Unit Operation
[Figure: register block, ALU, and memory, with the forwarding unit beneath them.]
• The forwarding unit samples register id’s in the EX/MEM and MEM/WB registers to determine if source registers in the ID/RF cycle are the same.
• If so, source register data is replaced by pipeline (as yet unwritten) data by the forwarding unit.
• The correct information is thus processed and the instruction can proceed to correct execution.
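The comparison the forwarding unit performs can be sketched in Python. This is an illustrative model, not the lecture's circuit: the Latch class, the function names, and the 2-bit select codes are assumptions, chosen to follow the usual textbook convention in which the newer result (EX/MEM) takes priority and register $0 is never forwarded.

```python
from dataclasses import dataclass

@dataclass
class Latch:
    """The two fields of an inter-stage register that the forwarding unit inspects."""
    reg_write: bool = False   # will this instruction write a register?
    rd: int = 0               # destination register number

NO_FORWARD, FROM_MEM_WB, FROM_EX_MEM = 0b00, 0b01, 0b10

def select(source, ex_mem, mem_wb):
    """Choose the ALU input mux setting for one source register."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == source:
        return FROM_EX_MEM        # newest value: ALU result of the previous instruction
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == source:
        return FROM_MEM_WB        # value about to be written back
    return NO_FORWARD             # use the value read from the register block

def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return the (Forward A, Forward B) mux selects."""
    return select(rs, ex_mem, mem_wb), select(rt, ex_mem, mem_wb)

# and $12, $2, $5 in EX while sub $2, $1, $3 is one stage ahead (EX/MEM)
print(forwarding_unit(2, 5, Latch(True, 2), Latch()))
# or $13, $6, $2 in EX: sub is now in MEM/WB, and ($12) is in EX/MEM
print(forwarding_unit(6, 2, Latch(True, 12), Latch(True, 2)))
```

In the first call only Forward A is set (from EX/MEM); in the second only Forward B is set (from MEM/WB).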
Stalls
• Forwarding will not always solve the problems of data hazards.
• For example, suppose an add instruction follows a load word (lw) and the add involves the register that receives the memory data.
• In this case, forwarding will not work.
• The data being loaded will not be available until the end of the MEM cycle. Thus the required data is not available for a forward, and the add instruction, if it proceeds, will process the wrong data.
• A solution to this problem is the stall.
• A stall halts the instruction awaiting data, while the key instruction (the lw) completes its MEM cycle, after which the desired data is available to the add.
Result of Stall Approach
[Figure: pipelined timeline over clock cycles 0 to 10 for the three instructions below. Each takes 5 clock cycles (Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, Register Write).]
lw $2, 32($3)
add $14, $6, $2
sw $15, 80($2)
• Consider the 3 instructions above, the last two depending on the lw.
• $2 contents will be available at the beginning of the WB stage in the first instruction, but not before.
• With a stall, the add and sw instructions hold place for one cycle.
Result of Stall Approach (2)
[Figure: the same timeline over clock cycles 0 to 10, with the add and sw each shifted one cycle later.]
lw $2, 32($3)
add $14, $6, $2 (delayed 1 count)
sw $15, 80($2) (delayed 1 count)
• With the delay, the lw result feeds the ALU input stage of the add instruction, and the register fetch stage of the sw.
• Note that forwarding is still required (this time from the MEM/WB interface, not the ALU output).
• Note that an instruction that depends on the loaded register and immediately follows a lw must also be delayed for one clock cycle.
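The one-cycle delay can be reproduced with a small scheduling sketch in Python. It assumes that forwarding handles every other dependency, so the only stall is a single bubble when an instruction uses the register loaded by the lw immediately before it. The tuple layout and the name start_cycles_with_stalls are illustrative.

```python
# (text, destination register or None, source registers, is_load)
PROGRAM = [
    ("lw $2, 32($3)",   "$2",  ["$3"],        True),
    ("add $14, $6, $2", "$14", ["$6", "$2"],  False),
    ("sw $15, 80($2)",  None,  ["$15", "$2"], False),
]

def start_cycles_with_stalls(program):
    """Cycle in which each instruction enters the pipeline, with load-use bubbles."""
    starts = []
    next_start = 0
    for i, (_, _, sources, _) in enumerate(program):
        if i > 0:
            _, prev_dest, _, prev_is_load = program[i - 1]
            if prev_is_load and prev_dest in sources:
                next_start += 1        # hold place for one cycle
        starts.append(next_start)
        next_start += 1                # later instructions queue up behind this one
    return starts

print(start_cycles_with_stalls(PROGRAM))
```

The add starts in cycle 2 instead of cycle 1, and the sw behind it is pushed back by the same one cycle, matching the "(delayed 1 count)" labels.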
Other Problems With Branches
• A remaining problem is what to do about instructions following a branch. Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage. This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made. What if:
– The following instructions were for the case of “branch taken” and the branch was not taken.
– The following instructions were for “branch not taken” and it was taken.
• In either case, the wrong instructions are in the pipe and they must be eliminated (“flushed”). How can this problem be prevented?
• A few approaches to the problem are shown in the following slides.
Control Hazard Approaches (1)
[Figure: MIPS R-2000 Pipeline Processor, stages IF, ID/RF, ALU/EX, MEM, WB, with an arrow showing the direction of pipeline flow.]
• One approach is to always assume the branch is or is not taken:
– Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the “not taken” program line (branch determination is made in ALU/EX).
– If this assumption is correct, the pipeline will continue to flow without delay.
– When the branch is taken, instructions in IF and ID/RF must be “flushed,” usually by changing the “op” code of those instructions to a “nop” and letting them proceed to the end of the pipe.
– Clearly, a 2-clock time delay is involved here, and it would be worse for longer pipelines (P-IV pipeline ~ 20 stages).
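The cost of the "never taken" assumption can be tallied with a short sketch. The `flush_cycles` function and the sample outcome list are hypothetical; the only fact taken from the slide is that a branch resolved in the third stage has two instructions behind it in the pipe.

```python
def flush_cycles(outcomes, resolve_stage=3):
    """Clock cycles lost to flushes under a predict-not-taken policy.

    outcomes: iterable of booleans, True meaning the branch was taken.
    resolve_stage: pipeline stage (1 = IF) where the branch is decided.
    """
    # Instructions fetched behind the branch before the decision is known.
    penalty_per_miss = resolve_stage - 1
    return sum(penalty_per_miss for taken in outcomes if taken)

# Hypothetical run of five branches, three of them taken.
outcomes = [True, False, True, True, False]
print(flush_cycles(outcomes))   # 6
```

Only the taken branches cost anything here, because the instructions already in IF and ID/RF are the correct ones whenever the branch falls through.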
Control Hazard Approaches (2)
[Figure: MIPS R-2000 Pipeline Processor, stages IF, ID/RF, ALU/EX, MEM, WB, with a Branch Comparator added.]
• Reducing the cost of taking the branch:
– In this case, a branch assumption is still made (taken or not taken).
– Since the branch instruction is identified in the ID/RF stage, a comparator can be added there to do the branch/no-branch determination.
– With the branch determination made in this early stage, only one instruction must be flushed, in the IF stage (only a 1-instruction delay).
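One rough way to see what the earlier decision buys is to estimate average cycles per instruction (CPI). The sketch below assumes an otherwise ideal pipeline with a CPI of 1; the workload numbers (20% branches, half of them mispredicted) are made up purely for illustration, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty):
    """Ideal pipelined CPI of 1 plus the average flush cycles per instruction."""
    return 1.0 + branch_fraction * mispredict_rate * flush_penalty

# Hypothetical workload: 20% of instructions are branches, half are mispredicted.
cpi_ex = effective_cpi(0.20, 0.50, 2)   # decided in ALU/EX: 2 instructions flushed
cpi_id = effective_cpi(0.20, 0.50, 1)   # decided in ID/RF: 1 instruction flushed

print(round(cpi_ex, 2), round(cpi_id, 2))
```

With these assumed numbers, moving the comparator to ID/RF halves the cycles lost to flushes; the actual benefit depends on how often branches occur and how often the assumption is wrong.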
Control Hazard Approaches (3)
[Figure: MIPS R-2000 Pipeline Processor, stages IF, ID/RF, ALU/EX, MEM, WB, with a Branch History block providing branch feedback based on history.]
• Dynamic branch prediction based on recent branch history:
– In this approach, an indicator bit (0/1) gives the last branch condition.
– The next branch can be made according to the bit setting.
– … time until a substantial number of calculations are complete.
– Some schemes use 2 bits and do not change the prediction until the predictor is wrong twice, after which the alternate behavior is chosen.
– In either case, incorrect predictions will still be made, but hopefully not as often.
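The difference between the 1-bit and 2-bit schemes can be seen by running both on a loop-style branch trace. The slides do not specify the predictor's state machine, so this sketch assumes a saturating counter, one common way to build a predictor that changes its mind only after two wrong guesses in a row. The `mispredictions` function, the initial counter state, and the trace are all illustrative assumptions.

```python
def mispredictions(outcomes, bits):
    """Count wrong predictions made by a saturating-counter predictor.

    bits=1: predict whatever the branch did last time.
    bits=2: counter runs 0..3 and predicts "taken" when it is 2 or 3.
    """
    top = (1 << bits) - 1
    threshold = (top + 1) // 2
    counter = top                 # assumption: start in the strongest "taken" state
    wrong = 0
    for taken in outcomes:
        if (counter >= threshold) != taken:
            wrong += 1
        # Move toward "taken" or "not taken", saturating at the ends.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return wrong

# Hypothetical loop branch: taken 99 times, then falls through; routine called 10 times.
trace = ([True] * 99 + [False]) * 10
print(mispredictions(trace, 1))   # 19
print(mispredictions(trace, 2))   # 10
```

On this trace the 1-bit predictor is wrong both at each loop exit and again on the first iteration of the next call, while the 2-bit counter is wrong only at the exits.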
Exercise 2
1. Explain forwarding in your own words.
2. … problem be solved?
3. Why could 2-bit dynamic branch prediction work to ensure about a 1% error rate in branch prediction in a subroutine that loops about 100 times before … called frequently, and that it always executes 100 or more loop traversals before returning to the calling program.
Homework
• As usual, write down the two or three most important things you learned today and add to your list.
• Also, write down two or three things you did not clearly … you still have questions, see me during office hours.
• Readings, per syllabus. Note: Some of the PH material is hard slogging, but for those of you interested in computer engineering, it is well worth reading once.