ee457_Lab6_Part4_r3_for_lecture.fm
10/29/06 1 / 32 C Copyright 2006 Gandhi Puvvada
EE457
Lab 6 Design of a Pipelined CPU
Lab 6 Part 4 Questions
10/29/2006
Lecture by Gandhi Puvvada
University of Southern California
Thanks to Ray Madani and Binh Tran of DEN for their technical support
Ideally: Do this Part 4 after Lab 6 Parts 1, 2, and 3.
Ideally:
1. Read the question.
2. Attempt to solve it by yourself.
3. Verify by viewing this lecture.
Practically: View the lecture before solving by yourself.
1. [Based on question #4.1 of Summer 95 Final] Pipelined Ripple_Carry Adder:
Array addition. The A(i), B(i), and C(i) are all in the register file.
The single-bit opcode is a "1" for ADD and a "0" for NOP.
Instruction       Opcode   rs = Source Reg 1   rt = Source Reg 2   rd = Destination Reg
Field size        1 bit    3 bits              3 bits              3 bits
add rd, rs, rt    1        rs2 rs1 rs0         rt2 rt1 rt0         rd2 rd1 rd0
nop               0        rs2 rs1 rs0         rt2 rt1 rt0         rd2 rd1 rd0
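The field layout in the table above can be checked with a small encoder/decoder. This is a sketch, assuming the fields are packed in table order with the opcode as the most significant bit (bit 9), then rs (bits 8-6), rt (bits 5-3), and rd (bits 2-0); the function names are hypothetical.

```python
# Sketch: pack/unpack the 10-bit ADD/NOP instruction of question 1.
# Assumed layout (table order): bit 9 = opcode, bits 8-6 = rs, bits 5-3 = rt, bits 2-0 = rd.
OPCODE_ADD = 1
OPCODE_NOP = 0


def encode(opcode: int, rs: int, rt: int, rd: int) -> int:
    """Pack the four fields into a 10-bit instruction word."""
    if opcode not in (OPCODE_ADD, OPCODE_NOP):
        raise ValueError("opcode is a single bit")
    for reg in (rs, rt, rd):
        if not 0 <= reg < 8:
            raise ValueError("register addresses are 3 bits wide")
    return (opcode << 9) | (rs << 6) | (rt << 3) | rd


def decode(word: int) -> dict:
    """Split a 10-bit instruction word back into its fields."""
    if not 0 <= word < (1 << 10):
        raise ValueError("instruction words are 10 bits wide")
    return {
        "opcode": (word >> 9) & 0b1,
        "rs": (word >> 6) & 0b111,
        "rt": (word >> 3) & 0b111,
        "rd": word & 0b111,
    }


# add rd, rs, rt with rd = 3, rs = 1, rt = 2
word = encode(OPCODE_ADD, rs=1, rt=2, rd=3)
print(format(word, "010b"))  # 1001010011
print(decode(word))          # {'opcode': 1, 'rs': 1, 'rt': 2, 'rd': 3}
```

The same decoder applies to a nop word; only the opcode bit differs, which is what the WB stage would use to decide whether to write.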
Do NOT be misled by Miss Bruin’s design below!
[Figure: Miss Bruin's design. A 10-bit IF/ID register (opcode, rs2-rs0, rt2-rt0, rd2-rd0) addresses the register file (Read_Address_1, Read_Address_2, Write_Address; 4-bit Read_Data_1 and Read_Data_2; Write_Data WD3-WD0; WRITE; SYS_CLK). One full adder (A, B, Ci, Co, S) sits in each of the stages EX1 to EX4, which are separated by the stage registers ID/EX1, EX1/EX2, EX2/EX3, EX3/EX4, and EX4/WB; the stage-register sizes marked in her design are 12, 12, 11, 10, and 8 bits. Stages: IF, ID, EX1, EX2, EX3, EX4, WB.]
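The array addition of question 1 can be simulated clock by clock to see what has to travel through the EX1 to EX4 stage registers. This is a sketch under assumptions that are not stated on the slides: 4-bit registers (the register file shows R1D3-R1D0), a carry of 0 into bit 0, bit k-1 added in stage EXk, and a register file that is written before it is read in the same clock. All names are hypothetical, and for simplicity the model carries the full operands forward, so it does not answer the stage-register size questions.

```python
# Sketch: a 4-bit ripple-carry adder split across EX1..EX4, one full adder per stage.
# Assumptions (not from the slides): 4-bit registers, carry into bit 0 is 0,
# stage EXk adds bit k-1, and the register file is written before it is read in
# the same clock (internal forwarding). All names are hypothetical.
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

OP_ADD = 1  # the single-bit opcode: 1 = ADD, 0 = NOP


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


@dataclass(frozen=True)
class StageReg:
    """Contents of one stage register."""
    op: int     # opcode bit travels with the instruction to enable the WB write
    rd: int     # destination register address
    a: int      # operand A (kept whole here; hardware could drop the bits already added)
    b: int      # operand B
    s: int = 0  # sum bits produced so far
    c: int = 0  # carry into the next bit position


def ex_stage(reg: StageReg, bit: int) -> StageReg:
    """One EX stage: add bit position `bit` with a single full adder."""
    s_bit, cout = full_adder((reg.a >> bit) & 1, (reg.b >> bit) & 1, reg.c)
    return replace(reg, s=reg.s | (s_bit << bit), c=cout)


def run(regs: List[int], program: List[Tuple[int, int, int, int]]) -> List[int]:
    """Issue one (op, rd, rs, rt) per clock, then drain; return the register file."""
    regs = list(regs)
    pipe: List[Optional[StageReg]] = [None] * 5  # ID/EX1, EX1/EX2, EX2/EX3, EX3/EX4, EX4/WB
    for instr in list(program) + [None] * 5:     # five extra clocks drain the pipe
        wb = pipe[4]
        if wb is not None and wb.op == OP_ADD:
            regs[wb.rd] = wb.s                  # WB: store the 4 sum bits; the final carry is not stored
        new: List[Optional[StageReg]] = [None] * 5
        for k in range(1, 5):                   # EXk works on the register in front of it
            if pipe[k - 1] is not None:
                new[k] = ex_stage(pipe[k - 1], bit=k - 1)
        if instr is not None:                   # ID: read operands after the WB write
            op, rd, rs, rt = instr
            new[0] = StageReg(op=op, rd=rd, a=regs[rs], b=regs[rt])
        pipe = new                              # clock edge: all stage registers update
    return regs


# Array addition: $5 = $1 + $2 and $6 = $3 + $4 (9 + 7 wraps to 0 in 4 bits)
print(run([0, 3, 5, 9, 7, 0, 0, 0], [(1, 5, 1, 2), (1, 6, 3, 4)]))  # [0, 3, 5, 9, 7, 8, 0, 0]
```

Because the array elements are independent, the model needs no forwarding; a dependent add issued right behind its producer would read a stale value here.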
[Figure (build-up step): the 10-bit IF/ID register (opcode, rs2-rs0, rt2-rt0, rd2-rd0) feeding the register file; OpCode: ADD = 1, NOP = 0.]
[Figure (build-up step): the IF/ID register and register file of Miss Bruin's design.]
[Figure (to be completed): the same datapath, with one full adder in each of EX1 to EX4 and the stage registers ID/EX1, EX1/EX2, EX2/EX3, EX3/EX4, and EX4/WB, each with its size left blank (Size = ___).]
2. [Based on question 5 of Summer 2003 Midterm and question 8 of Spring 1994 Final] Pipeline Design (Stalling / Flushing / Forwarding):
2.1 Bubbles are produced ________________________________________________________ (in stalling only/in flushing only/in stalling as well as in flushing/in neither stalling nor flushing).
2.2 In the early-branch design of the pipelined CPU (the current Lab 6, based on the 3rd ed.), flushing and stalling ___________________ (never occur in the same clock cycle/may sometimes occur in the same clock cycle/always occur in the same clock cycle).
In a late-branch design (based on the first edition), if the branch below is successful, do flushing and stalling both occur together, or does one prevent the other? Explain.
beq $1, $2, TARGET
lw  $4, 40($5)
or  $8, $4, $6
2.3 There are 9 (1+4+2+2) control lines generated by the control unit. Eight of them (8 out of 9) go from the ID stage to the EX stage. Do you need to convert all 8 signals to zero when you stall an instruction in the ID stage?
2.4 To ___________ (stall/flush) an instruction in the ID stage, you inhibit (prevent) updating of the following register(s) (circle as many of the following as you wish): PC, IF/ID, ID/EX, EX/MEM, MEM/WB. You never inhibit (prevent) updating of a stage register if you are currently _______________________________________________________________________ (flushing / stalling / cannot fill this blank with either of the previous two choices).
2.5 Late Branch design of the first edition with one HDU in ID stage and one FU in EX stage, and an internally forwarding register file.
All three streams use the same 3 instructions in a different order.
Stream #1             Stream #2             Stream #3
add $3, $3, $1;       lw  $3, 40($5);       lw  $3, 40($5);
or  $6, $5, $4;       or  $6, $5, $4;       add $3, $3, $1;
lw  $3, 40($5);       add $3, $3, $1;       or  $6, $5, $4;
For each stream above, the following occur(s) (circle all correct choices): (i) hazard detection and stalling by the HDU; (ii) forwarding by the FU; (iii) internal forwarding in the reg. file; (iv) none of these.
Now reconsider the above three streams in the context of the early-branch design of the current Lab 6. Explain any differences or striking resemblances to your three answers above.
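A first step for each stream is to list its read-after-write (RAW) dependencies and how many instructions separate producer and consumer. This is a sketch with hypothetical helper names; it understands only the small MIPS subset used in these questions and reports only the nearest earlier writer of each source register.

```python
# Sketch: list the read-after-write (RAW) dependencies in a short MIPS stream.
# Handles only the small subset used in these questions (hypothetical helper).
import re
from typing import List, Optional, Tuple

R_TYPE = {"add", "sub", "and", "or", "xor", "slt"}


def parse(instr: str) -> Tuple[str, Optional[int], List[int]]:
    """Return (mnemonic, destination register or None, source registers)."""
    op, operands = instr.strip().rstrip(";").split(None, 1)
    regs = [int(r) for r in re.findall(r"\$(\d+)", operands)]
    if op in R_TYPE or op == "lw":
        return op, regs[0], regs[1:]      # for lw the only source is the base register
    if op in ("sw", "beq"):
        return op, None, regs[:2]         # nothing is written to the register file
    raise ValueError(f"unsupported instruction: {instr}")


def raw_dependencies(stream: List[str]) -> List[Tuple[int, int, int, int]]:
    """Return (producer, consumer, register, distance) for each RAW dependency.
    Only the nearest earlier writer of a source register counts."""
    parsed = [parse(s) for s in stream]
    deps = []
    for j, (_, _, sources) in enumerate(parsed):
        for reg in sorted(set(sources)):
            if reg == 0:
                continue                  # $0 is hard-wired to zero
            for i in range(j - 1, -1, -1):
                if parsed[i][1] == reg:
                    deps.append((i, j, reg, j - i))
                    break
    return deps


streams = {
    1: ["add $3, $3, $1", "or $6, $5, $4", "lw $3, 40($5)"],
    2: ["lw $3, 40($5)", "or $6, $5, $4", "add $3, $3, $1"],
    3: ["lw $3, 40($5)", "add $3, $3, $1", "or $6, $5, $4"],
}
for n, s in streams.items():
    print(n, raw_dependencies(s))
# 1 []
# 2 [(0, 2, 3, 2)]
# 3 [(0, 1, 3, 1)]
```

Write-after-write pairs (stream #1 writes $3 twice) are deliberately not reported, since an in-order pipeline writes registers in program order.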
2.6 In this question we consider the early-branch design of our current Lab 6 with two HDUs (HDU and HDU_Br) and two FUs (FU and FU_Br). Of course, the register file is an internally forwarding register file. Identify the dependencies in the following instruction streams and how they should be resolved:
For each stream, the following occur(s) (circle all correct choices): (i) HDU_Br-initiated stalling; (ii) HDU-initiated stalling; (iii) forwarding by FU_Br; (iv) forwarding by FU; (v) internal forwarding in the reg. file; (vi) none of these.
Summary: In the Lab 6 design for the early branch:
beq dependent on an R-type
beq dependent on lw
No forwarding at the end of the clock.
Stream #1               Stream #2
add $2, $2, $2;         add $2, $3, $4;
sub $1, $2, $3;         sub $5, $6, $7;
beq $2, $0, loop1;      beq $5, $2, loop1;

Stream #3               Stream #4
lw  $4, 40($3);         lw  $4, 40($3);
beq $4, $0, loop1;      sub $5, $6, $7;
                        beq $4, $0, loop1;
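The summary points (beq dependent on an R-type, beq dependent on lw, no forwarding at the end of the clock) can be turned into a small timing rule. This is a sketch that assumes the textbook early-branch timing: beq compares in ID, FU_Br forwards only from the MEM stage, a WB-stage result reaches ID through the internally forwarding register file, an R-type result exists at the end of EX, and a lw result only at the end of MEM. The names are hypothetical.

```python
# Sketch: beq stall cycles in the 5-stage early-branch design (comparison in ID).
# Assumptions: FU_Br forwards from the MEM stage only; a result in WB reaches ID
# through the internally forwarding register file; an R-type result exists at the
# end of EX, a lw result only at the end of MEM.
STAGE = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}

# Earliest stage in which the producer holds a result that ID can receive.
AVAILABLE_IN = {"rtype": STAGE["MEM"], "lw": STAGE["WB"]}


def beq_stalls(producer: str, distance: int) -> int:
    """Stalls for a beq reading a register written `distance` instructions earlier."""
    # Without stalls the producer is in stage ID + distance while beq sits in ID;
    # each stall cycle lets the producer advance one more stage.
    return max(0, AVAILABLE_IN[producer] - (STAGE["ID"] + distance))


def stream_stalls(deps):
    """A beq with several producers waits for the slowest one."""
    return max((beq_stalls(p, d) for p, d in deps), default=0)


# Question 2.6 streams, as (producer type, distance) for each beq operand:
print(stream_stalls([("rtype", 2)]))                # stream 1 ($2 from add): 0
print(stream_stalls([("rtype", 1), ("rtype", 2)]))  # stream 2 ($5 from sub, $2 from add): 1
print(stream_stalls([("lw", 1)]))                   # stream 3: 2
print(stream_stalls([("lw", 2)]))                   # stream 4: 1
```

The rule only counts cycles; which unit (HDU_Br, FU_Br, or the register file) delivers the value is the part the question asks you to identify.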
2.7 A dependent instruction after a lw instruction:
HDU => STALL.
To avoid the HDU, a simple-minded compiler => NOP.
No gain, no loss: TRUE or FALSE?
beq in the early-branch design:
To avoid the flushing hardware, can a compiler put a NOP after every beq? Is this feasible?
Performance: no gain and no loss, or some gain or loss?
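The performance half of 2.7 is cycle counting. This sketch counts clock cycles only, for an ideal 5-stage pipeline; it ignores code size, assumes the compiler's NOP sits exactly where the hardware bubble or flushed instruction would have been, and assumes that without flush hardware the instruction after a beq is always executed (like a delay slot). The function names are hypothetical.

```python
# Sketch: cycle counts behind question 2.7 for an ideal 5-stage pipeline.
def pipeline_cycles(n_instructions: int, n_stages: int = 5, bubbles: int = 0) -> int:
    """Clock cycles from the first fetch to the last write-back."""
    return (n_stages - 1) + n_instructions + bubbles


def beq_slot_cycles_lost(n_beq: int, n_taken: int, nop_after_every_beq: bool) -> int:
    """Cycles lost in the one fetch slot behind each beq of the early-branch design."""
    if nop_after_every_beq:
        return n_beq      # the NOP occupies the slot whether or not the branch is taken
    return n_taken        # flushing wastes the slot only when the branch is taken


# lw followed by a dependent instruction: HDU bubble vs. compiler-inserted NOP
print(pipeline_cycles(2, bubbles=1), pipeline_cycles(3))                       # 7 7
# 10 beqs, 4 of them taken
print(beq_slot_cycles_lost(10, 4, True), beq_slot_cycles_lost(10, 4, False))  # 10 4
```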
2.8 The specific point for tapping the branch control signal in the ID stage (a) for ANDing with the equality inference and (b) for HDU_Br to produce STALL_BEQ.
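[Figure: ID-stage control logic. The Control unit decodes the opcode into the EX, M, and WB control signals and Branch; a flush mux driven by STALL (from STALL_LW of the Hazard detection unit and STALL_BEQ of HDU_Br) can zero them. Points A, B, and C are marked on the path of the Branch signal (point C is after the flush mux), which is ANDed with the output of the equality (=) comparator.]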
Mr. Bruin claims that he discovered a problem in this design. He argues that the branch control signal for the AND gate should be taken after the flush mux (Point C) in the design to avoid erroneous branching.
lw  $4, 40($3);
beq $4, $0, loop1;
He further offers a solution by moving the tapping of branch control signal from point B to point C instead. Evaluate the proposed solution by answering the following:
It is _______________________________ (a must / a feasible change but does not make any difference / a feasible change that improves the design / a sin) to move the tapping of branch control signal for the AND gate from point B to point C.
It is _______________________________ (a must / a feasible change but does not make any difference / a feasible change that improves the design / a sin) to move the tapping of branch control signal for the HDU_Br from point B to point C.
Another person suggests .... ..... identify BEQ instruction by inspection of a single bit among the six-bit OPCODE field. ..... get branch control signal from point A in the figure.
Is this a good suggestion or bad one?
Notice that in case the first BEQ is taken, the second BEQ should be flushed.
beq $0, $1, loop1;
beq $4, $2, loop2;
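Whether a single opcode bit can stand in for a full beq decode depends on which opcodes the decoder can ever see. This sketch checks bit 2 (weight 4) against the standard MIPS opcode values; the small subset passed first is an assumption about the lab's instruction set, not something the slides state, and the check says nothing about the flushing issue raised above.

```python
# Sketch: does opcode bit 2 alone identify beq?
# Standard MIPS opcode values (instruction bits 31-26); the lab's actual subset is an assumption.
OPCODES = {
    "R-type": 0b000000,
    "j": 0b000010,
    "beq": 0b000100,
    "bne": 0b000101,
    "addi": 0b001000,
    "ori": 0b001101,
    "lw": 0b100011,
    "sw": 0b101011,
}


def single_bit_says_beq(opcode: int) -> bool:
    """The suggested shortcut: inspect only opcode bit 2 (weight 4)."""
    return bool(opcode & 0b000100)


def full_decode_says_beq(opcode: int) -> bool:
    """Full six-bit comparison, as a control unit would do."""
    return opcode == OPCODES["beq"]


def mismatches(mnemonics):
    """Mnemonics for which the single-bit shortcut gives the wrong answer."""
    return [m for m in mnemonics
            if single_bit_says_beq(OPCODES[m]) != full_decode_says_beq(OPCODES[m])]


print(mismatches(["R-type", "lw", "sw", "beq", "j", "addi"]))  # []
print(mismatches(list(OPCODES)))                               # ['bne', 'ori']
```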
2.9 Forwarding muxes in the EX stage:
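[Figure: two cascaded 2-to-1 forwarding muxes on the RS leg of the ALU, each passing its forwarded input when its select is 1.
Original lab design: the first mux (FW_RS_WB) selects between the original read data and the forwarded help from the WB stage; the second mux (FW_RS_MEM) selects between that result and the forwarded help from the MEM stage.
Modified design: the order is swapped; the first mux (FW_RS_MEM_new) selects between the original read data and the forwarded help from the MEM stage; the second mux (FW_RS_WB_new) selects between that result and the forwarded help from the WB stage.]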
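The two mux orders can be compared by modeling each as a pair of 2-to-1 selections. This sketch assumes the mux order described for the 2.9 figure and the usual match condition (the stage writes a register, it is not $0, and it equals rs); the dict keys and the illustrative data values are hypothetical. It evaluates the third sequence, where both the MEM and WB instructions write $3.

```python
# Sketch: the two forwarding-mux orders of question 2.9 for the RS leg of the ALU.
# Each stage is a dict with hypothetical keys: reg_write, rd, data.

def match(stage: dict, rs: int) -> bool:
    """Stage is about to write rs (register $0 is never forwarded)."""
    return stage["reg_write"] and stage["rd"] != 0 and stage["rd"] == rs


def original_design(rs, read_data, mem, wb):
    fw_rs_wb = match(wb, rs)                           # each select is generated on its own
    fw_rs_mem = match(mem, rs)
    first = wb["data"] if fw_rs_wb else read_data      # first mux: WB help
    return mem["data"] if fw_rs_mem else first         # second mux: MEM help


def modified_design(rs, read_data, mem, wb, gate_wb_with_mem=False):
    fw_rs_mem_new = match(mem, rs)
    fw_rs_wb_new = match(wb, rs)
    if gate_wb_with_mem:
        fw_rs_wb_new = fw_rs_wb_new and not fw_rs_mem_new
    first = mem["data"] if fw_rs_mem_new else read_data   # first mux: MEM help
    return wb["data"] if fw_rs_wb_new else first          # second mux: WB help


# Third sequence: add $3,$3,$3 (now in WB), add $3,$5,$2 (now in MEM), or $6,$3,$4 (in EX).
wb = {"reg_write": True, "rd": 3, "data": 111}    # illustrative values
mem = {"reg_write": True, "rd": 3, "data": 222}
stale = 0                                        # old $3 from the register file
print(original_design(3, stale, mem, wb))                          # 222
print(modified_design(3, stale, mem, wb))                          # 111
print(modified_design(3, stale, mem, wb, gate_wb_with_mem=True))   # 222
```

The `or` needs the value from the nearer instruction (the one in MEM). The mux nearest the ALU wins, so the model shows which select signal would have to take the other into account.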
add $10, $11, $12;
add $3, $3, $3;
or  $6, $3, $4;
In the original design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)
In the modified design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)

add $3, $3, $3;
add $10, $11, $12;
or  $6, $3, $4;
In the original design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)
In the modified design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)

add $3, $3, $3;    <====
add $3, $5, $2;    <====
or  $6, $3, $4;
In the original design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)
In the modified design, FW_RS_WB = (0/1/X), FW_RS_MEM = (0/1/X)
From the observations made in the above instruction sequences, can we generate the 2 forwarding control signals independently of each other (a) in the original design and (b) in the modified design?
3. Modified Pipeline Design (7-stage pipeline):
[Figure: 7-stage pipelined version of the late-branch design of the 1st edition, with stages IF1, IF2, ID, EX, MEM1, MEM2, WB. Blocks shown: PC, Instr. TLB, Instr. cache, Reg, control, HDU, Zero, BRANCH (BR), Data TLB, Data cache, FU.]
[Figure: 7-stage pipelined version of the early-branch design of the 3rd ed. and our lab 6, with stages IF1, IF2, ID, EX, MEM1, MEM2, WB. Blocks shown: PC, Instr. TLB, Instr. cache, Reg, control, HDU, HDU_Br, FU_Br, Zero, BRANCH (BR), Data TLB, Data cache, FU.]
[Figure: All 4 pipelines. (1) 5-stage pipeline of the late-branch design of the 1st edition (IF, ID, EX, MEM, WB; HDU, FU). (2) 5-stage pipeline of the early-branch design of the 3rd ed. and our lab 6 (HDU, HDU_Br, FU_Br, FU). (3) 7-stage pipelined version of the late-branch design of the 1st edition. (4) 7-stage pipelined version of the early-branch design of the 3rd ed. and our lab 6.]
Dependency of an R-type instruction on a load word instruction; stalling by the HDU to resolve the dependency problem:
Designs compared (columns): 5-stage late-branch | 5-stage early-branch | 7-stage late-branch | 7-stage early-branch

(a) i    lw  $1, 60($2)
    i+1  add $4, $1, $6
    Any bubbles? How many? Where are they inserted? Complete the time-space diagrams. (This example is completed by us.)
    5-stage late-branch: Bubbles = 1 (0/1/2/3)
    5-stage early-branch: Bubbles = 1 (0/1/2/3)
    7-stage late-branch: Bubbles = 2 (0/1/2/3)
    7-stage early-branch: Bubbles = 2 (0/1/2/3)
    [Time-space diagrams for lw and add: the 5-stage early-branch diagram is the same as the 5-stage late-branch one, and the 7-stage early-branch diagram is the same as the 7-stage late-branch one.]

(b) i    lw  $1, 60($2)
    i+1  sub $10, $11, $12
    i+2  add $4, $1, $6
    Any bubbles? How many? Where are they inserted?
    For each of the four designs: Bubbles = ___________ (0/1/2/3)
    [Time-space diagrams for lw, sub, and add in each design.]

(c) How many comparators does the HDU (not HDU_Br) have? Where do the destination register address inputs to the comparators come from?
    For each of the four designs: # of comparators = _____; destination reg. addr. input(s) come(s) from: __________

(d) Delay slots for lw: to avoid the use of the HDU, how many delay slots should we declare for lw?
    For each of the four designs: # of delay slots = ______
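The completed row (a) above (1, 1, 2, 2 bubbles) follows from a simple timing rule. This sketch assumes full forwarding into EX from every later stage register and that the loaded value exists only once lw has finished its last memory stage, so lw can forward it from WB; the stage lists use the names on the slides, while the function name is hypothetical. The branch style does not enter the count, which is consistent with the early-branch columns matching the late-branch ones in row (a).

```python
# Sketch: load-use bubbles when the dependent instruction needs the value in EX.
# Assumes full forwarding into EX from every later stage register, and that the
# loaded value exists only once lw has finished its last memory stage.
FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]
SEVEN_STAGE = ["IF1", "IF2", "ID", "EX", "MEM1", "MEM2", "WB"]


def load_use_bubbles(stages, distance):
    """Bubbles for an instruction `distance` places after the lw it depends on."""
    ex = stages.index("EX")
    ready = stages.index("WB")   # lw can forward its data once it has reached WB
    # Without bubbles, lw is at index ex + distance when the consumer is in EX.
    return max(0, ready - (ex + distance))


for name, stages in (("5-stage", FIVE_STAGE), ("7-stage", SEVEN_STAGE)):
    print(name, [load_use_bubbles(stages, d) for d in (1, 2, 3)])
# 5-stage [1, 0, 0]
# 7-stage [2, 1, 0]
```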
Dependency of an R-type instruction on another R-type instruction; forwarding:
Designs compared (columns): 5-stage late-branch | 5-stage early-branch | 7-stage late-branch | 7-stage early-branch

i    add $5, $7, $9
i+1  xor $1, $2, $3
i+2  or  $10, $11, $12
i+3  sub $3, $5, $1
Explain forwarding to instruction (i+3).

5-stage late-branch: sub receives the latest $1 from xor when sub is in ______ stage and xor is in ______ stage under the control of _________________ (FU/internal forwarding in register file). sub receives the latest $5 from ________________________________ due to ______________________________ (FU/internal forwarding in register file).

5-stage early-branch: sub receives the latest $1 from xor the first time when sub is in ______ stage and xor is in ______ stage under the control of ________________ (FU_Br/FU/internal forwarding in register file). It receives the same value a second time when sub is in ______ stage and xor is in ______ stage under the control of ________ (FU_Br/FU). sub receives the latest $5 from ________________________________ due to ______________________________ (FU_Br/FU/internal forwarding in register file).

7-stage late-branch: sub receives the latest $1 from xor when sub is in ______ stage and xor is in ______ stage under the control of _________________ (FU/internal forwarding in register file). sub receives the latest $5 from add when sub is in ______ stage and add is in ______ stage under the control of _________________ (FU/internal forwarding in register file).

7-stage early-branch: sub receives the latest $1 from xor the first time when sub is in ______ stage and xor is in ______ stage under the control of ________ (FU_Br/FU). It receives the same value a second time when sub is in ______ stage and xor is in ______ stage under the control of ________ (FU_Br/FU). sub receives the latest $5 from add when sub is in ______ stage and add is in ______ stage under the control of _________________ (FU_Br/FU/internal forwarding in register file).
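Which stage pair is involved for instruction (i+3) follows from how far back the producer is. This sketch models only the EX-stage FU and the internally forwarding register file, with no stalls in between; any additional help that FU_Br gives in the early-branch designs (the "first time" in those columns) is deliberately left out. The function name is hypothetical.

```python
# Sketch: how a dependent instruction receives an R-type result through the
# EX-stage FU or the register file, with no stalls in between.
# Models only those two paths; extra help from FU_Br in the early-branch designs is ignored.
FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]
SEVEN_STAGE = ["IF1", "IF2", "ID", "EX", "MEM1", "MEM2", "WB"]


def forwarding_path(stages, distance):
    """Return (mechanism, consumer stage, producer stage) for a producer `distance` earlier."""
    id_, ex, wb = (stages.index(s) for s in ("ID", "EX", "WB"))
    if ex + distance <= wb:
        # The producer is still in the pipe when the consumer reaches EX.
        return ("FU", "EX", stages[ex + distance])
    if id_ + distance == wb:
        # The producer writes back while the consumer reads registers in ID.
        return ("internal forwarding in register file", "ID", "WB")
    return ("ordinary register-file read", "ID", None)


for d in (1, 2, 3):
    print(d, forwarding_path(FIVE_STAGE, d), forwarding_path(SEVEN_STAGE, d))
```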
FU_Br and FU details:
Designs compared (columns): 5-stage late-branch | 5-stage early-branch | 7-stage late-branch | 7-stage early-branch

How many comparators does the forwarding unit in the ID stage (FU_Br, not FU) have? How big are the forwarding muxes (n-bit wide m-to-1 mux)? How many? Where do the data inputs to the muxes come from?
    5-stage and 7-stage late-branch: Not applicable
    5-stage and 7-stage early-branch, each: # of comparators in FU_Br = __________________; forwarding mux(es) in the A-leg of the equality checker (size and number, which are the same for the B-leg) = ______________________________; data inputs for this/these come from ______________________________

How many comparators does the forwarding unit in the EX stage (FU, not FU_Br) have? How big are the forwarding muxes (n-bit wide m-to-1 mux)? How many? Where do the data inputs to the muxes come from?
    5-stage and 7-stage late-branch, each: # of comparators in FU = ___________________; forwarding mux(es) in the A-leg of the ALU (size and number, which are the same for the B-leg) = ______________________________; data inputs for this/these come from ______________________________
    5-stage early-branch: Same as 5-stage late-branch: TRUE / FALSE
    7-stage early-branch: Same as 7-stage late-branch: TRUE / FALSE
Priority in FU and FU_Br:
Designs compared (columns): 5-stage late-branch | 5-stage early-branch | 7-stage late-branch | 7-stage early-branch

Priority in FU (FU, not FU_Br): forwarding to a dependent instruction standing in the EX stage. Opt to forward from the nearer stage rather than the farther one.
    5-stage late-branch: The FU prefers to accept forwarding help from the __________ (MEM/WB) over the ________________ (MEM/WB).
    5-stage early-branch: Same as in the 5-stage late-branch: TRUE / FALSE
    7-stage late-branch: The FU prefers to accept forwarding help from the __________ (MEM1/MEM2/WB) over the ________________ (MEM1/MEM2/WB) as well as the ______________ (MEM1/MEM2/WB). Further, ______________________________________________________________________.
    7-stage early-branch: Same as in the 7-stage late-branch: TRUE / FALSE

Priority in FU_Br (FU_Br, not FU): forwarding to a BEQ instruction standing in the ID stage. Opt to forward from the nearer stage rather than the farther one.
    5-stage late-branch: Not applicable
    5-stage early-branch: No priority needs to be implemented in FU_Br. TRUE / FALSE. Explain: ______________________________________________________________________
    7-stage late-branch: Not applicable
    7-stage early-branch: The FU_Br prefers to accept forwarding help from a ______________ (R-Type/lw) instruction in the _______________ (MEM1/MEM2/WB) over a ______________ (R-Type/lw) instruction in the ______________ (MEM1/MEM2/WB).
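The nearest-first rule for the FU can be written as a scan over the later stages, nearest first. This is a sketch with hypothetical dict keys; it does not model a lw whose data is not ready yet (the HDU stalls the consumer in that case). The example uses the instr #1 to #3 sequence of three `add $2, $2, $2` instructions from 4.2.5, with instr #3 in EX of a 5-stage pipeline.

```python
# Sketch: nearest-first priority in an EX-stage forwarding unit.
# later_stages lists the stages after EX, nearest first ([MEM, WB] or [MEM1, MEM2, WB]);
# the dict keys are hypothetical. A lw whose data is not ready yet is not modeled
# (the HDU stalls the consumer in that case).
from typing import Dict, List, Optional


def forward_source(rs: int, later_stages: List[Dict]) -> Optional[str]:
    """Stage to forward register rs from, or None to use the register-file read data."""
    if rs == 0:
        return None                       # $0 is never forwarded
    for stage in later_stages:            # the nearest writer holds the newest value
        if stage["reg_write"] and stage["rd"] == rs:
            return stage["name"]
    return None


# instr #3 of three "add $2, $2, $2" is in EX; instr #2 is in MEM, instr #1 in WB.
five = [{"name": "MEM", "reg_write": True, "rd": 2},
        {"name": "WB", "reg_write": True, "rd": 2}]
print(forward_source(2, five))  # MEM
```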
Dependency of a BEQ instruction on an R-type instruction; stalling through HDU_Br, forwarding through FU_Br/FU:
Designs compared (columns): 5-stage late-branch | 5-stage early-branch | 7-stage late-branch | 7-stage early-branch

(a) i  beq $2, $4, Target
    How many instructions following a successful branch are flushed?
    For each of the four designs: # of instructions that need to be flushed = ___________________

For (b) and (c), answer: How many clock cycles does the BEQ have to be stalled?
    Late-branch designs (5-stage and 7-stage), each: # of clock cycles beq needs to be stalled = ___________________; beq receives the latest $1 from add when beq is in _______ stage and add is in _________ stage under the control of _________________ (FU/internal forwarding in register file).
    Early-branch designs (5-stage and 7-stage), each: the same blanks, with the choices (FU_Br/FU/internal forwarding in register file).

(b) i    add $1, $2, $3
    i+1  beq $1, $0, loop

(c) i    add $1, $2, $3
    i+1  xor $11, $12, $13
    i+2  beq $1, $0, loop
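For row (a) of the BEQ table, and for the IF1/IF2 flush discussed in 3.2, it helps to list which sequentially fetched instructions sit behind a taken branch when it resolves. This sketch assumes one sequential fetch per clock and that the outcome is used at the end of the cycle in which the branch is in `resolve_stage`; for the early-branch designs that stage is ID, while for the late-branch designs you pass whichever stage your design resolves in. The names are hypothetical.

```python
# Sketch: which instructions are behind a taken branch when it resolves.
# Assumes one sequential fetch per clock and that the branch outcome is used
# at the end of the cycle in which the branch sits in `resolve_stage`.
FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]
SEVEN_STAGE = ["IF1", "IF2", "ID", "EX", "MEM1", "MEM2", "WB"]


def wrong_path(stages, resolve_stage):
    """Map each earlier stage to the sequentially fetched instruction it holds."""
    k = stages.index(resolve_stage)
    # The instruction fetched j cycles after the branch is j stages behind it.
    return {stages[k - j]: f"i+{j}" for j in range(1, k + 1)}


print(wrong_path(FIVE_STAGE, "ID"))    # {'IF': 'i+1'}
print(wrong_path(SEVEN_STAGE, "ID"))   # {'IF2': 'i+1', 'IF1': 'i+2'}
```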
Dependency of a BEQ instruction on a lw instruction; stalling through HDU_Br, forwarding through FU_Br/FU:
Designs compared (columns): 5-stage late-branch | 5-stage early-branch | 7-stage late-branch | 7-stage early-branch

For each sequence below, answer: How many clock cycles does the BEQ have to be stalled?
    Late-branch designs (5-stage and 7-stage), each: # of clock cycles beq needs to be stalled = ___________________; beq receives the latest $1 from lw when beq is in _______ stage and lw is in _________ stage under the control of _________________ (FU/internal forwarding in register file).
    Early-branch designs (5-stage and 7-stage), each: the same blanks, with the choices (FU_Br/FU/internal forwarding in register file).

(a) i    lw  $1, 40($2)
    i+1  beq $1, $0, loop

(b) i    lw  $1, 40($2)
    i+1  add $6, $5, $4
    i+2  beq $1, $0, loop

(c) i    lw  $1, 40($2)
    i+1  add $6, $5, $4
    i+2  or  $16, $15, $14
    i+3  beq $1, $0, loop
Miscellaneous:
Designs compared (columns): 5-stage late-branch | 5-stage early-branch | 7-stage late-branch | 7-stage early-branch

How many comparators does the HDU_Br have? Destination register addr.(s) come(s) to HDU_Br from .....
    5-stage and 7-stage late-branch: Not applicable
    5-stage early-branch: # of comparators in HDU_Br = _________; dest. reg. addr.(s) come(s) from ________________________________________________________
    7-stage early-branch: # of comparators in HDU_Br = _________; dest. reg. addr.(s) come(s) from ________________________________________________________

Though it is not desirable to “delay” the BEQ execution, how late in the pipeline can you execute the BEQ instruction?
    5-stage design: The latest stage for executing BEQ is ________ (EX/MEM/WB).
    7-stage design: The latest stage for executing BEQ is ________ (EX/MEM1/MEM2/WB).
    (Not applicable in the other two columns.)

The earliest stage in which a BEQ can be executed is:
    5-stage design: The earliest stage for executing BEQ is ________ (IF/ID/EX).
    7-stage design: The earliest stage for executing BEQ is ________ (IF1/IF2/ID/EX).
    (Not applicable in the other two columns.)
3.2 Flushing of the two instructions in the IF1 and IF2 stages in the case of the 7-stage pipeline:
Note: This part of the design is common to both branch implementations (late or early).
[Figure: the IF1, IF2, and ID stages of the 7-stage pipeline (PC, control, Instr. TLB, Instr. cache, branch signal BR1), drawn first without and then with RESET inputs on the stage registers used for flushing (Assistant #2's design of flush).]
4. Modified Pipeline Design (4-stage pipeline)
EX and MEM =====> EXMEM
No forwarding from EXMEM
If BEQ is dependent on an instruction in EXMEM, it is stalled until the dependency is resolved. So there is no forwarding into the ID stage, and no FU_Br.
The HDU is not needed in this design and is removed.
The input connections to the FU and HDU_Br are reduced.
4.1 Complete the forwarding paths to FU.
[Figure: 4-stage pipeline datapath with the IF, ID, EXMEM, and WB stages and the IF/ID, ID/EXMEM, and EXMEM/WB stage registers. Shown: PC with its +4 adder, instruction memory, register file, Control, sign extension, shift left 2, branch-target adder, equality (=) comparator, Branch and IF.Flush, HDU_Br (STALL_BEQ, STALL), ALU with ALU control (ALUSrc, ALUOp, RegDst), data memory (MemRead, MemWrite), the Forwarding Unit with its forwarding-mux control, and the MemtoReg write-back mux.]
4.2 Compare and contrast the 5-stage pipeline design of lab #6 with this 4-stage pipeline design.
4.2.1 We do not need the regular HDU for the LW dependency in the 4-stage pipeline because .....
However, we still need HDU_Br to stall the BEQ instructions.
For each stream below, ________ clock cycle(s) are needed for stalling.
Stream #1: lw $4, 40($3); add $10, $4, $6;
Stream #2: lw $4, 40($3); beq $10, $4, loop1;
Stream #3: add $4, $3, $2; beq $10, $4, loop1;
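The design rules at the start of question 4 (EX and MEM merged into EXMEM, no forwarding from EXMEM, no FU_Br) translate into a small stall rule. This sketch additionally assumes that the FU forwards from the EXMEM/WB register into EXMEM, and that a beq in ID gets its operand only through internal forwarding in the register file once the producer reaches WB; the names are hypothetical.

```python
# Sketch: stall cycles in the 4-stage pipeline (IF, ID, EXMEM, WB) of question 4.
# From the slides: no forwarding from EXMEM and no FU_Br, so a beq in ID must wait
# until its producer is in WB (internal forwarding in the register file). Assumed:
# the FU forwards from the EXMEM/WB register into EXMEM.
STAGES = ["IF", "ID", "EXMEM", "WB"]


def stalls(consumer_is_beq: bool, distance: int) -> int:
    """Stalls for an instruction `distance` places after the one it depends on."""
    needs_in = STAGES.index("ID" if consumer_is_beq else "EXMEM")
    ready = STAGES.index("WB")   # R-type and lw results both exist only after EXMEM
    return max(0, ready - (needs_in + distance))


print(stalls(False, 1))  # stream #1: lw then add   -> 0
print(stalls(True, 1))   # stream #2: lw then beq   -> 1
print(stalls(True, 1))   # stream #3: add then beq  -> 1
```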
4.2.2 The PCWrite is under the control of (HDU/HDU_Br/FU/FU_Br/Successful Branch/Successful Jump/Combination of these/none of these/none, no need to control, activated all the time):
In the 5-stage: ____________________________
In the 4-stage: _____________________________________
4.2.3 Number and size of comparators in the forwarding unit (FU):
In the 5-stage: ____________________________
In the 4-stage: ____________________________
The FU in the case of the 4-stage pipeline produces ____________________ (one/two) outputs, of size __________ (1-bit / each 1-bit / 2-bit / each 2-bit), to control the forwarding muxes.
4.2.4 The HDU_Br (Hazard Detection Unit assisting beq): number and size of comparators in the HDU_Br:
In the 5-stage: ____________________________
In the 4-stage: ____________________________
4.2.5 _______ (Like / Unlike) in the case of the 5-stage pipeline, we ____________ (need / don’t need) prioritization in the 4-stage pipeline in providing forwarding help to instr #3 in the following sequence of adds:
instr #1  add $2, $2, $2
instr #2  add $2, $2, $2
instr #3  add $2, $2, $2
4.2.6 If the clock frequency is the same for the two pipelines and we ignore the control (branch) hazard, the performance of the 4-stage pipeline is________________________________________ (better than / equal to / worse than / sometimes better than and sometimes worse than) the 5-stage pipeline performance.
4.2.7 In the 4-stage pipeline, since the ALU and the Memory are both in one stage, they can work simultaneously and this merging of ALU with Memory in a single stage does not call for extending the clock period (even if we use the original ALU and Data memory which are NOT fast). TRUE / FALSE