CS 152 L9: Pipelining III UC Regents Fall 2006 © UCB
2006-9-26 John Lazzaro
(www.cs.berkeley.edu/~lazzaro)
CS 152 Computer Architecture and Engineering
Lecture 9 – Pipelining III
www-inst.eecs.berkeley.edu/~cs152/
Congrats on Lab 2!
TAs: Udam Saini and Jue Sun
Recall: First Lecture ...
Our goal for Fall 06:
All projects successful: We want every group to get every CPU working.
Still on track!
Last time: A Hazard Taxonomy
Structural Hazards
Data Hazards (RAW, WAR, WAW)
Control Hazards (taken branches and jumps)
On each clock cycle, we must detect the presence of all of these hazards, and resolve them before they break the “contract with the programmer”.
Last Time: Hazard Resolution Toolkit
Stall earlier instructions in pipeline.
Kill earlier instructions in pipeline.
Forward results computed in later pipeline stages to earlier stages.
Add new hardware or rearrange hardware design to eliminate hazard.
Make hardware handle concurrent requests to eliminate hazard.
Change ISA to eliminate hazard.
Today: Putting it All Together
Specifications for Lab 3
Preferred hazard resolution tools.
At-risk hazards for Lab 3
Tips for control design
Lab 3: ISA Specifications
No load “delay slot”
Single branch “delay slot”
Remember: Online MIPS documentation
The level of detail needed for a pipelined design can only be found in this document.
42 MIPS32™ Architecture For Programmers Volume II, Revision 2.00
Copyright © 2001-2003 MIPS Technologies Inc. All rights reserved.
AND
Format: AND rd, rs, rt MIPS32
Purpose:
To do a bitwise logical AND
Description: rd ← rs AND rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical AND operation. The result is
placed into GPR rd.
Restrictions:
None
Operation:
GPR[rd] ← GPR[rs] and GPR[rt]
Exceptions:
None
Encoding:
Bits:   31-26     25-21   20-16   15-11   10-6    5-0
Field:  SPECIAL   rs      rt      rd      0       AND
Value:  000000                            00000   100100
Width:  6         5       5       5       5       6
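As a quick check on the AND field layout above, here is a minimal Python sketch that packs and unpacks the six fields. The function names `encode_and` and `decode_fields` are made up for this illustration; only the field positions, the SPECIAL opcode (000000), and the AND function code (100100) come from the reference page.

```python
# Field values taken from the AND reference page; helper names are hypothetical.
SPECIAL_OPCODE = 0b000000
AND_FUNCT = 0b100100


def encode_and(rd: int, rs: int, rt: int) -> int:
    """Pack AND rd, rs, rt into a 32-bit instruction word."""
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError(f"register number out of range: {reg}")
    return (
        (SPECIAL_OPCODE << 26)  # bits 31-26
        | (rs << 21)            # bits 25-21
        | (rt << 16)            # bits 20-16
        | (rd << 11)            # bits 15-11
        | (0 << 6)              # bits 10-6, always zero for AND
        | AND_FUNCT             # bits 5-0
    )


def decode_fields(word: int) -> dict:
    """Split a 32-bit R-format word back into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# AND R6,R5,R4 (rd=6, rs=5, rt=4)
word = encode_and(rd=6, rs=5, rt=4)
print(f"{word:#010x}")
```

A decoder in the ID stage does the reverse of `encode_and`: it slices rs, rt, and rd out of fixed bit positions, which is why the exact bit ranges matter for a pipelined design.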
Hazard Diagnosis
Data Hazards: Read After Write
Read After Write (RAW) hazards. Instruction I2 expects to read a data value written by an earlier instruction, but I2 executes “too early” and reads the wrong copy of the data.
Lab 3 solution: use forwarding heavily, fall back on stalling when forwarding won’t work or slows down the critical path too much.
Full bypass network ...
[Figure: pipelined datapath with a full bypass network across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, pipeline registers, and forwarding mux/logic, including a path from WB.]
Common bug: Multiple forwards ...
[Figure: pipelined datapath with bypass network across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, pipeline registers, and forwarding mux/logic, including a path from WB.]
ADD R4,R3,R2
OR R2,R3,R1
AND R2,R2,R1
Which do we forward from?
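One way to reason about the question: when several in-flight instructions write the register that the instruction in decode wants to read, the value to use is the one from the producer that is latest in program order, which is the one in the nearest downstream stage. The Python sketch below models that priority. It assumes the three instructions on the slide sit in ID, EX, and MEM from left to right (the transcript does not preserve their placement), and the function name and return labels are hypothetical.

```python
from typing import Optional


def forward_select(src: int,
                   ex_dest: Optional[int],
                   mem_dest: Optional[int],
                   wb_dest: Optional[int]) -> str:
    """Pick the source for one operand of the instruction in ID.

    Each *_dest is the register written by the instruction currently in
    that stage, or None if that instruction writes no register.
    """
    if src == 0:
        return "REGFILE"   # R0 always reads as zero in MIPS, never forward
    if ex_dest == src:
        return "EX"        # youngest producer wins
    if mem_dest == src:
        return "MEM"
    if wb_dest == src:
        return "WB"
    return "REGFILE"       # no producer in flight


# Assumed placement: ADD R4,R3,R2 in ID, OR R2,R3,R1 in EX, AND R2,R2,R1 in MEM.
print(forward_select(src=2, ex_dest=2, mem_dest=2, wb_dest=None))  # R2 operand
print(forward_select(src=3, ex_dest=2, mem_dest=2, wb_dest=None))  # R3 operand
```

The common bug is comparing against each stage independently and letting more than one forwarding path drive the mux, or giving the older stage priority. A chain of if-tests like this one makes the priority explicit.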
Data Hazards: WAR and WAW ...
Write After Read (WAR) hazards. Instruction I2 expects to write over a data value after an earlier instruction I1 reads it. But instead, I2 writes too early, and I1 sees the new value.
Write After Write (WAW) hazards. Instruction I2 writes over data an earlier instruction I1 also writes. But instead, I1 writes after I2, and the final data value is incorrect.
WAR and WAW hazards are not possible in our 5-stage pipeline. However, the TA test code checks for them, and every semester a few WAR/WAW bugs are found. Why?
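The three definitions can be written down as register-name checks between an earlier instruction I1 and a later instruction I2. The Python sketch below does only that: it reports which dependences exist, not whether a particular pipeline turns them into real hazards. The `Instr` tuple and the `dependences` function are hypothetical names for this illustration.

```python
from collections import namedtuple

# dest is the register written (or None); srcs are the registers read.
Instr = namedtuple("Instr", "text dest srcs")


def dependences(i1: Instr, i2: Instr) -> set:
    """Classify register dependences from earlier i1 to later i2."""
    found = set()
    if i1.dest is not None and i1.dest in i2.srcs:
        found.add("RAW")   # i2 reads what i1 writes
    if i2.dest is not None and i2.dest in i1.srcs:
        found.add("WAR")   # i2 overwrites what i1 still has to read
    if i1.dest is not None and i1.dest == i2.dest:
        found.add("WAW")   # both write the same register
    return found


add = Instr("ADD R4,R3,R2", dest=4, srcs=(3, 2))
or_ = Instr("OR R2,R3,R1", dest=2, srcs=(3, 1))
and_ = Instr("AND R2,R2,R1", dest=2, srcs=(2, 1))

print(dependences(add, or_))   # ADD reads R2, OR writes R2
print(dependences(or_, and_))  # both write R2, AND also reads R2
```

A test program that covers all three cases for every pair of instruction types is one way to catch the WAR/WAW bugs the slide mentions, even though an in-order 5-stage pipeline should not produce them by construction.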
LW and Hazards
No load “delay slot”
Questions about LW and forwarding
[Figure: pipelined datapath with bypass network across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, pipeline registers, and forwarding mux/logic, including a path from WB.]
ADDIU R1,R1,24
LW R1,128(R29)
OR R3,R3,R2
Do we need to stall?
Questions about LW and forwarding
[Figure: pipelined datapath with bypass network across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, pipeline registers, and forwarding mux/logic, including a path from WB.]
ADDIU R1,R1,24
LW R1,128(R29)
OR R1,R3,R1
Do we need to stall?
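The answer to both LW questions depends on which stage each instruction occupies, which this transcript does not preserve. As a reference point, the Python sketch below shows the textbook load-use rule for a 5-stage pipeline with forwarding and no load delay slot: stall the instruction in ID when the instruction in EX is a load whose destination it reads. This is an assumption about a generic design, not the Lab 3 specification, and the function name is hypothetical. A synchronous data memory may change when load data becomes available.

```python
def load_use_stall(id_srcs, ex_is_load, ex_dest):
    """Return True if the instruction in ID must wait for a load in EX.

    id_srcs    -- register numbers read by the instruction in ID
    ex_is_load -- True if the instruction in EX is a load
    ex_dest    -- register written by the instruction in EX, or None
    """
    # R0 is hardwired to zero, so a load to R0 never creates a dependence.
    return bool(ex_is_load and ex_dest != 0 and ex_dest in id_srcs)


# OR R1,R3,R1 in ID behind LW R1,128(R29) in EX: load data is not ready yet.
print(load_use_stall(id_srcs=(3, 1), ex_is_load=True, ex_dest=1))
# OR R3,R3,R2 in ID behind the same load: no register in common.
print(load_use_stall(id_srcs=(3, 2), ex_is_load=True, ex_dest=1))
# OR R1,R3,R1 in ID behind ADDIU R1,R1,24 in EX: forwarding covers this case.
print(load_use_stall(id_srcs=(3, 1), ex_is_load=False, ex_dest=1))
```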
Resolving a RAW hazard by stalling
[Figure: first three pipeline stages (Stage #1 Instr Fetch, Stage #2 Decode & Reg Fetch, Stage #3) with PC, +0x4 adder, instruction memory, register file, sign extender, and IR registers.]
Sample program:
ADD R4,R3,R2
OR R5,R4,R2
Let ADD proceed to WB stage, so that R4 is written to regfile.
Keep executing OR instruction until R4 is ready. Until then, send NOPs to IR 2/3.
Freeze PC and IR until stall is over.
New datapath hardware:
(1) Mux into IR 2/3 to feed in NOP.
(2) Write enable on PC and IR 1/2.
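The stall mechanism can be sketched as a cycle-by-cycle Python model: when the instruction in decode reads a register that an instruction further down the pipeline has not yet written, PC and IR 1/2 keep their values and a NOP goes into IR 2/3. The sketch assumes no forwarding and assumes the register file is written at the end of the WB cycle, so the stalled instruction leaves decode in the cycle after the producer's WB. The names `Instr`, `simulate`, and the register variables are hypothetical.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "name dest srcs")
NOP = Instr("NOP", None, ())


def simulate(program, max_cycles=100):
    """Return (trace, stall_count); each trace row is (IF, ID, EX, MEM, WB)."""
    pc = 0
    ir12 = ir23 = ir34 = ir45 = NOP   # pipeline registers between stages
    trace, stalls = [], 0
    for _ in range(max_cycles):
        fetched = program[pc] if pc < len(program) else NOP
        if fetched is NOP and all(r is NOP for r in (ir12, ir23, ir34, ir45)):
            break                     # pipeline has drained
        trace.append((fetched.name, ir12.name, ir23.name, ir34.name, ir45.name))
        # Registers that in-flight instructions have not written back yet.
        pending = {r.dest for r in (ir23, ir34, ir45) if r.dest is not None}
        stall = any(s in pending for s in ir12.srcs)
        ir45, ir34 = ir34, ir23       # later stages always advance
        if stall:
            ir23 = NOP                # (1) mux feeds a NOP into IR 2/3
            stalls += 1               # (2) PC and IR 1/2 are not written
        else:
            ir23, ir12 = ir12, fetched
            pc = min(pc + 1, len(program))
    return trace, stalls


program = [Instr("ADD", 4, (3, 2)), Instr("OR", 5, (4, 2))]
trace, stalls = simulate(program)
for cycle, row in enumerate(trace):
    print(cycle, row)
print("stall cycles:", stalls)
```

If the register file writes in the first half of the cycle and reads in the second half, the OR could leave decode one cycle earlier. That design choice changes the stall count, so it is worth pinning down before writing the stall logic.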
Udam’s Group, Spr 05 ...
Members: Bryant, Michael, Udam, Daniel
Problem 5
• Stalling logic was incorrect
• Solution: Break up signal into sub-wires, to reduce confusion and facilitate debugging.
Problems
Don’t fall into the trap of using one giant module
– Makes it really hard to find problems
– Too many things are happening at the same time
– To solve this, break things apart into sub-modules and use layers of abstraction.
Stalling problems ...
° Make sure control signals are synched with corresponding data. Sometimes control signals must be delayed.
° Take extra precautions so that stale data is not reintroduced into your processor pipeline.
SYNCMEISTER Group, FALL 05
“Synchronous” memory makes it harder ...
Synchronous Memory Reads ...
[Figure: data memory with a 32-bit Addr input and a 32-bit Dout output, shown for Lab 2 (asynchronous memory) and Lab 3 (synchronous memory).]
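The difference between the two memories can be sketched in Python. The model assumes the usual meaning of the terms: an asynchronous read returns data in the same cycle the address is presented, while a synchronous read captures the address on a clock edge and shows the data in the following cycle. The class and function names are hypothetical, and the exact Lab 3 memory timing should be taken from the lab handout.

```python
class AsyncMemory:
    """Read data appears combinationally, in the same cycle as the address."""

    def __init__(self, words):
        self.words = list(words)

    def read(self, addr):
        return self.words[addr]


class SyncMemory:
    """Read data appears on dout only after the next clock edge."""

    def __init__(self, words):
        self.words = list(words)
        self.dout = None          # nothing valid before the first edge

    def clock(self, addr):
        # Clock edge: the read of `addr` becomes visible on dout afterwards.
        self.dout = self.words[addr]


def async_trace(mem, addrs):
    """Data seen in each cycle when addrs[i] is presented in cycle i."""
    return [mem.read(a) for a in addrs]


def sync_trace(mem, addrs):
    seen = []
    for a in addrs:
        seen.append(mem.dout)     # value visible during this cycle
        mem.clock(a)              # edge at the end of the cycle
    return seen


words = [10, 20, 30]
print(async_trace(AsyncMemory(words), [0, 1, 2]))
print(sync_trace(SyncMemory(words), [0, 1, 2]))
```

The one-cycle shift is what makes stalls harder: if the pipeline freezes after the address has been captured, the data that comes out belongs to an address the pipeline may no longer be presenting.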
Problems
The Next PC Calculation
– Have to use this when you start using synchronous memory
– Very hard to get right when dealing with stalls
– Try to get an understanding of the dynamics of this early on, and don’t start writing Verilog until you do.
– Make designs flexible for stalls early on.
Fall 05: The FOur Bytes ...
Branches and Hazards
Single “delay slot”
Recall: Control hazard and hardware
[Figure: first three pipeline stages (Stage #1 Instr Fetch, Stage #2 Decode & Reg Fetch, Stage #3) with PC, +0x4 adder, instruction memory, register file, sign extender, IR registers, and an == comparator feeding the branch control logic.]
Recall: After more hardware, change ISA
[Figure: pipeline datapath (IF (Fetch), ID (Decode), EX (ALU), MEM, WB) with PC, +0x4 adder, instruction memory, and IR registers, plus a pipeline timing diagram for instructions I1 to I6 over times t1 to t8.]
Sample Program (ISA w/o branch delay slot)
I1: BEQ R4,R3,25
I2: SUB R1,R9,R8
I3: AND R6,R5,R4
ID stage computes if branch is taken.
If branch is taken, this instruction (I2) MUST NOT complete!
If we change the ISA, we can always let I2 complete (“branch delay slot”) and eliminate the control hazard.
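The effect of the ISA change can be seen by listing which instructions complete under each definition. The Python sketch below is a toy model: each instruction is a labeled tuple, a branch carries a precomputed taken flag and a target index, and `executed_order` is a hypothetical helper. The target label "I26" is only a stand-in for wherever the taken branch goes.

```python
def executed_order(program, delay_slot, max_steps=20):
    """Return the labels of the instructions that complete, in order.

    program    -- list of (label, kind, taken, target_index) tuples,
                  where kind is "branch" or "op"
    delay_slot -- True if the ISA defines a single branch delay slot
    """
    order, pc, pending_target = [], 0, None
    while pc < len(program) and len(order) < max_steps:
        label, kind, taken, target = program[pc]
        order.append(label)
        if pending_target is not None:
            # This instruction sat in the delay slot; now take the branch.
            pc, pending_target = pending_target, None
        elif kind == "branch" and taken:
            if delay_slot:
                pending_target = target   # next instruction always completes
                pc += 1
            else:
                pc = target               # next instruction must be killed
        else:
            pc += 1
    return order


program = [
    ("I1", "branch", True, 3),    # BEQ, assumed taken, jumps to index 3
    ("I2", "op", False, None),    # SUB
    ("I3", "op", False, None),    # AND
    ("I26", "op", False, None),   # stand-in for the branch target
]
print(executed_order(program, delay_slot=False))
print(executed_order(program, delay_slot=True))
```

With the delay slot, I2 completes whether or not the branch is taken, so the pipeline never has to kill the instruction it fetched while the branch was in decode.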
Questions about branch and forwards
[Figure: pipelined datapath with bypass mux/logic across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, pipeline registers, and an == comparator feeding the branch control logic.]
BEQ R1,R3,label
OR R3,R3,R1
Will this work as shown?
Q. Why might this be hard (I)?
[Figure: pipelined datapath with bypass mux/logic across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, and pipeline registers.]
    OR R1,R0,R0
l1: BEQ R0,R0,l2
    OR R0,R0,R0
l2: BEQ R1,R0,l1
A. Delay slot logic.
Lessons learned
Pipelining is hard
Write test code in advance
Study every instruction
Think about interactions ...
The Break Instruction
Lab 3: ISA Specifications
Also: RESET signal, BREAK release signal, etc ...
Q. Why might this be hard (I)?
[Figure: pipelined datapath with bypass mux/logic across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, and pipeline registers.]
    BREAK
l1: BEQ R0,R0,l2
    BREAK
l2: BEQ R0,R0,l1
A. BREAK and delay slot logic.
Why might this be hard (II)?
[Figure: pipelined datapath with bypass mux/logic across the ID (Decode), EX, MEM, and WB stages: register file, sign extender, ALU, data memory, and pipeline registers.]
BREAK
BREAK
BREAK
A. BREAK release is tricky.
Problems
Break
– Getting this to work correctly can be tricky
– Make sure not to skip the instructions coming right before or after the break
– In simulation, we used a continuous assignment for the break release, but on the board, this was registered, so we got different results between the two.
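The simulation-versus-board mismatch in the last point comes down to a one-cycle shift. The Python sketch below compares a signal passed straight through (as a continuous assignment would) with the same signal passed through a flip-flop. It is a generic illustration with hypothetical function names, not the group's actual break release logic.

```python
def continuous(raw):
    """Continuous assignment: the output follows the input in the same cycle."""
    return list(raw)


def registered(raw, reset_value=0):
    """Flip-flop: the output in cycle i is the input from cycle i - 1."""
    out, q = [], reset_value
    for d in raw:
        out.append(q)   # value held by the flip-flop during this cycle
        q = d           # captured at the clock edge ending this cycle
    return out


release = [0, 1, 1, 0]  # break release button sampled once per cycle
print(continuous(release))
print(registered(release))
```

A processor that resumes one cycle later than expected can skip or repeat the instruction next to the BREAK, which matches the warning in the second point. Using the same form of the signal in simulation and on the board removes that difference.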
Reset
From Lab 3 ...
Problem 2
• Pressing Reset and Step buttons caused odd errors on the board.
• Solution: Put Reset and Release on the processor clock. When in stepping mode, make Reset and Release the raw signal.
Udam’s Group, Spr 05 ...
Reset & PC
PC counter at reset
– Especially on the board
Watch the first instruction
– Don’t lose or repeat
Behavior under stalling
– Different stalls may affect PC differently
Clocks
From the Lab ...
Plus, the TFTP clock ...
Problems
Handling Clock Boundaries
– Make sure to look at how positive edges of different clocks can interact (e.g., ButtonParser, SDRAM arbiter, etc.)
– Make sure to use different clocks when doing simulation to try to root out these types of bugs.
Problems
Memory Mapped I/O
– Difficult to get the timing correct
– Pay attention to which signals are synchronous and which are asynchronous
– Understand how this module interacts with other modules in the processor
Problems: Clocks
° Keep your clocks straight.
° Don’t mix clock signals.
° You can only have 4 clocks; sometimes putting the design on the board might unintentionally go over the clock limit.
Control Implementation
Recall: What is single cycle control?
[Figure: single-cycle datapath (instruction memory, register file, sign extender, ALU, data memory) with a control block. Labeled control input: Equal. Labeled control outputs: RegDest, RegWr, ExtOp, ALUsrc, ALUctr, MemWr, MemToReg, PCSrc.]
Combinational Logic (Only Gates, No Flip Flops)
Just specify logic functions!
In pipelines, all IR registers are used
[Figure: the IR registers of the ID (Decode), EX, MEM, and WB stages, together with Equal, feed a control block with outputs RegDest, RegWr, ExtOp, MemToReg, and PCSrc.]
Combinational Logic (Only Gates, No Flip Flops)
(add extra state outside!)
A “conceptual” design: for shortest critical path, IR registers may hold decoded info, not the complete 32-bit instruction.
Two goals when specifying control logic
Bug-free: One “0” that should be a “1” in the control logic function breaks contract with the programmer.
Efficient: Logic function specification should map to hardware with good performance properties: fast, small, low power, etc.
Should be easy for humans to read and understand: sensible signal names, symbolic constants ...
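To illustrate the readability goal, here is a Python sketch of control logic written as a pure function with symbolic opcode names and one table row per instruction. The signal names are the ones from the single-cycle control slide, and the opcode values are the standard MIPS32 encodings. The signal polarities (for example RegDest=1 selecting rd), the don't-care values shown as 0, and the symbolic ALUctr strings are assumptions for this illustration, not the Lab 3 specification.

```python
# Standard MIPS32 primary opcodes, named so the table below reads like a spec.
OP_RTYPE, OP_BEQ, OP_ADDIU, OP_LW, OP_SW = 0x00, 0x04, 0x09, 0x23, 0x2B

# One row per instruction; assumed polarities:
#   RegDest=1 writes rd (else rt), ALUsrc=1 uses the immediate,
#   ExtOp=1 sign-extends, MemToReg=1 writes back memory data.
CONTROL_TABLE = {
    OP_RTYPE: dict(RegDest=1, RegWr=1, ExtOp=0, ALUsrc=0, ALUctr="FUNCT", MemWr=0, MemToReg=0),
    OP_ADDIU: dict(RegDest=0, RegWr=1, ExtOp=1, ALUsrc=1, ALUctr="ADD", MemWr=0, MemToReg=0),
    OP_LW:    dict(RegDest=0, RegWr=1, ExtOp=1, ALUsrc=1, ALUctr="ADD", MemWr=0, MemToReg=1),
    OP_SW:    dict(RegDest=0, RegWr=0, ExtOp=1, ALUsrc=1, ALUctr="ADD", MemWr=1, MemToReg=0),
    OP_BEQ:   dict(RegDest=0, RegWr=0, ExtOp=1, ALUsrc=0, ALUctr="SUB", MemWr=0, MemToReg=0),
}


def control(opcode: int, equal: bool) -> dict:
    """Combinational control: outputs depend only on the current inputs."""
    signals = dict(CONTROL_TABLE[opcode])   # KeyError for unlisted opcodes
    signals["PCSrc"] = int(opcode == OP_BEQ and equal)
    return signals


print(control(OP_LW, equal=False))
print(control(OP_BEQ, equal=True))
```

Because the function has no state, every row can be checked against the ISA document one instruction at a time, which speaks to the bug-free goal as well.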
Midterm week begins on Thursday ...
HW graded on effort
Midterm (6-9, 306 Soda), no class that day.
Thursday review session. Will cover format, material, and ground rules for test.
And concurrently, Lab 3 deadlines ...
Lab 2 team evals
Lab 3 design document
Weekend: start design work
Lab 2 Team Evaluations due Thursday
Lab 3 Design Document Details
Lab 3 design doc, checkoffs, later in week ...
Lab 3 deadlines after the mid-term ...