chapter 5 s -...
TRANSCRIPT
DIGITAL
BUILDINGBLO
CKS
Chapter5<1>
DigitalDesignandComputerArchitecture,2ndEdi(on
Chapter5
DavidMoneyHarrisandSarahL.Harris
Chapter5<2>
DIGITAL
BUILDINGBLO
CKS Chapter5::Topics
• Introduc(on
• Arithme(cCircuits
• NumberSystems(no) • Sequen(alBuildingBlocks(later) • MemoryArrays(later) • LogicArrays(maybe)
Chapter5<3>
DIGITAL
BUILDINGBLO
CKS
• Digitalbuildingblocks:– Gates,mulDplexers,decoders,registers,arithmeDccircuits,counters,memoryarrays,logicarrays
• Buildingblocksdemonstratehierarchy,modularity,andregularity:– Hierarchyofsimplercomponents– Well-definedinterfacesandfuncDons– Regularstructureeasilyextendstodifferentsizes
• WillusethesebuildingblocksinChapter7tobuildmicroprocessor
IntroducDon
Chapter5<4>
DIGITAL
BUILDINGBLO
CKS
1-BitAdders
A B0 00 11 01 1
SCout
S =Cout =
HalfAdderA B
S
Cout +
A B0 00 11 01 1
SCout
S =Cout =
FullAdder
Cin
0 00 11 01 1
00001111
A B
S
Cout Cin+
Chapter5<5>
DIGITAL
BUILDINGBLO
CKS
1-BitAdders
A B0 00 11 01 1
0110
SCout0001
S =Cout =
HalfAdderA B
S
Cout +
A B0 00 11 01 1
0110
SCout0001
S =Cout =
FullAdder
Cin
0 00 11 01 1
00001111
1001
0111
A B
S
Cout Cin+
Chapter5<6>
DIGITAL
BUILDINGBLO
CKS
1-BitAdders
A B0 00 11 01 1
0110
SCout0001
S = A ⊕ BCout = AB
HalfAdderA B
S
Cout +
A B0 00 11 01 1
0110
SCout0001
S = A ⊕ B ⊕ CinCout = AB + ACin + BCin
FullAdder
Cin
0 00 11 01 1
00001111
1001
0111
A B
S
Cout Cin+
Chapter5<7>
DIGITAL
BUILDINGBLO
CKS
• Types of carry propagate adders (CPAs):
- Ripple-carry (slow)
- Carry-lookahead (fast)
- Prefix (faster)
• Carry-lookahead and prefix adders faster for large adders but require more hardware
Symbol
MulDbitAdders(CPAs)
A B
S
Cout Cin+N
NN
Chapter5<8>
DIGITAL
BUILDINGBLO
CKS
• Chain 1-bit adders together
• Carry ripples through entire chain
• Disadvantage: slow
Ripple-CarryAdder
S31
A30 B30
S30
A1 B1
S1
A0 B0
S0
C30 C29 C1 C0Cout ++++
A31 B31
Cin
Chapter5<9>
DIGITAL
BUILDINGBLO
CKS
tripple = NtFA where tFA is the delay of a full adder
Ripple-CarryAdderDelay
Chapter5<10>
DIGITAL
BUILDINGBLO
CKS
• Compute carry out (Cout) for k-bit blocks using generate and propagate signals
• Some definitions: - Column i produces a carry out by either generating a carry out or
propagating a carry in to the carry out - Generate (Gi) and propagate (Pi) signals for each column:
• Column i will generate a carry out if Ai AND Bi are both 1.
Gi = Ai Bi • Column i will propagate a carry in to the carry out if Ai OR Bi is 1.
Pi = Ai + Bi • The carry out of column i (Ci) is:
Ci = Ai Bi + (Ai + Bi )Ci-1 = Gi + Pi Ci-1
Carry-LookaheadAdder
Chapter5<11>
DIGITAL
BUILDINGBLO
CKS
• Step 1: Compute Gi and Pi for all columns • Step 2: Compute G and P for k-bit blocks
• Step 3: Cin propagates through each k-bit propagate/generate block
Carry-LookaheadAddiDon
Chapter5<12>
DIGITAL
BUILDINGBLO
CKS
• Example: 4-bit blocks (G3:0 and P3:0) :
G3:0 = G3 + P3 (G2 + P2 (G1 + P1G0 )
P3:0 = P3P2 P1P0
• Generally, Gi:j = Gi + Pi (Gi-1 + Pi-1 (Gi-2 + Pi-2Gj )
Pi:j = PiPi-1 Pi-2Pj
Ci = Gi:j + Pi:j Ci-1
Carry-LookaheadAdder
Chapter5<13>
DIGITAL
BUILDINGBLO
CKS 32-bitCLAwith4-bitBlocks
B0
++++
P3:0
G3P3G2P2G1P1G0
P3P2P1P0
G3:0
Cin
Cout
A0
S0
C0
B1 A1
S1
C1
B2 A2
S2
C2
B3 A3
S3
Cin
A3:0B3:0
S3:0
4-bit CLABlock Cin
A7:4B7:4
S7:4
4-bit CLABlock
C3C7
A27:24B27:24
S27:24
4-bit CLABlock
C23
A31:28B31:28
S31:28
4-bit CLABlock
C27Cout
Chapter5<14>
DIGITAL
BUILDINGBLO
CKS
For N-bit CLA with k-bit blocks: tCLA = tpg + tpg_block + (N/k – 1)tAND_OR + ktFA
- tpg : delay to generate all Pi, Gi - tpg_block : delay to generate all Pi:j, Gi:j - tAND_OR : delay from Cin to Cout of final AND/OR gate in k-bit CLA
block An N-bit carry-lookahead adder is generally much faster than a ripple-carry adder for N > 16
Carry-LookaheadAdderDelay
Chapter5<15>
DIGITAL
BUILDINGBLO
CKS
• Computes carry in (Ci-1) for each column, then
computes sum: Si = (Ai ⊕ Bi) ⊕ Ci
• Computes G and P for 1-, 2-, 4-, 8-bit blocks, etc. until all Gi (carry in) known
• log2N stages
PrefixAdder
Chapter5<16>
DIGITAL
BUILDINGBLO
CKS
• Carry in either generated in a column or propagated from a previous column.
• Column -1 holds Cin, so G-1 = Cin, P-1 = 0
• Carry in to column i = carry out of column i-1: Ci-1 = Gi-1:-1
Gi-1:-1: generate signal spanning columns i-1 to -1 • Sum equation:
Si = (Ai ⊕ Bi) ⊕ Gi-1:-1 • Goal: Quickly compute G0:-1, G1:-1, G2:-1, G3:-1, G4:-1, G5:-1,
… (called prefixes)
PrefixAdder
Chapter5<17>
DIGITAL
BUILDINGBLO
CKS
• Generate and propagate signals for a block spanning bits i:j: Gi:j = Gi:k + Pi:k Gk-1:j
Pi:j = Pi:kPk-1:j • In words:
- Generate: block i:j will generate a carry if: • upper part (i:k) generates a carry or • upper part propagates a carry generated in lower part
(k-1:j) - Propagate: block i:j will propagate a carry if both the
upper and lower parts propagate the carry
PrefixAdder
Chapter5<18>
DIGITAL
BUILDINGBLO
CKS
PrefixAdderSchemaDc
0:-1
-1
2:1
1:-12:-1
012
4:3
3
6:5
5:36:3
456
5:-16:-1 3:-14:-1
8:7
7
10:9
9:710:7
8910
12:11
11
14:13
13:1114:11
121314
13:714:7 11:712:7
9:-110:-1 7:-18:-113:-114:-1 11:-112:-1
15
0123456789101112131415
BiAi
Gi:iPi:i
Gk-1:jPk-1:jGi:kPi:k
Gi:jPi:j
ii:j
BiAiGi-1:-1
Si
iLegend
Chapter5<19>
DIGITAL
BUILDINGBLO
CKS
tPA = tpg + log2N(tpg_prefix ) + tXOR
- tpg: delay to produce Pi Gi (AND or OR gate) - tpg_prefix: delay of black prefix cell (AND-OR gate)
PrefixAdderDelay
Chapter5<20>
DIGITAL
BUILDINGBLO
CKS
Compare delay of: 32-bit ripple-carry, carry-lookahead, and prefix adders • CLA has 4-bit blocks • 2-input gate delay = 100 ps; full adder delay = 300 ps
AdderDelayComparisons
Chapter5<21>
DIGITAL
BUILDINGBLO
CKS
Compare delay of: 32-bit ripple-carry, carry-lookahead, and prefix adders • CLA has 4-bit blocks • 2-input gate delay = 100 ps; full adder delay = 300 ps
tripple = NtFA = 32(300 ps) = 9.6 ns tCLA = tpg + tpg_block + (N/k – 1)tAND_OR + ktFA = [100 + 600 + (7)200 + 4(300)] ps = 3.3 ns tPA = tpg + log2N(tpg_prefix ) + tXOR = [100 + log232(200) + 100] ps = 1.2 ns
AdderDelayComparisons
Chapter5<22>
DIGITAL
BUILDINGBLO
CKS
Subtracter
Symbol Implementation
+
A B
-
YY
A B
NN
N
N N
N
N
Chapter5<23>
DIGITAL
BUILDINGBLO
CKS
Comparator:Equality
Symbol ImplementationA3B3
A2B2A1B1A0B0
Equal=
A B
Equal
44
Chapter5<24>
DIGITAL
BUILDINGBLO
CKS
Copyright©2007Elsevier 5-<24>
Comparator:LessThan
A < B
-
BA
[N-1]
N
N N
Chapter5<25>
DIGITAL
BUILDINGBLO
CKS
Copyright©2007Elsevier 5-<25>
F2:0 Function
000 A & B
001 A | B
010 A + B
011 not used
100 A & ~B
101 A | ~B
110 A - B
111 SLT
ArithmeDcLogicUnit(ALU)
ALU
N N
N3
A B
Y
F
Chapter5<26>
DIGITAL
BUILDINGBLO
CKS
Copyright©2007Elsevier 5-<26>
F2:0 Function
000 A & B
001 A | B
010 A + B
011 not used
100 A & ~B
101 A | ~B
110 A - B
111 SLT
ALUDesign
+
2 01
A B
Cout
Y
3
01
F2
F1:0
[N-1] S
NN
N
N
N NNN
N
2
ZeroExtend
Chapter5<27>
DIGITAL
BUILDINGBLO
CKS
Copyright©2007Elsevier 5-<27>
• Configure 32-bit ALU for SLT operation: A = 25 and B = 32
SetLessThan(SLT)Example
+
2 01
A B
Cout
Y
3
01
F2
F1:0
[N-1] S
NN
N
N
N NNN
N
2
ZeroExtend
Chapter5<28>
DIGITAL
BUILDINGBLO
CKS
Copyright©2007Elsevier 5-<28>
• Configure 32-bit ALU for SLT operation: A = 25 and B = 32 - A < B, so Y should be 32-bit
representation of 1 (0x00000001) - F2:0 = 111
- F2 = 1 (adder acts as subtracter), so 25 - 32 = -7
- -7 has 1 in the most significant bit (S31 = 1)
- F1:0 = 11 multiplexer selects Y = S31 (zero extended) = 0x00000001.
SetLessThan(SLT)Example
+
2 01
A B
Cout
Y
3
01
F2
F1:0
[N-1] S
NN
N
N
N NNN
N
2
ZeroExtend
Chapter5<29>
DIGITAL
BUILDINGBLO
CKS
Copyright©2007Elsevier 5-<29>
• LogicalshiHer:shiasvaluetoleaorrightandfillsemptyspaceswith0’s– Ex:11001>>2=– Ex:11001<<2=
• Arithme(cshiHer:sameaslogicalshiaer,butonrightshia,fillsempty
spaceswiththeoldmostsignificantbit(msb).– Ex:11001>>>2=– Ex:11001<<<2=
• Rotator:rotatesbitsinacircle,suchthatbitsshiaedoffoneendare
shiaedintotheotherend– Ex:11001ROR2=– Ex:11001ROL2=
Shiaers
Chapter5<30>
DIGITAL
BUILDINGBLO
CKS
• LogicalshiHer:– Ex:11001>>2=00110– Ex:11001<<2=00100
• Arithme(cshiHer:– Ex:11001>>>2=11110– Ex:11001<<<2=00100
• Rotator:– Ex:11001ROR2=01110– Ex:11001ROL2=00111
Shiaers
Chapter5<31>
DIGITAL
BUILDINGBLO
CKS
ShiaerDesign
A3:0 Y3:0
shamt1:0
>>
2
4 4
A3 A2 A1 A0
Y3
Y2
Y1
Y0
shamt1:0
00
01
10
11
S1:0
S1:0
S1:0
S1:0
00
01
10
11
00
01
10
11
00
01
10
11
2
Chapter5<32>
DIGITAL
BUILDINGBLO
CKS
• A<<N=A×2N– Example:00001<<2=00100(1×22=4)– Example:11101<<2=10100(-3×22=-12)
• A>>>N=A÷2N– Example:01000>>>2=00010(8÷22=2)– Example:10000>>>2=11100(-16÷22=-4)
ShiaersasMulDpliers,Dividers
Chapter5<33>
DIGITAL
BUILDINGBLO
CKS
• Par(alproductsformedbymulDplyingasingledigitofthemulDplierwithmulDplicand
• ShiHedparDalproductssummedtoformresult
MulDpliers
Decimal Binary230
42x01010111
5 x 7 = 35
460920+9660
01010101
01010000
x
+0100011
230 x 42 = 9660
multipliermultiplicand
partialproducts
result
Chapter5<34>
DIGITAL
BUILDINGBLO
CKS
4x4MulDplier
x B3 B2 B1 B0
A3B0 A2B0 A1B0 A0B0
A3 A2 A1 A0
A3B1 A2B1 A1B1 A0B1
A3B2 A2B2 A1B2 A0B2
A3B3 A2B3 A1B3 A0B3+P7 P6 P5 P4 P3 P2 P1 P0
0
P2
0
0
0
P1 P0P5 P4 P3P7 P6
A3 A2 A1 A0
B0B1
B2
B3
x
A B
P
44
8
Chapter5<35>
DIGITAL
BUILDINGBLO
CKS
4x4Divider
1
A3000
Q3
1
Q2
B0B1B2B3
R0R1R2R3
A2
1
Q1
A1
1
Q0
A0
+
R B
D
R'
N
CinCout
1 0
R B
DR'N
CoutCin
Legend
A/B = Q + R/B Algorithm: R’ = 0 for i = N-1 to 0 R = {R’ << 1. Ai} D = R - B if D < 0, Qi=0, R’=R else Qi=1, R’=D R’=R
DIGITAL
BUILDINGBLO
CKS
Chapter5<36>
DigitalDesignandComputerArchitecture,2ndEdi(on
Chapter7
DavidMoneyHarrisandSarahL.Harris
Chapter5<37>
DIGITAL
BUILDINGBLO
CKS Chapter7::Topics
• Introduc(on
• PerformanceAnalysis
• Single-CycleProcessor
• Mul(cycleProcessor
• PipelinedProcessor
• Excep(ons
• AdvancedMicroarchitecture
Chapter5<38>
DIGITAL
BUILDINGBLO
CKS
• Microarchitecture:howtoimplementanarchitectureinhardware
• Processor:– Datapath:funcDonalblocks– Control:controlsignals
IntroducDon
Physics
Devices
AnalogCircuits
DigitalCircuits
Logic
Micro-architecture
Architecture
OperatingSystems
ApplicationSoftware
electrons
transistorsdiodes
amplifiersfilters
AND gatesNOT gates
addersmemories
datapathscontrollers
instructionsregisters
device drivers
programs
Chapter5<39>
DIGITAL
BUILDINGBLO
CKS
• MulDpleimplementaDonsforasinglearchitecture:– Single-cycle:EachinstrucDonexecutesinasinglecycle
– Mul(cycle:EachinstrucDonisbrokenintoseriesofshortersteps
– Pipelined:EachinstrucDonbrokenupintoseriesofsteps&mulDpleinstrucDonsexecuteatonce
Microarchitecture
RESUME
Chapter5<40>
DIGITAL
BUILDINGBLO
CKS
• ProgramexecuDonDmeExecu(onTime=(#instruc(ons)(cycles/instruc(on)(seconds/
cycle)• DefiniDons:
– CPI:Cycles/instrucDon– clockperiod:seconds/cycle– IPC:instrucDons/cycle=IPC
• ChallengeistosaDsfyconstraintsof:– Cost– Power– Performance
ProcessorPerformance
Chapter5<41>
DIGITAL
BUILDINGBLO
CKS
• ConsidersubsetofMIPSinstrucDons:– R-typeinstrucDons:and,or,add,sub,slt– MemoryinstrucDons:lw,sw– BranchinstrucDons:beq
MIPSProcessor
Chapter5<42>
DIGITAL
BUILDINGBLO
CKS
• Determineseverythingaboutaprocessor:– PC– 32registers(andafewextraoneswe’llignorehere)
– Memory
ArchitecturalState
Chapter5<43>
DIGITAL
BUILDINGBLO
CKS MIPSStateElements
CLK
A RDInstructionMemory
A1
A3WD3
RD2RD1
WE3
A2
CLK
RegisterFile
A RDData
MemoryWD
WEPCPC'
CLK
32 3232 32
32
32
32 32
32
32
5
5
5
Chapter5<44>
DIGITAL
BUILDINGBLO
CKS
• Datapath• Control
Single-CycleMIPSProcessor
Chapter5<45>
DIGITAL
BUILDINGBLO
CKS
STEP1:FetchinstrucDon
Single-CycleDatapath:lwfetch
CLK
A RDInstructionMemory
A1
A3WD3
RD2
RD1WE3
A2
CLK
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr
CLK
Chapter5<46>
DIGITAL
BUILDINGBLO
CKS
STEP2:ReadsourceoperandsfromRF
Single-CycleDatapath:lwRegisterRead
Instr
CLK
A RDInstructionMemory
A1
A3WD3
RD2
RD1WE3
A2
CLK
RegisterFile
A RDData
MemoryWD
WEPCPC'
25:21
CLK
Chapter5<47>
DIGITAL
BUILDINGBLO
CKS
STEP3:Sign-extendtheimmediate
Single-CycleDatapath:lwImmediate
SignImm
CLK
A RDInstruction
Memory
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
CLK
Chapter5<48>
DIGITAL
BUILDINGBLO
CKS
STEP4:Computethememoryaddress
Single-CycleDatapath:lwaddress
SignImm
CLK
A RDInstruction
Memory
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
SrcB
ALUResult
SrcA Zero
CLK
ALUControl2:0
ALU
010
Chapter5<49>
DIGITAL
BUILDINGBLO
CKS
• STEP5:Readdatafrommemoryandwriteitbacktoregisterfile
Single-CycleDatapath:lwMemoryRead
A1
A3WD3
RD2
RD1WE3
A2
SignImm
CLK
A RDInstruction
Memory
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
SrcB20:16
ALUResult ReadData
SrcA
RegWrite
Zero
CLK
ALUControl2:0
ALU
0101
Chapter5<50>
DIGITAL
BUILDINGBLO
CKS
STEP6:DetermineaddressofnextinstrucDon
Single-CycleDatapath:lwPCIncrement
RESEUME
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
SrcB20:16
ALUResult ReadData
SrcA
PCPlus4
Result
RegWrite
Zero
CLK
ALUControl2:0
ALU
0101
Chapter5<51>
DIGITAL
BUILDINGBLO
CKS
Writedatainrttomemory
Single-CycleDatapath:sw
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
20:16
15:0
SrcB20:16
ALUResult ReadData
WriteData
SrcA
PCPlus4
Result
MemWriteRegWrite
Zero
CLK
ALUControl2:0
ALU
10100
Chapter5<52>
DIGITAL
BUILDINGBLO
CKS
• Readfromrsandrt• WriteALUResulttoregisterfile• Writetord(insteadofrt)
Single-CycleDatapath:R-Type
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PCPC' Instr 25:21
20:16
15:0
SrcB
20:16
15:11
ALUResult ReadData
WriteData
SrcA
PCPlus4WriteReg4:0
Result
RegDst MemWrite MemtoRegALUSrcRegWrite
Zero
CLK
ALUControl2:0
ALU
0varies1 001
Chapter5<53>
DIGITAL
BUILDINGBLO
CKS
• Determinewhethervaluesinrsandrtareequal• Calculatebranchtargetaddress:BTA=(sign-extendedimmediate<<2)+(PC+4)
Single-CycleDatapath:beq
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
RegDst Branch MemWrite MemtoRegALUSrcRegWrite
Zero
PCSrc
CLK
ALUControl2:0
ALU
01100 x0x 1
Chapter5<54>
DIGITAL
BUILDINGBLO
CKS Single-CycleProcessor
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
Chapter5<55>
DIGITAL
BUILDINGBLO
CKS Single-CycleControl
RegDst
BranchMemWriteMemtoReg
ALUSrcOpcode5:0
ControlUnit
ALUControl2:0Funct5:0
MainDecoder
ALUOp1:0
ALUDecoder
RegWrite
Chapter5<56>
DIGITAL
BUILDINGBLO
CKS
F2:0 Function
000 A & B
001 A | B
010 A + B
011 not used
100 A & ~B
101 A | ~B
110 A - B
111 SLT
Review:ALU
ALU
N N
N3
A B
Y
F
Chapter5<57>
DIGITAL
BUILDINGBLO
CKS Review:ALU
+
2 01
A B
Cout
Y
3
01
F2
F1:0
[N-1] S
NN
N
N
N NNN
N
2Zero
Extend
Chapter5<58>
DIGITAL
BUILDINGBLO
CKS
ALUOp1:0 Meaning
00 Add
01 Subtract 10 Look at Funct 11 Not Used
ALUOp1:0 Funct ALUControl2:0
00 X 010 (Add) X1 X 110 (Subtract) 1X 100000 (add) 010 (Add) 1X 100010 (sub) 110 (Subtract) 1X 100100 (and) 000 (And) 1X 100101 (or) 001 (Or) 1X 101010 (slt) 111 (SLT)
ControlUnit:ALUDecoder
Chapter5<59>
DIGITAL
BUILDINGBLO
CKS
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0
R-type 000000
lw 100011
sw 101011
beq 000100
ControlUnitMainDecoder
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
Chapter5<60>
DIGITAL
BUILDINGBLO
CKS
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0
R-type 000000 1 1 0 0 0 0 10
lw 100011 1 0 1 0 0 0 00
sw 101011 0 X 1 0 1 X 00
beq 000100 0 X 0 1 0 X 01
ControlUnit:MainDecoder
Chapter5<61>
DIGITAL
BUILDINGBLO
CKS Single-CycleDatapath:or
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
0010
01
0
0
1
0
Chapter5<62>
DIGITAL
BUILDINGBLO
CKS
No change to datapath
ExtendedFuncDonality:addi
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
Chapter5<63>
DIGITAL
BUILDINGBLO
CKS
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0
R-type 000000 1 1 0 0 0 0 10
lw 100011 1 0 1 0 0 1 00
sw 101011 0 X 1 0 1 X 00
beq 000100 0 X 0 1 0 X 01
addi 001000
ControlUnit:addi
Chapter5<64>
DIGITAL
BUILDINGBLO
CKS
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0
R-type 000000 1 1 0 0 0 0 10
lw 100011 1 0 1 0 0 1 00
sw 101011 0 X 1 0 1 X 00
beq 000100 0 X 0 1 0 X 01
addi 001000 1 0 1 0 0 0 00
ControlUnit:addi
Chapter5<65>
DIGITAL
BUILDINGBLO
CKS
ExtendedFuncDonality:j
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
01
25:0 <<2
27:0 31:28
PCJump
Jump
Chapter5<66>
DIGITAL
BUILDINGBLO
CKS
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0 Jump
R-type 000000 1 1 0 0 0 0 10 0
lw 100011 1 0 1 0 0 1 00 0
sw 101011 0 X 1 0 1 X 00 0
beq 000100 0 X 0 1 0 X 01 0
j 000100
ControlUnit:MainDecoder
Chapter5<67>
DIGITAL
BUILDINGBLO
CKS
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0 Jump
R-type 000000 1 1 0 0 0 0 10 0
lw 100011 1 0 1 0 0 1 00 0
sw 101011 0 X 1 0 1 X 00 0
beq 000100 0 X 0 1 0 X 01 0
j 000100 0 X X X 0 X XX 1
ControlUnit:MainDecoder
Chapter5<68>
DIGITAL
BUILDINGBLO
CKS
ProgramExecu(onTime=(#instruc(ons)(cycles/instruc(on)(seconds/cycle)=#instruc(onsxCPIxTC
Review:ProcessorPerformance
Chapter5<69>
DIGITAL
BUILDINGBLO
CKS
TC limited by critical path (lw)
Single-CyclePerformance
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU1
0100
1
0
1
0 0
Chapter5<70>
DIGITAL
BUILDINGBLO
CKS
• Single-cycle critical path: Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU +
tmem + tmux + tRFsetup
• Typically, limiting paths are: - memory, ALU, register file - Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
Single-CyclePerformance
Chapter5<71>
DIGITAL
BUILDINGBLO
CKS
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20
Tc = ?
Single-CyclePerformanceExample
Chapter5<72>
DIGITAL
BUILDINGBLO
CKS
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20
Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup = [30 + 2(250) + 150 + 25 + 200 + 20] ps = 925 ps
Single-CyclePerformanceExample
Chapter5<73>
DIGITAL
BUILDINGBLO
CKS
Program with 100 billion instructions:
Execution Time = # instructions x CPI x TC
= (100 × 109)(1)(925 × 10-12 s)
= 92.5 seconds
Single-CyclePerformanceExample
Chapter5<74>
DIGITAL
BUILDINGBLO
CKS
• Single-cycle:+simple
- cycleDmelimitedbylongestinstrucDon(lw)- 2adders/ALUs&2memories
• Mul(cycle:+higherclockspeed+simplerinstrucDonsrunfaster+reuseexpensivehardwareonmulDplecycles-sequencingoverheadpaidmanyDmes• Samedesignsteps:datapath&control
MulDcycleMIPSProcessor
Chapter5<75>
DIGITAL
BUILDINGBLO
CKS
• Replace Instruction and Data memories with a single unified memory – more realistic
MulDcycleStateElements
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
RegisterFile
PCPC'
WD
WE
CLK
EN
Chapter5<76>
DIGITAL
BUILDINGBLO
CKS
STEP 1: Fetch instruction
MulDcycleDatapath:InstrucDonFetch
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
RegisterFile
PCPC' Instr
CLK
WD
WE
CLK
EN
IRWrite
Chapter5<77>
DIGITAL
BUILDINGBLO
CKS
MulDcycleDatapath:lwRegisterRead
STEP2a:ReadsourceoperandsfromRF
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
RegisterFile
PCPC' Instr 25:21
CLK
WD
WE
CLK CLK
A
EN
IRWrite
Chapter5<78>
DIGITAL
BUILDINGBLO
CKS
MulDcycleDatapath:lwImmediate
STEP2b:Sign-extendtheimmediate
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
PCPC' Instr 25:21
15:0
CLK
WD
WE
CLK CLK
A
EN
IRWrite
Chapter5<79>
DIGITAL
BUILDINGBLO
CKS
MulDcycleDatapath:lwAddress
STEP3:Computethememoryaddress
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
PCPC' Instr 25:21
15:0
SrcB
ALUResult
SrcA
ALUOut
CLK
ALUControl2:0
ALU
WD
WE
CLK CLK
A CLK
EN
IRWrite
Chapter5<80>
DIGITAL
BUILDINGBLO
CKS
MulDcycleDatapath:lwMemoryRead
STEP4:Readdatafrommemory
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
PCPC' Instr 25:21
15:0
SrcB
ALUResult
SrcA
ALUOut
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
Data
CLK
CLK
A CLK
EN
IRWriteIorD
01
Chapter5<81>
DIGITAL
BUILDINGBLO
CKS
MulDcycleDatapath:lwWriteRegister
STEP5:Writedatabacktoregisterfile
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
PCPC' Instr 25:21
15:0
SrcB20:16
ALUResult
SrcA
ALUOut
RegWrite
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
Data
CLK
CLK
A CLK
EN
IRWriteIorD
01
Chapter5<82>
DIGITAL
BUILDINGBLO
CKS
MulDcycleDatapath:IncrementPC
STEP6:IncrementPC
PCWrite
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01PCPC' Instr 25:21
15:0
SrcB20:16
ALUResult
SrcA
ALUOut
ALUSrcARegWrite
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
Data
CLK
CLK
A
00011011
4
CLK
ENEN
ALUSrcB1:0IRWriteIorD
01
Chapter5<83>
DIGITAL
BUILDINGBLO
CKS
Write data in rt to memory
MulDcycleDatapath:sw
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01PC 0
1
PC' Instr 25:21
20:16
15:0
SrcB20:16
ALUResult
SrcA
ALUOut
MemWrite ALUSrcARegWrite
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
Data
CLK
CLK
A
00011011
4
CLK
ENEN
ALUSrcB1:0IRWriteIorDPCWrite
B
Chapter5<84>
DIGITAL
BUILDINGBLO
CKS
• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt)
MulDcycleDatapath:R-Type
01
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01PC 0
1
PC' Instr 25:21
20:16
15:0
SrcB20:16
15:11
ALUResult
SrcA
ALUOut
RegDstMemWrite MemtoReg ALUSrcARegWrite
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWriteIorDPCWrite
Chapter5<85>
DIGITAL
BUILDINGBLO
CKS
• rs == rt?
• BTA = (sign-extended immediate << 2) + (PC+4)
MulDcycleDatapath:beq
SignImm
b
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
RegDst BranchMemWrite MemtoReg ALUSrcARegWrite
Zero
PCSrc
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWriteIorD PCWrite
PCEn
Chapter5<86>
DIGITAL
BUILDINGBLO
CKS
MulDcycleProcessor
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
5:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
31:26
RegD
st
Branch
MemWrite
MemtoR
eg
ALUSrcARegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
Chapter5<87>
DIGITAL
BUILDINGBLO
CKS
MulDcycleControl
ALUSrcA
PCSrc
Branch
ALUSrcB1:0Opcode5:0
ControlUnit
ALUControl2:0Funct5:0
MainController(FSM)
ALUOp1:0
ALUDecoder
RegWrite
PCWrite
IorD
MemWriteIRWrite
RegDstMemtoReg
RegisterEnables
MultiplexerSelects
Chapter5<88>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:Fetch
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
5:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
31:26
RegD
st
Branch
MemWrite
MemtoR
eg
ALUSrcARegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
0
1 1
0
X
X
00
01
0100
10
Reset
S0: Fetch
Chapter5<89>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:Fetch
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
5:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
31:26
RegD
st
Branch
MemWrite
MemtoR
eg
ALUSrcARegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
0
1 1
0
X
X
00
01
0100
10
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
Reset
S0: Fetch
Chapter5<90>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:Decode
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
Reset
S0: Fetch S1: Decode
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
5:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
31:26
RegD
st
Branch
MemWrite
MemtoR
eg
ALUSrcARegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
X
0 0
0
X
X
0X
XX
XXXX
00
Chapter5<91>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:Address
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
Reset
S0: Fetch
S2: MemAdr
S1: Decode
Op = LWor
Op = SW
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
5:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
31:26
RegD
st
Branch
MemWrite
MemtoR
eg
ALUSrcARegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
X
0 0
0
X
X
01
10
010X
00
Chapter5<92>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:Address
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
Reset
S0: Fetch
S2: MemAdr
S1: Decode
Op = LWor
Op = SW
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
5:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
31:26
RegD
st
Branch
MemWrite
MemtoR
eg
ALUSrcARegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
X
0 0
0
X
X
01
10
010X
00
Chapter5<93>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:lw
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemRead
Op = LWor
Op = SW
Op = LW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
Chapter5<94>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:sw
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1 IorD = 1MemWrite
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
Op = LWor
Op = SW
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
Chapter5<95>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:R-Type
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1RegDst = 1
MemtoReg = 0RegWrite
IorD = 1MemWrite
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
S6: Execute
S7: ALUWriteback
Op = LWor
Op = SWOp = R-type
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
Chapter5<96>
DIGITAL
BUILDINGBLO
CKS MainControllerFSM:beq
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1RegDst = 1
MemtoReg = 0RegWrite
IorD = 1MemWrite
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1
Branch
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
S6: Execute
S7: ALUWriteback
S8: Branch
Op = LWor
Op = SWOp = R-type
Op = BEQ
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
Chapter5<97>
DIGITAL
BUILDINGBLO
CKS
MulDcycleControllerFSM
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1RegDst = 1
MemtoReg = 0RegWrite
IorD = 1MemWrite
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1
Branch
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
S6: Execute
S7: ALUWriteback
S8: Branch
Op = LWor
Op = SWOp = R-type
Op = BEQ
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
Chapter5<98>
DIGITAL
BUILDINGBLO
CKS
ExtendedFuncDonality:addi
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1RegDst = 1
MemtoReg = 0RegWrite
IorD = 1MemWrite
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1
Branch
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
S6: Execute
S7: ALUWriteback
S8: Branch
Op = LWor
Op = SWOp = R-type
Op = BEQ
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
Op = ADDI
S9: ADDIExecute
S10: ADDIWriteback
Chapter5<99>
DIGITAL
BUILDINGBLO
CKS
MainControllerFSM:addi
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 0
IRWritePCWrite
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1RegDst = 1
MemtoReg = 0RegWrite
IorD = 1MemWrite
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1
Branch
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
S6: Execute
S7: ALUWriteback
S8: Branch
Op = LWor
Op = SWOp = R-type
Op = BEQ
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
RegDst = 0MemtoReg = 0
RegWrite
Op = ADDI
S9: ADDIExecute
S10: ADDIWriteback
Chapter5<100>
DIGITAL
BUILDINGBLO
CKS
ExtendedFuncDonality:j
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01PC 0
1
PC' Instr 25:21
20:16
15:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
RegDst BranchMemWrite MemtoReg ALUSrcARegWrite
Zero
PCSrc1:0
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWriteIorD PCWrite
PCEn
000110
<<2
25:0 (jump)
31:28
27:0
PCJump
Chapter5<101>
DIGITAL
BUILDINGBLO
CKS
MainControllerFSM:j
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 00
IRWritePCWrite
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1RegDst = 1
MemtoReg = 0RegWrite
IorD = 1MemWrite
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 01
Branch
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
S6: Execute
S7: ALUWriteback
S8: Branch
Op = LWor
Op = SWOp = R-type
Op = BEQ
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
RegDst = 0MemtoReg = 0
RegWrite
Op = ADDI
S9: ADDIExecute
S10: ADDIWriteback
Op = J
S11: Jump
Chapter5<102>
DIGITAL
BUILDINGBLO
CKS
MainControllerFSM:j
IorD = 0AluSrcA = 0
ALUSrcB = 01ALUOp = 00PCSrc = 00
IRWritePCWrite
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
IorD = 1RegDst = 1
MemtoReg = 0RegWrite
IorD = 1MemWrite
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 01
Branch
Reset
S0: Fetch
S2: MemAdr
S1: Decode
S3: MemReadS5: MemWrite
S6: Execute
S7: ALUWriteback
S8: Branch
Op = LWor
Op = SWOp = R-type
Op = BEQ
Op = LWOp = SW
RegDst = 0MemtoReg = 1
RegWrite
S4: MemWriteback
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
RegDst = 0MemtoReg = 0
RegWrite
Op = ADDI
S9: ADDIExecute
S10: ADDIWriteback
PCSrc = 10PCWrite
Op = J
S11: Jump
Chapter5<103>
DIGITAL
BUILDINGBLO
CKS
• Instructions take different number of cycles: - 3 cycles: beq, j - 4 cycles: R-Type, sw, addi - 5 cycles: lw
• CPI is weighted average
• SPECINT2000 benchmark: - 25% loads - 10% stores - 11% branches - 2% jumps - 52% R-type
Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
MulDcycleProcessorPerformance
Chapter5<104>
DIGITAL
BUILDINGBLO
CKS
Multicycle critical path:
Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup
MulDcycleProcessorPerformance
SignImm
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01 0
1
PC 01
PC' Instr 25:21
20:16
15:0
5:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
31:26
RegD
st
Branch
MemWrite
MemtoR
eg
ALUSrcARegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
CLK
ALUControl2:0
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
Chapter5<105>
DIGITAL
BUILDINGBLO
CKS
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20
Tc = ?
MulDcyclePerformanceExample
Chapter5<106>
DIGITAL
BUILDINGBLO
CKS
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20
Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup = tpcq_PC + tmux + tmem + tsetup = [30 + 25 + 250 + 20] ps = 325 ps
MulDcyclePerformanceExample
Chapter5<107>
DIGITAL
BUILDINGBLO
CKS
Program with 100 billion instructions
Execution Time = (# instructions) × CPI × Tc
= (100 × 109)(4.12)(325 × 10-12) = 133.9 seconds
This is slower than the single-cycle processor (92.5 seconds). Why? - Not all steps same length - Sequencing overhead for each step (tpcq + tsetup= 50 ps)
MulDcyclePerformanceExample
Chapter5<108>
DIGITAL
BUILDINGBLO
CKS
Review:Single-CycleProcessor
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01 PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
01
25:0 <<2
27:0 31:28
PCJump
Jump
Chapter5<109>
DIGITAL
BUILDINGBLO
CKS
Review:MulDcycleProcessor
ImmExt
CLK
ARD
Instr / DataMemory
A1
A3
WD3
RD2RD1
WE3
A2
CLK
Sign Extend
RegisterFile
01
01PC 0
1
PC' Instr 25:21
20:16
15:0
SrcB20:16
15:11
<<2
ALUResult
SrcA
ALUOut
ZeroCLK
ALU
WD
WE
CLK
Adr
01
Data
CLK
CLK
AB 00
011011
4
CLK
ENEN000110
<<2
25:0 (Addr)
31:28
27:0
PCJump
5:0
31:26
Branch
MemWrite
ALUSrcARegWrite
OpFunct
ControlUnit
PCSrc
CLK
ALUControl2:0
ALUSrcB1:0IRWrite
IorD
PCWritePCEn
RegD
st
Mem
toReg
Chapter5<110>
DIGITAL
BUILDINGBLO
CKS
• Temporal parallelism
• Divide single-cycle processor into 5 stages: - Fetch - Decode - Execute - Memory - Writeback
• Add pipeline registers between stages
PipelinedMIPSProcessor
Chapter5<111>
DIGITAL
BUILDINGBLO
CKS
Single-Cyclevs.Pipelined
RESUME
Time (ps)Instr
FetchInstruction
DecodeRead Reg
ExecuteALU
MemoryRead / Write
WriteReg1
2
0 100 200 300 400 500 600 700 800 900 1100 1200 1300 1400 1500 1600 1700 1800 19001000
Instr
1
2
3
FetchInstruction
DecodeRead Reg
ExecuteALU
MemoryRead / Write
WriteReg
FetchInstruction
DecodeRead Reg
ExecuteALU
MemoryRead/Write
WriteReg
FetchInstruction
DecodeRead Reg
ExecuteALU
MemoryRead/Write
WriteReg
FetchInstruction
DecodeRead Reg
ExecuteALU
MemoryRead/Write
WriteReg
Single-Cycle
Pipelined
Chapter5<112>
DIGITAL
BUILDINGBLO
CKS
PipelinedProcessorAbstracDon
Time (cycles)
lw $s2, 40($0) RF 40
$0RF
$s2+ DM
RF $t2
$t1RF
$s3+ DM
RF $s5
$s1RF
$s4- DM
RF $t6
$t5RF
$s5& DM
RF 20
$s1RF
$s6+ DM
RF $t4
$t3RF
$s7| DM
add $s3, $t1, $t2
sub $s4, $s1, $s5
and $s5, $t5, $t6
sw $s6, 20($s1)
or $s7, $t3, $t4
1 2 3 4 5 6 7 8 9 10
add
IM
IM
IM
IM
IM
IM lw
sub
and
sw
or
Chapter5<113>
DIGITAL
BUILDINGBLO
CKS
Single-Cycle&PipelinedDatapath
SignImmE
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PCF01
PC' InstrD 25:21
20:16
15:0
SrcBE
20:16
15:11
RtE
RdE
<<2+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchM
ResultW
PCPlus4EPCPlus4F
ZeroM
CLK CLK
ALU
WriteRegE4:0
CLKCLK
CLK
SignImm
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
Zero
CLK
ALU
Fetch Decode Execute Memory Writeback
Chapter5<114>
DIGITAL
BUILDINGBLO
CKS
WriteReg must arrive at same time as Result
CorrectedPipelinedDatapath
SignImmE
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PCF01
PC' InstrD 25:21
20:16
15:0
SrcBE
20:16
15:11
RtE
RdE
<<2
+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchM
WriteRegM4:0
ResultW
PCPlus4EPCPlus4F
ZeroM
CLK CLK
WriteRegW4:0
ALU
WriteRegE4:0
CLKCLK
CLK
Fetch Decode Execute Memory Writeback
Chapter5<115>
DIGITAL
BUILDINGBLO
CKS
• Samecontrolunitassingle-cycleprocessor
• Controldelayedtoproperpipelinestage
PipelinedProcessorControl
SignImmE
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PCF01
PC' InstrD 25:21
20:16
15:0
5:0
SrcBE
20:16
15:11
RtE
RdE
<<2
+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchM
WriteRegM4:0
ResultW
PCPlus4EPCPlus4F
31:26
RegDstD
BranchD
MemWriteD
MemtoRegD
ALUControlD
ALUSrcD
RegWriteD
Op
Funct
ControlUnit
ZeroM
PCSrcM
CLK CLK CLK
CLK CLK
WriteRegW4:0
ALUControlE2:0
ALU
RegWriteE RegWriteM RegWriteW
MemtoRegE MemtoRegM MemtoRegW
MemWriteE MemWriteM
BranchE BranchM
RegDstE
ALUSrcE
WriteRegE4:0
Chapter5<116>
DIGITAL
BUILDINGBLO
CKS
• WhenaninstrucDondependsonresultfrominstrucDonthathasn’tcompleted
• Types:– Datahazard:registervaluenotyetwrikenbacktoregisterfile
– Controlhazard:nextinstrucDonnotdecidedyet(causedbybranches)
PipelineHazards
Chapter5<117>
DIGITAL
BUILDINGBLO
CKS
DataHazard
Time (cycles)
add $s0, $s2, $s3 RF $s3
$s2RF
$s0+ DM
RF $s1
$s0RF
$t0& DM
RF $s0
$s4RF
$t1| DM
RF $s5
$s0RF
$t2- DM
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
1 2 3 4 5 6 7 8
and
IM
IM
IM
IM add
or
sub
Chapter5<118>
DIGITAL
BUILDINGBLO
CKS
• InsertnopsincodeatcompileDme• RearrangecodeatcompileDme• ForwarddataatrunDme• StalltheprocessoratrunDme
HandlingDataHazards
Chapter5<119>
DIGITAL
BUILDINGBLO
CKS
• Insert enough nops for result to be ready
• Or move independent useful instructions forward
Compile-TimeHazardEliminaDon
Time (cycles)
add $s0, $s2, $s3 RF $s3
$s2RF
$s0+ DM
RF $s1
$s0RF
$t0& DM
RF $s0
$s4RF
$t1| DM
RF $s5
$s0RF
$t2- DM
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
1 2 3 4 5 6 7 8
and
IM
IM
IM
IM add
or
sub
nop
nop
RF RFDMnopIM
RF RFDMnopIM
9 10
Chapter5<120>
DIGITAL
BUILDINGBLO
CKS
DataForwarding
Time (cycles)
add $s0, $s2, $s3 RF $s3
$s2RF
$s0+ DM
RF $s1
$s0RF
$t0& DM
RF $s0
$s4RF
$t1| DM
RF $s5
$s0RF
$t2- DM
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
1 2 3 4 5 6 7 8
and
IM
IM
IM
IM add
or
sub
Chapter5<121>
DIGITAL
BUILDINGBLO
CKS
DataForwarding
SignImmE
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
SignExtend
RegisterFile
01
01
A RDData
MemoryWD
WE
10
PCF01
PC' InstrD 25:21
20:16
15:0
5:0
SrcBE
25:21
15:11
RsE
RdE
<<2
+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchM
WriteRegM4:0
ResultW
PCPlus4F
31:26
RegDstD
BranchD
MemWriteD
MemtoRegD
ALUControlD2:0
ALUSrcD
RegWriteD
Op
Funct
ControlUnit
PCSrcM
CLK CLK CLK
CLK CLK
WriteRegW4:0
ALUControlE2:0
ALU
RegWriteE RegWriteM RegWriteW
MemtoRegE MemtoRegM MemtoRegW
MemWriteE MemWriteM
RegDstE
ALUSrcE
WriteRegE4:0
000110
000110
SignImmD
ForwardAE
ForwardBE
20:16 RtE
RsD
RdD
RtD
RegWriteM
RegWriteW
Hazard Unit
PCPlus4E
BranchE BranchM
ZeroM
Chapter5<122>
DIGITAL
BUILDINGBLO
CKS
• ForwardtoExecutestagefromeither:– Memorystageor– Writebackstage
• ForwardinglogicforForwardAE:if ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then ForwardAE = 10 else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then ForwardAE = 01 else ForwardAE = 00ForwardinglogicforForwardBEsame,butreplacersEwithrtE
DataForwarding
Chapter5<123>
DIGITAL
BUILDINGBLO
CKS
Stalling
Time (cycles)
lw $s0, 40($0) RF 40
$0RF
$s0+ DM
RF $s1
$s0RF
$t0& DM
RF $s0
$s4RF
$t1| DM
RF $s5
$s0RF
$t2- DM
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
1 2 3 4 5 6 7 8
and
IM
IM
IM
IM lw
or
sub
Trouble!
Chapter5<124>
DIGITAL
BUILDINGBLO
CKS
Stalling
Time (cycles)
lw $s0, 40($0) RF 40
$0RF
$s0+ DM
RF $s1
$s0RF
$t0& DM
RF $s0
$s4RF
$t1| DM
RF $s5
$s0RF
$t2- DM
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
1 2 3 4 5 6 7 8
and
IM
IM
IM
IM lw
or
sub
9
RF $s1
$s0
IM or
Stall
Chapter5<125>
DIGITAL
BUILDINGBLO
CKS
StallingHardware
SignImmE
CLK
A RDInstruction
Memory+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
SignExtend
RegisterFile
01
01
A RDData
MemoryWD
WE
10
PCF01
PC' InstrD 25:21
20:16
15:0
5:0
SrcBE
25:21
15:11
RsE
RdE
<<2
+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchM
WriteRegM4:0
ResultW
PCPlus4F
31:26
RegDstD
BranchD
MemWriteD
MemtoRegD
ALUControlD2:0
ALUSrcD
RegWriteD
Op
Funct
ControlUnit
PCSrcM
CLK CLK CLK
CLK CLK
WriteRegW4:0
ALUControlE2:0
ALU
RegWriteE RegWriteM RegWriteW
MemtoRegE MemtoRegM MemtoRegW
MemWriteE MemWriteM
RegDstE
ALUSrcE
WriteRegE4:0
000110
000110
SignImmD
StallF
StallD
ForwardAE
ForwardBE
20:16 RtE
RsD
RdD
RtD
RegWriteM
RegWriteW
MemtoRegE
Hazard Unit
FlushE
PCPlus4E
BranchE BranchM
ZeroM
EN
EN CLR
Chapter5<126>
DIGITAL
BUILDINGBLO
CKS
lwstall = ((rsD==rtE) OR (rtD==rtE)) AND MemtoRegE StallF = StallD = FlushE = lwstall
StallingLogic
Chapter5<127>
DIGITAL
BUILDINGBLO
CKS
• beq:– branchnotdeterminedunDl4thstageofpipeline– InstrucDonsaaerbranchfetchedbeforebranchoccurs
– TheseinstrucDonsmustbeflushedifbranchhappens• Branchmispredic(onpenalty
– numberofinstrucDonflushedwhenbranchistaken– Maybereducedbydeterminingbranchearlier
ControlHazards
Chapter5<128>
DIGITAL
BUILDINGBLO
CKS
ControlHazards:OriginalPipeline
SignImmE
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
SignExtend
RegisterFile
01
01
A RDData
MemoryWD
WE
10
PCF01
PC' InstrD 25:21
20:16
15:0
5:0
SrcBE
25:21
15:11
RsE
RdE
<<2
+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchM
WriteRegM4:0
ResultW
PCPlus4F
31:26
RegDstD
BranchD
MemWriteD
MemtoRegD
ALUControlD2:0
ALUSrcD
RegWriteD
Op
Funct
ControlUnit
PCSrcM
CLK CLK CLK
CLK CLK
WriteRegW4:0
ALUControlE2:0
ALU
RegWriteE RegWriteM RegWriteW
MemtoRegE MemtoRegM MemtoRegW
MemWriteE MemWriteM
RegDstE
ALUSrcE
WriteRegE4:0
000110
000110
SignImmD
StallF
StallD
ForwardAE
ForwardBE
20:16 RtE
RsD
RdD
RtD
RegWriteM
RegWriteW
MemtoRegE
Hazard Unit
FlushE
PCPlus4E
BranchE BranchM
ZeroM
EN
EN
CLR
Chapter5<129>
DIGITAL
BUILDINGBLO
CKS
ControlHazards
Time (cycles)
beq $t1, $t2, 40 RF $t2
$t1RF- DM
RF $s1
$s0RF& DM
RF $s0
$s4RF| DM
RF $s5
$s0RF- DM
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
1 2 3 4 5 6 7 8
and
IM
IM
IM
IM lw
or
sub
20
24
28
2C
30
...
...
9
Flushthese
instructions
64 slt $t3, $s2, $s3 RF $s3
$s2RF
$t3slt DMIM slt
Chapter5<130>
DIGITAL
BUILDINGBLO
CKS
IntroducedanotherdatahazardinDecodestage
EarlyBranchResoluDon
EqualD
SignImmE
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
SignExtend
RegisterFile
01
01
A RDData
MemoryWD
WE
10
PCF01
PC' InstrD 25:21
20:16
15:0
5:0
SrcBE
25:21
15:11
RsE
RdE
<<2
+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchD
WriteRegM4:0
ResultW
PCPlus4F
31:26
RegDstD
BranchD
MemWriteD
MemtoRegD
ALUControlD2:0
ALUSrcD
RegWriteD
Op
Funct
ControlUnit
PCSrcD
CLK CLK CLK
CLK CLK
WriteRegW4:0
ALUControlE2:0
ALU
RegWriteE RegWriteM RegWriteW
MemtoRegE MemtoRegM MemtoRegW
MemWriteE MemWriteM
RegDstE
ALUSrcE
WriteRegE4:0
000110
000110
=
SignImmD
StallF
StallD
ForwardAE
ForwardBE
20:16 RtE
RsD
RdE
RtD
RegWriteM
RegWriteW
MemtoRegE
Hazard Unit
FlushE
EN
EN
CLR
CLR
Chapter5<131>
DIGITAL
BUILDINGBLO
CKS
EarlyBranchResoluDon
Time (cycles)
beq $t1, $t2, 40 RF $t2
$t1RF- DM
RF $s1
$s0RF& DMand $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
1 2 3 4 5 6 7 8
andIM
IM lw20
24
28
2C
30
...
...
9
Flushthis
instruction
64 slt $t3, $s2, $s3 RF $s3
$s2RF
$t3slt DMIM slt
Chapter5<132>
DIGITAL
BUILDINGBLO
CKS
HandlingData&ControlHazards
EqualD
SignImmE
CLK
A RDInstruction
Memory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
SignExtend
RegisterFile
01
01
A RDData
MemoryWD
WE
10
PCF01
PC' InstrD 25:21
20:16
15:0
5:0
SrcBE
25:21
15:11
RsE
RdE
<<2
+
ALUOutM
ALUOutW
ReadDataW
WriteDataE WriteDataM
SrcAE
PCPlus4D
PCBranchD
WriteRegM4:0
ResultW
PCPlus4F
31:26
RegDstD
BranchD
MemWriteD
MemtoRegD
ALUControlD2:0
ALUSrcD
RegWriteD
Op
Funct
ControlUnit
PCSrcD
CLK CLK CLK
CLK CLK
WriteRegW4:0
ALUControlE2:0
ALU
RegWriteE RegWriteM RegWriteW
MemtoRegE MemtoRegM MemtoRegW
MemWriteE MemWriteM
RegDstE
ALUSrcE
WriteRegE4:0
000110
000110
0101
=
SignImmD
StallF
StallD
ForwardAE
ForwardBE
ForwardAD
ForwardBD
20:16 RtE
RsD
RdD
RtD
RegWriteE
RegWriteM
RegWriteW
MemtoRegE
BranchD
Hazard Unit
FlushE
EN
EN
CLR
CLR
Chapter5<133>
DIGITAL
BUILDINGBLO
CKS
• Forwardinglogic: ForwardAD = (rsD !=0) AND (rsD == WriteRegM) AND RegWriteM ForwardBD = (rtD !=0) AND (rtD == WriteRegM) AND RegWriteM
• Stallinglogic:
branchstall = BranchD AND RegWriteE AND (WriteRegE == rsD OR WriteRegE == rtD) OR BranchD AND MemtoRegM AND
(WriteRegM == rsD OR WriteRegM == rtD) StallF = StallD = FlushE = lwstall OR branchstall
ControlForwarding&StallingLogic
Chapter5<134>
DIGITAL
BUILDINGBLO
CKS
• Guesswhetherbranchwillbetaken– Backwardbranchesareusuallytaken(loops)– Considerhistorytoimproveguess
• GoodpredicDonreducesfracDonofbranchesrequiringaflush
BranchPredicDon
Chapter5<135>
DIGITAL
BUILDINGBLO
CKS
• SPECINT2000 benchmark: - 25% loads - 10% stores - 11% branches - 2% jumps - 52% R-type
• Suppose: - 40% of loads used by next instruction - 25% of branches mispredicted - All jumps flush next instruction
• What is the average CPI?
PipelinedPerformanceExample
Chapter5<136>
DIGITAL
BUILDINGBLO
CKS
• SPECINT2000 benchmark: - 25% loads - 10% stores - 11% branches - 2% jumps - 52% R-type
• Suppose: - 40% of loads used by next instruction - 25% of branches mispredicted - All jumps flush next instruction
• What is the average CPI? - Load/Branch CPI = 1 when no stalling, 2 when stalling - CPIlw = 1(0.6) + 2(0.4) = 1.4 - CPIbeq = 1(0.75) + 2(0.25) = 1.25 Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.15
PipelinedPerformanceExample
Chapter5<137>
DIGITAL
BUILDINGBLO
CKS
• Pipelined processor critical path:
Tc = max {
tpcq + tmem + tsetup
2(tRFread + tmux + teq + tAND + tmux + tsetup )
tpcq + tmux + tmux + tALU + tsetup
tpcq + tmemwrite + tsetup
2(tpcq + tmux + tRFwrite) }
PipelinedPerformance
Chapter5<138>
DIGITAL
BUILDINGBLO
CKS
Element Parameter Delay (ps) Register clock-to-Q tpcq_PC 30 Register setup tsetup 20 Multiplexer tmux 25 ALU tALU 200 Memory read tmem 250 Register file read tRFread 150 Register file setup tRFsetup 20 Equality comparator teq 40 AND gate tAND 15 Memory write Tmemwrite 220 Register file write tRFwrite 100 ps
Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup )
= 2[150 + 25 + 40 + 15 + 25 + 20] ps = 550 ps
PipelinedPerformanceExample
Chapter5<139>
DIGITAL
BUILDINGBLO
CKS
Program with 100 billion instructions
Execution Time = (# instructions) × CPI × Tc
= (100 × 109)(1.15)(550 × 10-12)
= 63 seconds
PipelinedPerformanceExample
Chapter5<140>
DIGITAL
BUILDINGBLO
CKS
Processor Execution Time
(seconds)
Speedup
(single-cycle as baseline)
Single-cycle 92.5 1
Multicycle 133 0.70
Pipelined 63 1.47
ProcessorPerformanceComparison