cs/coe0447 computer organization & assembly language
![Page 1: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/1.jpg)
1
CS/COE0447
Computer Organization & Assembly Language
Chapter 5 Part 3
![Page 2: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/2.jpg)
2
Single-cycle Implementation of MIPS
• Our first implementation of MIPS used a single long clock cycle for every instruction
• Every instruction began on one rising (or falling) clock edge and ended on the next rising (or falling) clock edge
• This approach is not practical as it is much slower than a multicycle implementation where different instruction classes can take different numbers of cycles
  – in a single-cycle implementation every instruction must take the same amount of time as the slowest instruction
  – in a multicycle implementation this problem is avoided by allowing quicker instructions to use fewer cycles
• Even though the single-cycle approach is not practical it was simpler and useful to understand first
• Now we are covering a multicycle implementation of MIPS
![Page 3: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/3.jpg)
3
A Multi-cycle Datapath
• A single memory unit for both instructions and data
• Single ALU rather than ALU & two adders
• Registers added after every major functional unit to hold the output until it is used in a subsequent clock cycle
![Page 4: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/4.jpg)
4
Multi-Cycle Control: What we need to cover
• Adding registers after every functional unit
  – Need to modify the “instruction execution” slides to reflect this
• Breaking instruction execution down into cycles
  – What can be done during the same cycle? What requires a cycle?
  – Need to modify the “instruction execution” slides again
  – Timing
• Control signal values
  – What they are per cycle, per instruction
  – Finite state machine which determines signals based on instruction type + which cycle it is
• Putting it all together
![Page 5: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/5.jpg)
5
Execution: single-cycle (reminder)
• add (example: add $t2,$t1,$t0)
  – Fetch instruction and add 4 to PC
  – Read two source registers ($t1 and $t0)
  – Add two values ($t1 + $t0)
  – Store result to the destination register ($t1 + $t0 → $t2)
![Page 6: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/6.jpg)
6
A Multi-cycle Datapath
• For add:
  – Instruction is stored in the instruction register (IR)
  – Values read from rs and rt are stored in A and B
  – Result of ALU is stored in ALUOut
![Page 7: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/7.jpg)
7
Multi-Cycle Execution: R-type
• Instruction fetch
  – IR <= Memory[PC]; (sub $t0,$t1,$t2)
  – PC <= PC + 4;
• Decode instruction/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]]; (rt)
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2); (later)
• Execution
  – ALUOut <= A op B; (op = add, sub, and, or, …)
• Completion
  – Reg[IR[15:11]] <= ALUOut; ($t0 <= ALU result)
![Page 8: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/8.jpg)
8
Execution: single-cycle (reminder)
• lw (load word) (example: lw $t0,-12($t1))
  – Fetch instruction and add 4 to PC
  – Read the base register ($t1)
  – Sign-extend the immediate offset (0xfff4 → 0xfffffff4)
  – Add two values to get the address (X = 0xfffffff4 + $t1)
  – Access data memory with the computed address (M[X])
  – Store the memory data to the destination register ($t0)
![Page 9: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/9.jpg)
9
A Multi-cycle Datapath
• For lw (example: lw $t0, -12($t1)):
  – Instruction is stored in the IR
  – Contents of rs ($t1) stored in A
  – Output of ALU (address of memory location to be read) stored in ALUOut
  – Value read from memory is stored in the memory data register (MDR)
![Page 10: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/10.jpg)
10
Multi-cycle Execution: lw
• Instruction fetch
  – IR <= Memory[PC]; (lw $t0,-12($t1))
  – PC <= PC + 4;
• Instruction decode/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]];
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – ALUOut <= A + sign-extend(IR[15:0]); ($t1 + -12, sign extended)
• Memory access
  – MDR <= Memory[ALUOut]; (M[$t1 + -12])
• Write-back
  – Load: Reg[IR[20:16]] <= MDR; ($t0 <= M[$t1 + -12])
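As a rough illustration of the last three lw cycles, the sketch below sign-extends the 16-bit offset, forms the address, and copies the loaded word into rt. The function names, the base address in $t1, and the memory contents are assumptions made up for the example.

```python
# Sketch of lw cycles 3-5 (names and data values are hypothetical).
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def lw_cycles_3_to_5(reg, memory, rs, rt, imm16):
    alu_out = (reg[rs] + sign_extend16(imm16)) & MASK32  # cycle 3: ALUOut <= A + sign-extend(imm)
    mdr = memory[alu_out]                                # cycle 4: MDR <= Memory[ALUOut]
    reg[rt] = mdr                                        # cycle 5: Reg[rt] <= MDR
    return alu_out

reg = [0] * 32
reg[9] = 0x10010020                    # $t1, an assumed base address
memory = {0x10010014: 0x12345678}      # word-addressed dict standing in for memory
addr = lw_cycles_3_to_5(reg, memory, rs=9, rt=8, imm16=0xFFF4)  # lw $t0,-12($t1)
print(hex(sign_extend16(0xFFF4)), hex(addr), hex(reg[8]))
# prints 0xfffffff4 0x10010014 0x12345678
```

Masking to 32 bits after the addition is what makes adding 0xfffffff4 behave like subtracting 12.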
![Page 11: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/11.jpg)
11
Execution: single-cycle (reminder)
• sw (example: sw $t0,-4($t1))
  – Fetch instruction and add 4 to PC
  – Read the base register ($t1)
  – Read the source register ($t0)
  – Sign-extend the immediate offset (0xfffc → 0xfffffffc)
  – Add two values to get the address (X = 0xfffffffc + $t1)
  – Store the contents of the source register to the computed address ($t0 → Memory[X])
![Page 12: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/12.jpg)
12
A Multi-cycle Datapath
• For sw (example: sw $t0, -12($t1)):
  – Instruction is stored in the IR
  – Contents of rs ($t1) stored in A
  – Output of ALU (address of memory location to be written) stored in ALUOut
![Page 13: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/13.jpg)
13
Multi-cycle Execution: sw
• Instruction fetch
  – IR <= Memory[PC]; (sw $t0,-12($t1))
  – PC <= PC + 4;
• Decode/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]]; (rt)
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – ALUOut <= A + sign-extend(IR[15:0]); ($t1 + -12, sign extended)
• Memory access
  – Memory[ALUOut] <= B; (M[$t1 + -12] <= $t0)
![Page 14: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/14.jpg)
14
Execution: single-cycle (reminder)
• beq (example: beq $t0,$t1,L)
  – Fetch instruction and add 4 to PC
    • Assume that L is +3 instructions away
  – Read two source registers ($t0, $t1)
  – Sign-extend the immediate, and shift it left by 2
    • 0x0003 → 0x0000000c
  – Perform the test, and update the PC if it is true
    • If $t0 == $t1, then PC = PC + 0x0000000c
    • [We will follow what MARS does, so this is not: immediate == 0x0002; PC = PC + 4 + 0x00000008]
![Page 15: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/15.jpg)
15
A Multi-cycle Datapath
• For beq (example: beq $t0,$t1,label):
  – Instruction stored in IR
  – Registers rs and rt are stored in A and B
  – Result of ALU (rs – rt) is stored in ALUOut
![Page 16: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/16.jpg)
16
Multi-cycle execution: beq
• Instruction fetch
  – IR <= Memory[PC]; (beq $t0,$t1,label)
  – PC <= PC + 4;
• Decode/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]]; (rt)
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
    • PC + the number of bytes away the label is (negative for backward branches, positive for forward branches)
• Execution
  – if (A == B) then PC <= ALUOut;
    • if $t0 == $t1, perform the branch
    • Note: the ALU is used to evaluate A == B; we’ll see that this does not clash with the use of the ALU above.
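The three beq cycles can be traced with a small sketch. It follows the register-transfer statements on this slide literally, so the PC used in the decode cycle is the value already incremented during fetch; the function names and the example addresses are assumptions.

```python
# Sketch of the three beq cycles (names and addresses are hypothetical).
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def beq_next_pc(pc, imm16, a, b):
    """Return the PC after a beq located at address `pc`."""
    pc = (pc + 4) & MASK32                               # cycle 1: PC <= PC + 4
    alu_out = (pc + (sign_extend16(imm16) << 2)) & MASK32  # cycle 2: branch target
    if a == b:                                           # cycle 3: ALU compares A and B
        pc = alu_out                                     #          PC <= ALUOut
    return pc

print(hex(beq_next_pc(0x00400000, 0x0003, 7, 7)))  # taken, forward:  0x400010
print(hex(beq_next_pc(0x00400000, 0x0003, 7, 8)))  # not taken:       0x400004
print(hex(beq_next_pc(0x00400000, 0xFFFD, 7, 7)))  # taken, backward: 0x3ffff8
```

Computing the target in cycle 2 costs nothing extra because the ALU is otherwise idle in that cycle; the result is simply ignored when the instruction turns out not to be a branch.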
![Page 17: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/17.jpg)
17
Execution: single-cycle (reminder)
• j
  – Fetch instruction and add 4 to PC
  – Take the 26-bit immediate field
  – Shift left by 2 (to make 28-bit immediate)
  – Get 4 bits from the current PC and attach to the left of the immediate
  – Assign the value to PC
![Page 18: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/18.jpg)
18
A Multi-cycle Datapath
• For j:
  – No accesses to registers or memory; no need for ALU
![Page 19: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/19.jpg)
19
Multi-cycle execution: j
• Instruction fetch
  – IR <= Memory[PC]; (j label)
  – PC <= PC + 4;
• Decode/register read
  – A <= Reg[IR[25:21]];
  – B <= Reg[IR[20:16]];
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – PC <= {PC[31:28],IR[25:0],"00"};
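The concatenation in the execution step can be written out with masks and shifts. This is a minimal sketch; `jump_target` is a hypothetical helper name and the addresses are made up for the example.

```python
# Sketch of PC <= {PC[31:28], IR[25:0], "00"} (helper name is hypothetical).
def jump_target(pc, ir):
    """Combine the top 4 bits of the (already incremented) PC with IR[25:0] << 2."""
    upper4 = pc & 0xF0000000            # PC[31:28], kept in place
    low28 = (ir & 0x03FFFFFF) << 2      # 26-bit field shifted left by 2 gives 28 bits
    return upper4 | low28

J_OPCODE = 0x02
target_word_index = 0x00400020 >> 2     # the assembler stores the address divided by 4
ir = (J_OPCODE << 26) | target_word_index
pc_after_fetch = 0x00400004             # PC was incremented in cycle 1
print(hex(ir), hex(jump_target(pc_after_fetch, ir)))
# prints 0x8100008 0x400020
```

No ALU operation is needed here: the shift and the concatenation are just wiring in hardware.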
![Page 21: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/21.jpg)
21
Multicycle Approach
• Break up the instructions into steps
  – each step takes one clock cycle
  – balance the amount of work to be done in each step/cycle so that they are about equal
  – restrict each cycle to use each major functional unit at most once so that such units do not have to be replicated
  – functional units can be shared between different cycles within one instruction
• Between steps/cycles
  – At the end of one cycle, store data to be used in later cycles of the same instruction
    • need to introduce additional internal (programmer-invisible) registers for this purpose
  – Data to be used in later instructions are stored in programmer-visible state elements: the register file, PC, memory
![Page 22: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/22.jpg)
Operations
• These take time:
  – Memory (read/write); register file (read/write); ALU operations
• The other connections and logical elements have no latency (for our purposes)
![Page 23: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/23.jpg)
Operations
• Before: we had separate memories for instructions and data, and we had extra adders for incrementing the PC and calculating the branch address. Now we have just one memory and just one ALU.
![Page 24: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/24.jpg)
24
Five Execution Steps
• Each takes one cycle
• In one cycle, there can be at most one memory access, at most one register access, and at most one ALU operation
• But you can have a memory access, an ALU op, and/or a register access in the same cycle, as long as there is no contention for resources
• Changes to registers are made at the end of the clock cycle
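The one-use-per-unit rule can be checked mechanically. The sketch below lists which major units each step of this deck uses (memory and ALU in fetch, register file and ALU in decode, and so on) and reports any unit used twice in a cycle. The table keys and the `conflicts` helper are illustrative names, not part of the slides.

```python
# Sketch: verify that no step uses a major functional unit more than once.
from collections import Counter

# Units used per step, taken from the step descriptions in this deck.
USAGE = {
    "1: IF": ["memory", "ALU"],                    # fetch instruction, PC + 4
    "2: ID": ["register file", "ALU"],             # read rs/rt, branch target
    "3: EX": ["ALU"],                              # address, A op B, or A == B
    "4: MEM (lw/sw)": ["memory"],                  # data read or write
    "4: R-type completion": ["register file"],     # write rd
    "5: WB (lw)": ["register file"],               # write rt
}

def conflicts(units):
    """Return the units that appear more than once in a single cycle."""
    return sorted(u for u, n in Counter(units).items() if n > 1)

for step, units in USAGE.items():
    print(step, "->", conflicts(units) or "no contention")

# A whole lw squeezed into one cycle would need two memory accesses,
# two ALU operations, and two register file accesses:
print(conflicts(["memory", "ALU", "register file", "ALU", "memory", "register file"]))
```

The last line shows why the single-cycle design needed a second memory and extra adders, while the multicycle design can share one of each across cycles.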
![Page 25: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/25.jpg)
25
Step 1: Instruction Fetch
• Access memory with the PC to fetch the instruction and store it in the Instruction Register (IR)
• Increment PC by 4
  – We can do this because the ALU is not being used for something else this cycle
![Page 26: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/26.jpg)
26
Step 2: Decode and Reg. Read
• Read registers rs and rt
  – We read both of them regardless of necessity
• Compute the branch address in case the instruction is a branch
  – We can do this because the ALU is not busy
  – ALUOut will keep the target address
![Page 27: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/27.jpg)
27
Step 3: Various Actions
• The ALU performs one of three functions based on instruction type (a jump does not use the ALU)
• Memory reference:
  – ALUOut <= A + sign-extend(IR[15:0]);
• R-type:
  – ALUOut <= A op B;
• Branch:
  – if (A==B) PC <= ALUOut;
• Jump:
  – PC <= {PC[31:28],IR[25:0],2'b00};
![Page 28: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/28.jpg)
28
Step 4: Memory Access…
• If the instruction is a memory reference
  – MDR <= Memory[ALUOut]; // if it is a load
  – Memory[ALUOut] <= B; // if it is a store
    • Store is complete!
• If the instruction is R-type
  – Reg[IR[15:11]] <= ALUOut;
    • Now the instruction is complete!
![Page 29: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/29.jpg)
29
Step 5: Register Write Back
• Only the lw instruction reaches this step
  – Reg[IR[20:16]] <= MDR;
![Page 30: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/30.jpg)
30
Summary of Instruction Execution
| Step | Step name | Action for R-type instructions | Action for memory-reference instructions | Action for branches | Action for jumps |
|---|---|---|---|---|---|
| 1: IF | Instruction fetch | IR = Memory[PC]; PC = PC + 4 | (same) | (same) | (same) |
| 2: ID | Instruction decode/register fetch | A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2) | (same) | (same) | (same) |
| 3: EX | Execution, address computation, branch/jump completion | ALUOut = A op B | ALUOut = A + sign-extend(IR[15-0]) | if (A == B) then PC = ALUOut | PC = PC[31-28] \|\| (IR[25-0] << 2) |
| 4: MEM | Memory access or R-type completion | Reg[IR[15-11]] = ALUOut | Load: MDR = Memory[ALUOut] or Store: Memory[ALUOut] = B | | |
| 5: WB | Memory read completion | | Load: Reg[IR[20-16]] = MDR | | |
![Page 31: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/31.jpg)
31
Multicycle Execution Step (1): Instruction Fetch
IR = Memory[PC];
PC = PC + 4;
[Figure: multicycle datapath (PC, memory, IR, MDR, register file, A, B, ALU, ALUOut) for this step; labels shown: PC + 4]
![Page 32: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/32.jpg)
32
Multicycle Execution Step (2): Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: multicycle datapath for this step; labels shown: Branch Target Address, Reg[rs], Reg[rt], PC + 4]
![Page 33: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/33.jpg)
33
Multicycle Execution Step (3): Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: multicycle datapath for this step; labels shown: Mem. Address, Reg[rs], Reg[rt], PC + 4]
![Page 34: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/34.jpg)
34
Multicycle Execution Step (3): ALU Instruction (R-Type)
ALUOut = A op B;
[Figure: multicycle datapath for this step; labels shown: R-Type Result, Reg[rs], Reg[rt], PC + 4]
![Page 35: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/35.jpg)
35
Multicycle Execution Step (3): Branch Instructions
if (A == B) PC = ALUOut;
[Figure: multicycle datapath for this step; labels shown: Branch Target Address (twice), Reg[rs], Reg[rt]]
![Page 36: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/36.jpg)
36
Multicycle Execution Step (3): Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure: multicycle datapath for this step; labels shown: Jump Address, Reg[rs], Reg[rt], Branch Target Address]
![Page 37: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/37.jpg)
37
Multicycle Execution Step (4): Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: multicycle datapath for this step; labels shown: Mem. Data, PC + 4, Reg[rs], Reg[rt], Mem. Address]
![Page 38: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/38.jpg)
38
Multicycle Execution Step (4): Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: multicycle datapath for this step; labels shown: PC + 4, Reg[rs], Reg[rt]]
![Page 39: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/39.jpg)
39
Multicycle Execution Step (4): ALU Instruction (R-Type)
Reg[IR[15:11]] = ALUOut;
[Figure: multicycle datapath for this step; labels shown: R-Type Result, Reg[rs], Reg[rt], PC + 4]
![Page 40: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/40.jpg)
40
Multicycle Execution Step (5): Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: multicycle datapath for this step; labels shown: PC + 4, Reg[rs], Reg[rt], Mem. Data, Mem. Address]
![Page 41: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/41.jpg)
41
For Reference
• The next 5 slides give the steps, one slide per instruction
![Page 42: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/42.jpg)
42
Multi-Cycle Execution: R-type
• Instruction fetch
  – IR <= Memory[PC]; (sub $t0,$t1,$t2)
  – PC <= PC + 4;
• Decode instruction/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]]; (rt)
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – ALUOut <= A op B; (op = add, sub, and, or, …)
• Completion
  – Reg[IR[15:11]] <= ALUOut; ($t0 <= ALU result)
![Page 43: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/43.jpg)
43
Multi-cycle Execution: lw
• Instruction fetch
  – IR <= Memory[PC]; (lw $t0,-12($t1))
  – PC <= PC + 4;
• Instruction decode/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]];
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – ALUOut <= A + sign-extend(IR[15:0]); ($t1 + -12, sign extended)
• Memory access
  – MDR <= Memory[ALUOut]; (M[$t1 + -12])
• Write-back
  – Load: Reg[IR[20:16]] <= MDR; ($t0 <= M[$t1 + -12])
![Page 44: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/44.jpg)
44
Multi-cycle Execution: sw
• Instruction fetch
  – IR <= Memory[PC]; (sw $t0,-12($t1))
  – PC <= PC + 4;
• Decode/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]]; (rt)
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – ALUOut <= A + sign-extend(IR[15:0]); ($t1 + -12, sign extended)
• Memory access
  – Memory[ALUOut] <= B; (M[$t1 + -12] <= $t0)
![Page 45: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/45.jpg)
45
Multi-cycle execution: beq
• Instruction fetch
  – IR <= Memory[PC]; (beq $t0,$t1,label)
  – PC <= PC + 4;
• Decode/register read
  – A <= Reg[IR[25:21]]; (rs)
  – B <= Reg[IR[20:16]]; (rt)
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – if (A == B) then PC <= ALUOut;
    • if $t0 == $t1, perform the branch
![Page 46: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/46.jpg)
46
Multi-cycle execution: j
• Instruction fetch
  – IR <= Memory[PC]; (j label)
  – PC <= PC + 4;
• Decode/register read
  – A <= Reg[IR[25:21]];
  – B <= Reg[IR[20:16]];
  – ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  – PC <= {PC[31:28],IR[25:0],"00"};
![Page 47: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/47.jpg)
47
Example: CPI in a multicycle CPU
• Assume
  – the control design of the previous slides
  – an instruction mix of 22% loads, 11% stores, 49% R-type operations, 16% branches, and 2% jumps
• What is the CPI assuming each step requires 1 clock cycle?
• Solution:
  – Number of clock cycles from the previous slides for each instruction class:
    • loads 5, stores 4, R-type instructions 4, branches 3, jumps 3
  – CPI:

$$
\begin{aligned}
\text{CPI} &= \frac{\text{CPU clock cycles}}{\text{instruction count}} \\
&= \frac{\sum_i \left(\text{instruction count}_{\text{class } i} \times \text{CPI}_{\text{class } i}\right)}{\text{instruction count}} \\
&= \sum_i \left(\frac{\text{instruction count}_{\text{class } i}}{\text{instruction count}} \times \text{CPI}_{\text{class } i}\right) \\
&= 0.22 \times 5 + 0.11 \times 4 + 0.49 \times 4 + 0.16 \times 3 + 0.02 \times 3 \\
&= 4.04
\end{aligned}
$$
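The weighted sum above is easy to reproduce. The short sketch below uses the instruction mix and cycle counts stated on this slide; the dictionary names are only for illustration.

```python
# Sketch: CPI as a weighted average of per-class cycle counts.
mix = {"load": 0.22, "store": 0.11, "r-type": 0.49, "branch": 0.16, "jump": 0.02}
cycles = {"load": 5, "store": 4, "r-type": 4, "branch": 3, "jump": 3}

# Each class contributes (fraction of instructions) x (cycles per instruction).
cpi = sum(mix[c] * cycles[c] for c in mix)
print(f"{cpi:.2f}")  # prints 4.04
```

Changing the mix shows how sensitive the average is to loads, the only class that needs all five cycles.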
![Page 49: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/49.jpg)
49
A (Refined) Datapath fig 5.26
![Page 50: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/50.jpg)
50
Datapath w/ Control Signals Fig 5.27
![Page 51: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/51.jpg)
51
Final Version w/ Control Fig 5.28
![Page 52: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/52.jpg)
52
Multicycle Control Step (1): Fetch
IR = Memory[PC];
PC = PC + 4;
[Figure: datapath with control signals (IorD, MemRead, MemWrite, IRWrite, PCWr*, PCSource, ALUSrcA, ALUSrcB, ALU Operation, RegDst, RegWrite, MemtoReg) annotated with their values for this step]
![Page 53: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/53.jpg)
53
Multicycle Control Step (2): Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: datapath with control signals annotated with their values for this step]
![Page 54: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/54.jpg)
54
Multicycle Control Step (3): Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: datapath with control signals annotated with their values for this step]
![Page 55: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/55.jpg)
55
Multicycle Control Step (3): ALU Instruction (R-Type)
ALUOut = A op B;
[Figure: datapath with control signals annotated with their values for this step; the ALU operation is shown as ???]
![Page 56: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/56.jpg)
56
Multicycle Control Step (3): Branch Instructions
if (A == B) PC = ALUOut;
[Figure: datapath with control signals annotated with their values for this step; one signal is annotated "1 if Zero=1"]
![Page 57: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/57.jpg)
57
Multicycle Execution Step (3): Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure: datapath with control signals annotated with their values for this step]
![Page 58: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/58.jpg)
58
Multicycle Control Step (4): Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: datapath with control signals annotated with their values for this step]
![Page 59: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/59.jpg)
59
Multicycle Execution Steps (4): Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: datapath with control signals annotated with their values for this step]
![Page 60: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/60.jpg)
60
Multicycle Control Step (4): ALU Instruction (R-Type)
Reg[IR[15:11]] = ALUOut; (Reg[Rd] = ALUOut)
[Figure: datapath with control signals annotated with their values for this step]
![Page 61: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/61.jpg)
61
Multicycle Execution Steps (5): Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: datapath with control signals annotated with their values for this step]
![Page 62: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/62.jpg)
62
Multi-Cycle Control: What we need to cover
• Adding registers after every functional unit
  – Need to modify the “instruction execution” slides to reflect this
• Breaking instruction execution down into cycles
  – What can be done during the same cycle? What requires a cycle?
  – Need to modify the “instruction execution” slides again
  – Timing: Registers/memory updated at the beginning of the next clock cycle
• Control signal values
  – What they are per cycle, per instruction
  – Finite state machine which determines signals based on instruction type + which cycle it is
• Putting it all together
![Page 63: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/63.jpg)
63
Fig 5.28 For reference
![Page 64: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/64.jpg)
64
State Diagram, Big Picture
![Page 65: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/65.jpg)
65
Handling Memory Instructions
![Page 66: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/66.jpg)
66
R-type Instruction
![Page 67: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/67.jpg)
67
Branch and Jump
![Page 68: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/68.jpg)
68
An FSM State Diagram
![Page 69: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/69.jpg)
69
FSM Implementation
![Page 70: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/70.jpg)
70
Example: Load (1)
[Figure: multicycle datapath annotated with the control signal values for this step]
![Page 71: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/71.jpg)
71
Example: Load (2)
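[Figure: multicycle datapath annotated with the control signal values for this step; the rs and rt register fields are labeled]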
![Page 72: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/72.jpg)
72
Example: Load (3)
[Figure: multicycle datapath annotated with the control signal values for this step]
![Page 73: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/73.jpg)
73
Example: Load (4)
[Figure: multicycle datapath annotated with the control signal values for this step]
![Page 74: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/74.jpg)
74
Example: Load (5)
[Figure: multicycle datapath annotated with the control signal values for this step]
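The five load steps can be traced end to end with a small Python model. This is a simplified sketch under stated assumptions: memory is a dictionary from byte address to 32-bit word, the function name `run_lw` and the step labels are invented for the example, and control signals are not modeled, only the register transfers performed in each cycle.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(pc, regs, mem):
    """Trace one lw through five steps; mem maps byte address -> 32-bit word."""
    steps = []

    # Step 1, instruction fetch: IR = Memory[PC]; PC = PC + 4
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    steps.append("fetch")

    # Step 2, decode and register fetch: A = Reg[rs]; B = Reg[rt];
    # ALUOut = PC + (sign-extend(imm) << 2), a branch target that lw never uses
    rs, rt = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    a, b = regs[rs], regs[rt]
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    steps.append("decode")

    # Step 3, address computation: ALUOut = A + sign-extend(imm)
    alu_out = (a + sign_extend16(ir)) & 0xFFFFFFFF
    steps.append("address")

    # Step 4, memory access: MDR = Memory[ALUOut]
    mdr = mem[alu_out]
    steps.append("memory read")

    # Step 5, memory read completion: Reg[rt] = MDR
    regs[rt] = mdr
    steps.append("write back")

    return pc, steps


# lw $t0, 8($s1) stored at address 0, with $s1 = 0x100 and the word 42 at 0x108
example_regs = [0] * 32
example_regs[17] = 0x100
example_mem = {0x0: (0x23 << 26) | (17 << 21) | (8 << 16) | 8, 0x108: 42}
new_pc, trace = run_lw(0, example_regs, example_mem)
```

Each entry in `trace` corresponds to one clock cycle, so the list length reflects the five cycles a load needs in this design.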
![Page 75: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/75.jpg)
75
Example: Jump (1)
[Figure: multicycle datapath annotated with the control signal values for this step]
![Page 76: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/76.jpg)
76
Example: Jump (2)
[Figure: multicycle datapath annotated with the control signal values for this step]
![Page 77: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/77.jpg)
77
Example: Jump (3)
[Figure: multicycle datapath annotated with the control signal values for this step]
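The third jump step writes the jump target into the PC. A minimal Python sketch of that computation, assuming the usual MIPS rule that the target is the upper four bits of the already-incremented PC concatenated with IR[25:0] shifted left by 2; the function name `jump_target` is made up for this example.

```python
def jump_target(pc_plus_4, ir):
    """Compute PC[31:28] concatenated with (IR[25:0] << 2)."""
    upper = pc_plus_4 & 0xF0000000      # top 4 bits of the incremented PC
    lower = (ir & 0x03FFFFFF) << 2      # 26-bit field becomes a 28-bit word-aligned offset
    return upper | lower


# j with a 26-bit address field of 0x0100008 (opcode 2 for j)
ir = (2 << 26) | 0x0100008
target = jump_target(0x00400004, ir)
```

Because the PC was already incremented during instruction fetch, the upper bits come from PC + 4 rather than from the address of the jump itself.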
![Page 78: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/78.jpg)
78
To Summarize…
• From several building blocks, we constructed a datapath for a subset of the MIPS instruction set
• First, we analyzed instructions for functional requirements
• Second, we connected building blocks in a way that accommodates the instructions
• Third, we refined the datapath and added controls
![Page 79: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/79.jpg)
79
To Summarize…
• We looked at how an instruction is executed on the datapath in a pictorial way
• We looked at control signals connected to functional blocks in our datapath
• We analyzed how execution steps of an instruction change the control signals
![Page 80: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/80.jpg)
80
To Summarize…
• We compared a single-cycle implementation and a multi-cycle implementation of our datapath
• We analyzed multi-cycle execution of instructions
• We refined the multi-cycle datapath
• We designed the multi-cycle control
![Page 81: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/81.jpg)
81
To Summarize…
• We looked at the multi-cycle control scheme in detail
• Multi-cycle control can be implemented using a finite state machine (FSM)
• An FSM is composed of combinational logic and memory elements
![Page 82: CS/COE0447 Computer Organization & Assembly Language](https://reader035.vdocuments.us/reader035/viewer/2022070410/568146a6550346895db3c2c5/html5/thumbnails/82.jpg)
82
Summary
• Techniques described in this chapter to design datapaths and control are at the core of all modern computer architecture
• Multicycle datapaths offer two great advantages over single-cycle:
– functional units can be reused within a single instruction if they are accessed in different cycles, reducing the need to replicate expensive logic
– instructions with shorter execution paths can complete quicker by consuming fewer cycles
• Modern computers, in fact, take the multicycle paradigm to a higher level to achieve greater instruction throughput:
– pipelining (later class) where multiple instructions execute simultaneously by having cycles of different instructions overlap in the datapath
– the MIPS architecture was designed to be pipelined
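To make the second advantage concrete, the sketch below averages the cycle counts over a mix of instruction classes. The per-class cycle counts follow the usual textbook multicycle design; the instruction mix is purely hypothetical and chosen only for illustration, not measured from any program.

```python
# Cycles per instruction class in the usual textbook multicycle design.
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}


def average_cycles(mix):
    """Weighted average cycles per instruction for a mix of {class: weight}."""
    total_weight = sum(mix.values())
    return sum(CYCLES[op] * weight for op, weight in mix.items()) / total_weight


# Hypothetical mix, for illustration only.
hypothetical_mix = {"lw": 25, "sw": 10, "rtype": 50, "beq": 10, "j": 5}
avg = average_cycles(hypothetical_mix)
```

For any mix that is not all loads, the average falls below the five cycles of the slowest instruction, which is the benefit of letting shorter instructions finish early.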