March 23, 2016 8:26 pm EE457 MT - Spring 2016 1 / 13 © Copyright 2016 Gandhi Puvvada

EE457 Midterm (~20-25%)
Closed-book Closed-notes Exam; No cheat sheets; No cell phones or computers

Calculators and Verilog Guides are not allowed.

Spring 2016
Instructor: Gandhi Puvvada

Thursday, 3/24/2016 (A 2H 50M exam)
05:00 PM - 07:50 PM (170 min) in HAR101

Ques#  Topic                              Page#   Time       Points         Score
1      Lab 7 Part 3 Subpart 2             2
2      Branch Delay Slot and Lab 6 P4     3
3      Lab 7 Part 1 3-element adder       4-7
4      Virtual Memory                     7
5      LW delay slot in Early Branch      8-9
6      Cache                              10-11
7      Multi-cycle CPU                    11-12
Total  Cover + 11 + Blank = 13                    ____ min.  Perfect Score

Student’s Last Name: _______________________________________

Student’s First Name: _______________________________________

Student’s DEN D2L username: ______________________________ @usc.edu


1 ( points) min. Lab 7 Part 3 Subpart 2:

1.1 Complete the STALL_12 generation logic and connections to the four EN (enables).

1.1.1 Redesign the STALL_12 generation logic on the side using the D-FF which is preset (rather than cleared) using the low-active RESET_B. You may use either zero or one inverter at most. You are not allowed to use two inverters or more. Would you recommend producing STALL_12 in Verilog in a separate combinational OFL or in the same clocked always block? ____________________________

1.2 Don’t we need a WBFF (Wrist-Band Flip-Flop) here? Did we forget to add a WBFF? Y / N
Explain: ____________________________________________________________________________

1.3 After you finished the architectural design, the VLSI engineer, for her layout convenience, has swapped the order of the [SUB3+R1 mux] with [ADD4+R2 mux] as shown on the side and did not do any other changes. Is she a Trojan or a Bruin? Tr/Br Will there be any change in the unsigned overflow behavior of the overall result? Yes / No

[Figure: "LAB 7 Part 3 with EX1 and EX2 merged Block Diagram". Stages IF, ID, EX12, WB. Labeled blocks and signals: PC, I-MEM, Reg. File (XA, RA, RD, R-Write), EN inputs, Comp Station in ID Stage (P, Q, P=Q) producing ID_XMEX12 (= ID_XA matched with EX12_RA), SKIP1 and SKIP2 muxes, R1_Mux, R2_Mux, A-3 (SUB3) and A+4 (ADD4) units with Cout, FU, FORW, WB_RDX_Mux, EX12_RA, EX12_ADD4, EX12_SUB3, EX12_ADD1, EX12_MOV, WB_RA, WB_Write, qualifying signals, RESET_B, and the STALL_12 / STALL_12Q generation logic drawn with a D-FF having CLR.]

[Side figure for 1.1.1: a D-FF with PRE driven by RESET_B.]

[Side figure for 1.3: the A-3 (SUB3) unit with R1_Mux and the A+4 (ADD4) unit with R2_Mux drawn in swapped order, with the SKIP1 and SKIP2 muxes.]


2 ( points) min. Branch Delay Slot

2.1 Filling the delay slot:

Can a delay slot be declared for (i) a conditional branch (beq/bne) (ii) an unconditional jump (j) (iii) a jal (iv) a jr $31? Please circle all applicable.

Who fills the delay slot? Hardware / Compiler

For a conditional branch at the end of a loop with 1000 iterations, state your order of preference to fill the delay slot among the four choices: "a", "b", "c" as shown on the side, and "d", which is to just place a NOP. _______________________
Note: if a subset of choices are of equal priority, put them in parentheses. Example: ("a","c")

Repeat for a jump (j) instruction: _____________________________________________ ("a","b","c","d")
Repeat for a jump and link (jal) instruction: ______________________________________ ("a","b","c","d")
Repeat for a MIPS return from a subroutine (jr $31) instruction: ________________________ ("a","b","c","d")

State whether option "b" in the textbook figure above is easy, moderately difficult, or impossible for each of the four below:

(i) conditional branches (beq/bne) Easy / Moderately difficult / Impossible

(ii) unconditional jumps (j) Easy / Moderately difficult / Impossible

(iii) subroutine calls (jal) Easy / Moderately difficult / Impossible

(iv) returns from subroutines (jr $31) Easy / Moderately difficult / Impossible

2.2 Reproduced on the side are two copies of a figure from Q3.3 of Lab 6 Part 4. In that assignment, we corrected it assuming ________ (zero/one/two) delay slots. Revise the two figures on the side for the remaining two assumptions.

[Figure: Extract from your textbook (the delay-slot filling choices "a", "b", "c" referred to in 2.1).]

[Figure, first copy: 7-stage pipeline with stages IF1, IF2, ID and BR1 labeled, Instr. TLB, Instr. cache, PC control, and RESET inputs; Assistant #2’s design of flush.]
You are revising this based on the new assumption of ____ (0/1/2) delay slots.

[Figure, second copy: the same 7-stage pipeline; Assistant #2’s design of flush.]
You are revising this based on the new assumption of ____ (0/1/2) delay slots.

For beq, bne instructions, declaring a delay slot and filling it with NOPs makes it _______________________ (as bad as / worse than) the hardware flush solution.

For j, jal, jr $31 instructions, declaring a delay slot and filling it with NOPs makes it _______________________ (as bad as / worse than) the hardware flush solution.


3 ( points) min. Lab 7 Part 1 3-element adder pipeline

The Lab 7 Part 1 performs ADD $R, $Z, $Y, $X or a NOP. Our VLSI engineer, Mr Bruin, was adding our 5-stage pipelined 3-element adder (Lab 7 Part 1) as an associate processor to a bigger processor and wanted to fit it in the leftover silicon in the big chip. However, he needs to add a dummy state to cover the wire delay between two parts of the silicon which are far apart. He has a choice of adding the dummy state either (A) between the original EX1 and EX2 or (B) between the original EX2 and WB. Your choice is ____ (A / B). Explain why? ________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________

Do you need a Z3_mux in the EX3 (DUMMY) stage of the (B) design above? ______ Y / N
Explain: ____________________________________________________________________________

Can you stall the dependent instruction in the EX1 stage instead of in the ID stage, either in the original 5-stage pipeline or in this 6-stage pipeline? ____________________________________________________________________________

Dependency for the Z register on a senior _________ (did / didn’t) lead to a stall in the original 5-stage pipeline.
Dependency for the Z register on a senior _________ (does / doesn’t) lead to a stall in this/these 6-stage pipeline(s) ______________ (say #A and #B or something as appropriate).

Design A is drawn in two ways on the next two pages with Z2 and Z3 muxes gathered in the dummy stage.
We can remove 3 of the 9 comparators in the design with Z2_mux and Z3_mux in natural order. T / F
We can remove 3 of the 9 comparators in the design with Z2_mux and Z3_mux in unnatural order. T / F

A senior who became a NOP, or is in the process of becoming a NOP, due to overflow _______________ (may be / should not be) allowed to provide forwarding help to a junior.

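[Figure, design A: IF, ID, EX1, EX2 (DUMMY), EX3, WB with Z1_mux, Z2_mux, Z3_mux.]

[Figure, design B: IF, ID, EX1, EX2, EX3 (DUMMY), WB with Z1_mux, Z2_mux.]

(Margin point values on this page: 3pts, 6pts, 6pts, 3pts)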

30pts

[Figure: "Pipelined 3-element Adder Block Diagram", LAB 7 Part1 with a dummy stage, Z2_mux and Z3_mux in natural order. Stages IF, ID, EX1, EX2, EX3, WB. Labeled blocks and signals: PC, INS-MEM, Reg. File (ZA, YA, XA, RA, RD, R-Write), EN inputs, RUN, two Adders (A, B, S, Cout) producing XD+YD and XD+YD+ZD, X_Mux, Y_Mux, Z1_mux, Z2_mux, Z3_mux, X_FORW1, Y_FORW1, Z_FORW1, Z_FORW2, Z_FORW3, STALL, STALL_B, EX1_COUT, EX3_COUT, EX1_RA, EX2_RA, EX3_RA, WB_RA, WB_RD, WB_WRITE, EX1_RUN_IN, EX2_RUN_IN, EX3_RUN_IN. Comp Station in ID Stage: comparators (P, Q, P=Q) matching ID_XA, ID_YA, ID_ZA against EX1_RA, EX2_RA, EX3_RA and producing ID_XMEX1, ID_YMEX1, ID_ZMEX1, ID_XMEX2, ID_YMEX2, ID_ZMEX2, ID_XMEX3, ID_YMEX3, ID_ZMEX3 (for example, ID_XMEX1 = ID_XA matched with EX1_RA).]

Complete the design (6 EN controls, forwarding paths, bubble-injection, etc.)

Generate on the next to next page: Z_FORW2, Z_FORW3

[Figure: "Pipelined 3-element Adder Block Diagram", LAB 7 Part1 with a dummy stage, Z2_mux and Z3_mux in unnatural order. Same stages, blocks, comparators, and signals as the natural-order drawing, with the positions of Z2_mux and Z3_mux interchanged.]

Complete the forwarding paths to Z_FORW2, Z_FORW3

Generate on the next page: Z_FORW2, Z_FORW3

20pts

Produce the Z_FORW2 and Z_FORW3 for both the designs below.

20pts

For the design with Z2_mux and Z3_mux in unnatural order:
Z_FORW2
Z_FORW3

For the design with Z2_mux and Z3_mux in natural order:
Z_FORW2
Z_FORW3

4 ( points) min. Virtual Memory

MMU stands for __________________________________.
VPN stands for _____________________. PPFN stands for ____________________________
TLB stands for ___________________________ Buffer.
Given _______ (VPN/PPFN) the TLB provides __________ (VPN/PPFN).
Given _______ (VPN/PPFN) the page table provides __________ (VPN/PPFN).
We ___________ (use / don’t use) parallel search to search the page table.
We ___________ (use / don’t use) binary search (also called dictionary search) to search the page table.
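As a rough illustration of the terms in the virtual memory question, the sketch below models a small TLB sitting in front of a page table. Everything in it is an assumption made up for the example and not part of the exam: the 4 KB page size, the table contents, and the names `PAGE_BITS`, `page_table`, `tlb`, and `translate`.

```python
PAGE_BITS = 12  # assumed 4 KB pages (hypothetical choice)

# Assumed toy page table: the index is the virtual page number and the
# stored value is the physical page frame number (None = not resident).
page_table = [7, 3, None, 9]
tlb = {}  # small cache of recent virtual-to-physical page translations


def translate(vaddr):
    """Return (physical address, "TLB hit" or "TLB miss") for a virtual address."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:
        # A hardware TLB would compare against all of its entries at once.
        return (tlb[vpn] << PAGE_BITS) | offset, "TLB hit"
    ppfn = page_table[vpn]  # the table is indexed directly, no search involved
    if ppfn is None:
        raise LookupError("page fault")
    tlb[vpn] = ppfn  # refill the TLB for later accesses to the same page
    return (ppfn << PAGE_BITS) | offset, "TLB miss"
```

The page offset passes through untranslated; only the page number is replaced.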


5 ( points) min. Early Branch (Lab6 Part 4 rev 3 design)

Given on the next page is the block diagram of the Lab 6 Early Branch design. Given below is the pseudo code for the HDU and HDU_Br of the Early Branch design.

In this question, we are dealing with the Load Word Delay slot and not the Branch delay slot. So please do not get confused. They are totally different.

If an ISA has declared one LW (Load Word) delay slot, it means that the compiler ____________(should / shouldn’t) place an instr. dependent on the LW, right _________ (before / after) the LW .

Our current Lab 6 Early Branch design assumes ________ (0/1) LW delay slots.
Modify the design on the next page and the pseudo code for HDU and HDU_Br below to change the "LW delay slot" aspect of our design. So you are going to _________ (add / remove) a LW delay slot ____ (to / from) our Lab 6 design. This calls for FU_Br to change. T / F. This calls for FU to change. T / F


========================================================================================
HDU (Original Hazard Detection Unit in ID stage):
Note: Here ID/EX.WriteRegister refers to the WriteRegister after the mux governed by RegDst. We could replace it with ID/EX.WriteRegisterRt.

If [ ID/EX.MemRead
     and (ID/EX.WriteRegister =/= 0)
     and { (ID/EX.WriteRegister == IF/ID.ReadRegister_RS)
           or (ID/EX.WriteRegister == IF/ID.ReadRegister_RT) } ]
then make STALL_LW = 1
========================================================================================
HDU_Br (New Hazard Detection Unit in ID stage to serve the early branch):
Note: Here ID/EX.WriteRegister refers to the WriteRegister after the mux governed by RegDst.

If [ Branch
     and [ [ ID/EX.RegWrite
             and (ID/EX.WriteRegister =/= 0)
             and { (ID/EX.WriteRegister == IF/ID.ReadRegister_RS)
                   or (ID/EX.WriteRegister == IF/ID.ReadRegister_RT) } ]
           or
           [ EX/MEM.MemRead
             and (EX/MEM.WriteRegister =/= 0)
             and { (EX/MEM.WriteRegister == IF/ID.ReadRegister_RS)
                   or (EX/MEM.WriteRegister == IF/ID.ReadRegister_RT) } ] ] ]
then make STALL_BEQ = 1
========================================================================================
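For readers who find nested brackets hard to parse, the two conditions above can be restated as a small Python sketch. It transcribes the original (unmodified) pseudo code only and does not attempt the change the question asks for. The function and parameter names are hypothetical stand-ins for the pipeline-register fields.

```python
def hdu_stall_lw(id_ex_mem_read, id_ex_write_reg, if_id_rs, if_id_rt):
    """STALL_LW: the instruction in EX is a load whose destination is read in ID."""
    return bool(
        id_ex_mem_read
        and id_ex_write_reg != 0
        and id_ex_write_reg in (if_id_rs, if_id_rt)
    )


def hdu_br_stall_beq(branch, id_ex_reg_write, id_ex_write_reg,
                     ex_mem_mem_read, ex_mem_write_reg, if_id_rs, if_id_rt):
    """STALL_BEQ: a branch in ID depends on a producer in EX or on a load in MEM."""
    sources = (if_id_rs, if_id_rt)
    # Any register-writing instruction currently in EX that feeds the branch.
    depends_on_ex = (id_ex_reg_write
                     and id_ex_write_reg != 0
                     and id_ex_write_reg in sources)
    # A load currently in MEM that feeds the branch.
    depends_on_mem_load = (ex_mem_mem_read
                           and ex_mem_write_reg != 0
                           and ex_mem_write_reg in sources)
    return bool(branch and (depends_on_ex or depends_on_mem_load))
```

Register 0 is excluded in both functions, mirroring the `=/= 0` terms in the pseudo code.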

[Figure: "Early Branch (Current Lab6)" block diagram. Detailed implementation of Early Branch suggested in 3rd Ed., designed by Gandhi Puvvada, 10/18/06; drawn by Wei-jen Hsu. Stages and pipeline registers: IF-Stage, IF/ID, ID-Stage, ID/EX, EX-Stage, EX/MEM, MEM-Stage, MEM/WB, WB-Stage. Labeled blocks and signals: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, ALU, ALU ctrl, Data memory, Hazard detection unit, HDU_Br, Forwarding Unit, FU_Br, forwarding_mux_control, STALL_LW, STALL_BEQ, STALL, Branch, IF.Flush, Zero, ALUSrc, ALUOp, RegDst, RegWrite, RegWrite_EX, MemRead, MemRead_EX, MemRead_MEM, MemWrite, MemtoReg, WriteRegister_EX, WriteRegister_MEM, FW_RS, FW_RT, FW_RS_MEM, FW_RS_WB, FW_RT_MEM, FW_RT_WB, ALU_result, Store_data, MEM_data, REG_data.]


6 ( points) min. Cache

6.1 Shown below is a typical CAM (Content Addressable Memory) and a Data RAM to go with it.

The CAM size is specified as 32 x 16. "32" means there are 32 TAGs stored in it. Then what is 16?
____________________________________________________________________________

When you switch on Power, do you clear all 32 tags to zeros or invalidate the 32 valid bits?
____________________________________________________________________________

How wide is the TAG in the above diagram and how wide is the comparison unit? How can you tell?
____________________________________________________________________________

2^5 = 32. So is there any 5-bit address? ____________________________________________________________________________

Please write sizes of all 4 buses marked as .

How big is the Data RAM? _____X______

Is there any relation between the width of the CAM and the 32-bit width of the Data RAM? Yes / No.
____________________________________________________________________________

Is there any relation between the depth of the CAM and the depth of the Data RAM? Yes / No.
____________________________________________________________________________

I read somewhere that unlike RAM (which receives address and gives out Data) a CAM would receive Data and provide Address. But I see Write Addr as input at the top of the CAM. Explain.
____________________________________________________________________________

The "Data in" fed to the Data RAM is used during a ______________ (Cache Miss / Cache Hit).

TAG comparison inside a CAM: Several such comparators compare the incoming Tag with all the stored Tags simultaneously.
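A behavioral sketch of the lookup described in the caption above is given here in Python. It is a software model under stated assumptions, not the circuit in the exam figure: the class name `TinyCAM`, its methods, and the per-entry valid bit are all hypothetical, and the list comprehension stands in for comparators that would all operate in the same clock in hardware.

```python
class TinyCAM:
    """Behavioral model of a small CAM: one stored tag and one valid bit per entry."""

    def __init__(self, depth):
        self.tags = [0] * depth
        self.valid = [False] * depth  # assumed: power-up leaves every entry invalid

    def write(self, addr, tag):
        """Store a tag at an explicit entry address, as an ordinary RAM write would."""
        self.tags[addr] = tag
        self.valid[addr] = True

    def search(self, tag):
        """Compare the incoming tag with every valid entry; return the matching index."""
        match = [v and t == tag for t, v in zip(self.tags, self.valid)]
        return match.index(True) if True in match else None
```

For instance, after `cam = TinyCAM(32)` and `cam.write(5, 0xBEEF)`, calling `cam.search(0xBEEF)` returns the entry index 5, which could then address the companion Data RAM.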


6.2 Cache and MM organization for direct/set-associative caches

If the TAG size is 17 bits in a 32-bit address byte-addressable processor, we can tell the size of the cache if we know further the following (state needed or not needed for each (or true or false)):

Block size in words: needed / not needed.
Word size in bytes: needed / not needed.
Main Memory degree of lower-order interleaving: needed / not needed.

If it is set-associative, the degree of set-associativity (i.e. # of blocks per set): needed/not needed . If it is direct mapping, nothing more needed. True / False

At a late point of the CPU chip design (which includes the CPU and the L1-Cache), if you change the DSA (degree of set-associativity) from 2 to 4, and also double the size of the cache, the set-field will _________ (A/B/C/D/E) and the TAG field will _________ (A/B/C/D/E) where A = increase by 1 bit; B = increase by 2 bits; C = decrease by 1 bit; D = decrease by 2 bits; E = no change
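The relation between cache size, block size, degree of set-associativity, and the address fields in 6.2 can be sketched in Python as below. The helper names `log2_exact` and `address_fields` are hypothetical, and the sketch assumes a byte-addressable machine with power-of-two sizes; the numbers in the usage note are illustrative and not taken from the exam.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two; reject anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def address_fields(addr_bits, cache_bytes, block_bytes, ways):
    """Return (tag_bits, set_bits, offset_bits) for a set-associative cache.

    ways = 1 models direct mapping. All sizes are in bytes.
    """
    sets = cache_bytes // (block_bytes * ways)
    offset_bits = log2_exact(block_bytes)  # selects a byte within a block
    set_bits = log2_exact(sets)            # selects one set
    tag_bits = addr_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits
```

With a 32-bit address, a 64 KB cache, and 16-byte blocks, `address_fields(32, 64 * 1024, 16, 1)` gives `(16, 12, 4)` and `address_fields(32, 64 * 1024, 16, 2)` gives `(17, 11, 4)`: doubling the associativity at a fixed cache size halves the number of sets, which moves one bit from the set field to the tag.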

The degree of Lower-Order Interleaving for the MM is decided by ___________________________________________________________________________________________________

7 ( points) min. Multi-cycle CPU

Reproduced on the next page is the modified CU state diagram for the 2nd edition multi-cycle CPU design from a previous exam where we fetch the next instruction in the last clock of the current instruction.

This kind of improvement is equally suitable for the 1st edition design. True / False
Explain: _____________________________________________________________________

Why didn’t we try to apply this improvement to the last clock (State 5) of the "SW" instruction?
____________________________________________________________________________

Why didn’t we try to apply this improvement to the last clock (State 8) of the "BEQ" instruction?
____________________________________________________________________________


Just FYI. Nothing needs to be done here.

It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you. The advanced topics in the last 5 weeks are interesting and challenging too. About 40% of the final exam focuses on these topics. They are important for your interviews also. Best! Gandhi, TAs: Jizhe, Pezhman, Mentors: Kalpana, Bo, HW Graders: Monisha, Zihao Lab graders: Maanasa, Nita


Blank page: Please write your name and email. Tear it off and use for rough work. Do not submit.
Student’s Last Name: ____________________ email: __________________