cmu 18-447 introduction to computer architecture, spring...

CMU 18-447 Introduction to Computer Architecture, Spring 2015

Final ExamDate: Tue., 5/5

Instructor: Onur MutluTAs: Rachata Ausavarungnirun, Kevin Chang, Albert Cho, Jeremie Kim, Clement Loh

Name:

Problem 1 (50 Points):







Bonus (50 Points):

Total (400 Points):

Instructions:

1. This is a closed book exam. You are allowed to have three letter-sized cheat sheets.

2. No electronic devices may be used.

3. This exam lasts 3 hours.

4. Clearly indicate your final answer for each problem.

5. Please show your work when needed.

6. Please write your initials at the top of every page.

7. Please make sure that your answers to all questions (and all supporting work that is required)are contained in the space required.

Tips:

• Be cognizant of time. Do not spend too much time on one question.

• Be concise. You will be penalized for verbosity and unnecessarily long answers.

• Show work when needed. You will receive partial credit at the instructors’ discretion.

• Write legibly. Show your final answer.

Initials:

1. Virtual Memory and Caches [50 points]

Assume that we have a byte-addressable processor that implements paging-based virtual memoryusing a three-level hierarchical page table organization. The virtual address is 46 bits, and thephysical memory is 4GB. The page table base register (PTBR) stores the base address of the first-level table (PT1 ). All the page tables have the same size of one physical page for the first-level(PT1 ), second-level (PT2 ), and third level (PT3 ). Each PT1 /PT2 entry stores the base address ofa second/third-level table (PT2 /PT3 ). In contrast to PT1 and PT2 , each PT3 entry stores a pagetable entry (PTE). The PTE is 4-bytes in size.

The processor has a 64KB virtually-indexed physically-tagged (VIPT) L1 cache that is directmapped with a cache line size of 128 bytes, and a 64-entry TLB.

(a) What is the physical page size? Show all your work.

1. First, the physical address space is log2 4GB = 32bits. So each PT1 and PT2 entrystores 32bits (4bytes).2. Since the virtual address space is 46 bits and assume the page size is P , then there

are 246

P possible mappings that the page tables can store.

3. Each page table can store P4 mappings because each entry is 4 bytes and the table

is P bytes. The three-level hierarchical page table can in total store (P4 )(P

4 )(P4 ) = 246

Pmappings.4. P = 213.

(b) How many bits of the virtual page number are used to index the L1 cache?

The cache requires log2 64KB = 16 bits from the address for indexing. So 3 bits areoverlapped with the VPN.

(c) What kind of aliasing problem does this processor’s L1 cache have?

Synonym – multiple different virtual addresses map to the same physical address, causingmultiple copies of the data belonging to the same physicall address residing in the cache.

2/26

Initials:

(d) In lecture, we learned multiple techniques to resolve this particular aliasing problem (in part (c)above) in a VIPT cache. One of them is to increase the associativity of the cache. To addressthis aliasing problem for this processor’s VIPT cache, what is the minimum associativity that isrequired for the cache? Show your work.

23 = 8

(e) We also learned another technique that searches all possible sets that can contain the same physicalblock. For this VIPT cache, how many sets need to be searched to fix the aliasing problem? Showyour work.

23 = 8

3/26

Initials:

2. Register Renaming [50 points]

In this problem, we will give you the state of the Register Alias Table (RAT), Reservation Stations(RS), and Physical Register File (PRF) for a Tomasulo-like out-of-order execution engine.

The out-of-order machine in this problem has the following characteristics:

• The processor is fully pipelined with four stages: Fetch, decode, execute, and writeback.

• For all instructions, fetch takes 1 cycle, decode takes 1 cycle, and writeback takes 1 cycle.

• The processor implements ADD and MUL instructions only. Both the adder and multiplier arefully pipelined. ADD instructions take 3 cycles and MUL instructions take 4 cycles in the executestage. Note that the adder and multiplier have separate common data buses (CDBs), whichallow both the adder and multiplier to broadcast results in the same cycle.

• An instruction always allocates the first reservation station that is available (in top-to-bottomorder) at the required functional unit.

Suppose the pipeline is initially empty and the machine fetches exactly 5 instructions. The diagrambelow shows the snapshot of the machine at a particular point in time.

Register Alias TableID V Tag

R0 1 P1R1 1 P8R2 1 P12R3 1 P4R4 0 P7R5 1 P5R6 0 P11R7 1 P14

Physical Register FileID V Data ID V Data

P0 0 2 P8 1 88P1 1 11 P9 0 90P2 0 2 P10 0 91P3 0 30 P11 1 110P4 1 3 P12 1 33P5 1 50 P13 1 130P6 0 5 P14 1 17P7 1 70 P15 1 159

ADD Reservation StationID V Tag V Tag Dest. Tag

A 1 P12 1 P7 P15B 1 P5 0 P13 P11

ADD CDB

MUL Reservation StationID V Tag V Tag Dest. Tag

X 0 – 0 – –Y 1 P12 1 P4 P7Z 1 P7 1 P7 P13

MUL CDB

4/26

Initials:

(a) Your first task is to use only the supplied information to draw the data flow graph for the fiveinstructions which have been fetched. Label nodes with the operation (+ or *) being performedand edges with the architectural register alias numbers (e.g., R0).

R0 R3

+

* R3

R4

R6

R6

R6R5

R2*

+

*

(b) Now, use the data flow graph to fill in the table below with the five instructions being executedon the processor in program order. The source registers can be specified in either order. Giveinstructions in the following format: “opcode, source1, source2, destination.”

OP Src 1 Src 2 Dest

MUL R0 R3 R2

MUL R2 R3 R4

ADD R2 R4 R6

MUL R4 R4 R6

ADD R5 R6 R6

(c) Now, show the full pipeline timing diagram below for the sequence of five instructions that youdetermined above, from the fetch of the first instruction to the writeback of the last instruction.Assume that the machine stops fetching instructions after the fifth instruction.

As we saw in class, use F for fetch, D for decode, En to signify the nth cycle of executionfor an instruction, and W to signify writeback. You may or may not need all columns shown.Finally, identify the cycle after which the snapshot of the microarchitecture was taken. Shade thecorresponding cycle in the last row of the table.

Cycle: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Instruction 1 F D E1 E2 E3 E4 WInstruction 2 F D E1 E2 E3 E4 WInstruction 3 F D E1 E2 E3 WInstruction 4 F D E1 E2 E3 E4 WInstruction 5 F D E1 E2 E3 W

Snapshot cycle X

5/26

Initials:

3. Branch Prediction [50 points]

Assume the following piece of code that iterates through two large arrays, j and k, each populatedwith completely (i.e., truly) random positive integers. The code has two branches (labeled B1 andB2). When we say that a branch is taken, we mean that the code inside the curly brackets is executed.Assume the code is run to completion without any errors (there are no exceptions). For the followingquestions, assume that this is the only block of code that will ever be run, and the loop-conditionbranch (B1) is resolved first in the iteration before the if-condition branch (B2).

for (int i = 0; i < 1000000; i++) { /* B1 *//* TAKEN PATH for B1 */

if (i % 3 == 0) { /* B2 */j[i] = k[i] - i; /* TAKEN PATH for B2 */

}}

You are running the above code on a machine with a two-bit global history register (GHR) sharedby all branches, which starts with Not Taken, Not Taken (2’b00). Each pattern history table entry(PHTE) contains a 2-bit saturating counter.The saturating counter values are as follows:2’b00 - Strongly Not Taken2’b01 - Weakly Not Taken2’b10 - Weakly Taken2’b11 - Strongly Taken

(a) You observe that the branch predictor mispredicts 100% of the time in the first 5 iterations of theloop. Is this possible? Fill in the table below with all possible initial values each entry can take.Leave the table blank if this is not possible.

PHTPHT Entry Value

TT 01

TN 00

NT 01

NN 00 or 01

Show your work here:

The pattern after 5 iterations: TTTNTNTTTN.• For GHR=NN, the only observed branch is T, which is the first taken branch.

Therefore, the PHTE for NN has to be either 00 or 01 so that the branch predictormispredicts the taken branch.

• For GHR=TT, the observed branches are T N T N. The PHTE for TT has to beinitialized to 01 in order to cause the predictor to always mispredict.

• For GHR=TN, the observed branches are T T. Thus, the initial PHTE value forTN has to be 00 to mispredict both taken branches.

• For GHR=NT, the observed branches are T (i.e., the second taken branch) N T.Similar to the TT entry, NT’s PHTE has to be initialized 01.

6/26

Initials:

(b) Please read the entire question first before answering any part.

Rachata believes that the misprediction rate can become 0% during the steady state.

Is this possible?

Circle one: YES NO

If it is possible (YES), fill in one possible set of initial PHTE values that can lead to a 0%misprediction rate.

PHTPHT Entry Value

TT

TN

NT

NN

If it is not possible (NO), what is the lowest misprediction rate that can be achieved during thesteady state?

The lowest misprediction rate is 33.33%.

Show all your work here below:

No, it’s not possible. The best correct prediction rate is 4/6, which is 1/3 mispredictionrate.At steady state, we will keep observing the following pattern which repeats over time:TTTNTN.With GHR=TN, this entry will saturate to 11, taken all the time. Therefore, 2 Ts willbe always predicted correctly out of the 6 branches in the pattern.With GHR=NT or TT, the predictor will observe either T and N. No matter whatthe initial values are for these two entries, only one of the branches can be predictedcorrectly. Therefore 2 out of of remaining 4 branches in the pattern will be predictedcorrectly.With GHR=NN, the predictor won’t observe it during the steady state.

7/26

Initials:

4. Interconnects [50 points]

The following diagrams show four different topologies. In this question, assume that a packet canmove from one node to the adjacent node in 1 cycle. Also, assume that the routing mechanism usesthe shortest path from the source to the destination.

a) Uni-Directional Ring b) Bi-Directional Ring

c) 2-D Torus d) 3-D Torus

8/26

Initials:

(a) What is the average latency of a uni-directional ring of size n, assuming a uniform traffic patternwhere every node has an equal probability of sending a packet to every other node without trafficcontention? No traffic contention means that a packet can always move toward its destinationevery cycle on its shortest path. For this and the following questions, assume that n is an oddnumber. Show your work.

Avg(1 + 2 + 3 + ...+ n− 1) (n)(n−1)2(n−1) = n

2

(b) What is the average latency of a bi-directional ring of size n, assuming a uniform traffic patternwithout traffic contention? Show your work.

For an odd number of n, the average latency is (n+1)4 .

9/26

Initials:

(c) What is the average latency of a n ∗ n torus, assuming a uniform traffic pattern without trafficcontention? Show your work. (Hint: each ring in a torus is a bi-directional ring.)

2 ∗ (n+1)4 = (n+1)

2

(d) What is the average latency of a n ∗ n ∗ n 3-D torus, assuming a uniform traffic pattern withouttraffic contention? Show your work.

3 ∗ (n+1)4

10/26

Initials:

5. Memory Consistency [40 points]

There are 2 threads with 4 instructions. The two threads are executed concurrently on a dual-coreprocessor. Assume that registers in both cores are initialized with the values shown in the table below.The instructions of each thread are also shown below.

R1 1

R2 2

R3 3

R4 4

Thread A Thread B

ST R1, 0x1000 ST R3, 0x1000

LD R5, 0x1000 LD R5, 0x1000

ADD R5, R5, R2 ADD R5, R5, R4

ST R5, 0x1000 ST R5, 0x1000

(a) Assume the dual-core processor implements sequential consistency. List all the possible valuesthat can be stored in address 0x1000, assuming both threads run to completion.

3, 5, 7, 9

(b) How many different memory instruction interleavings of the 2 threads will guarantee a value of0x9 in address 0x1000, assuming both threads run to completion? Show your work.

4

11/26

Initials:

(c) Assume now that the dual-core processor does not support sequential consistency. List all thepossible values that can be stored in address 0x1000, assuming both threads run to completion.Explain your answer briefly.

3, 5, 7, 9Thread B might never see the sequential updates of Thread A, but the result will beequivalent to that of a sequentially consistent model.

12/26

Initials:

6. Memory Interference [50 points]

During the lectures, we introduced a variety of ways to tackle memory interference. In this problem,we will look at the Blacklisting Memory Scheduler (BLISS) to reduce unfairness. There are two keyaspects of BLISS that you need to know.

• When the memory controller services η consecutive requests from a particular application, thisapplication is blacklisted. We name this non-negative integer η the Blacklisting Threshold.

• The blacklist is cleared periodically every 10000 cycles starting at t=0.

To reduce unfairness, memory requests in BLISS are prioritized in the following order:

• Non-blacklisted applications’ requests

• Row-buffer hit requests

• Older requests

The memory system for this problem consists of 2 channels with 2 banks each. Tables 1 and 2 showthe memory request stream in the same bank for both applications at varying times. The memoryrequests are labeled with numbers that represent the row position of the data within the accessedbank. Assume the following for all questions:

• A row buffer hit takes 50 cycles.

• A row buffer miss/conflict takes 200 cycles.

• All the row buffers are closed at time t=0.

Application A (Channel 0, Bank 0)Application B (Channel 0, Bank 0) Row 1 Row 1 Row 1 Row 1 Row 1 Row 1 Row 1

Table 1: Memory requests of the two applications at t=0

Application A (Channel 0, Bank 0) Row 3 Row 7 Row 2 Row 0 Row 5Application B (Channel 0, Bank 0) Row 1 Row 1 Row 1 Row 1 Row 1 Row 1 Row 1

Table 2: Memory requests of the two applications at t=10

13/26

Initials:

(a) Compute the slowdown of each application using the FR-FCFS scheduling policy after boththreads ran to completion. We define slowdown = mem latency with others

mem latency alone . Show your work.

Slowdown of A: (200+6∗50)+5∗2005∗200 = 1500

1000 = 1.50

Slowdown of B: 200+6∗50200+6∗50 = 500

500 = 1.00

(b) For what value(s) of η (the Blacklisting Threshold) will the slowdowns for both applications beequivalent to those obtained with FR-FCFS?

For η≥7 or η = 0.We want both A and B to complete without blacklisting or to complete both blacklisted,thus η≥7 and η = 0 respectively.

14/26

Initials:

(c) For what value(s) of η (the Blacklisting Threshold) will the slowdown for A be <1.4?

Impossible. Slowdown for A will always be ≥1.4.If you trace the schedule carefully, you will observe that A will be the fastest when η=5,where slowdown of A=1.4. η=5 is the smallest η where A does not get blacklisted.

(d) For what value(s) of η (the Blacklisting Threshold) will B experience maximum slowdown it canexperience with the Blacklisting Scheduler?

η=5 or η=6We observe that as long as B gets blacklisted at least once, B will incur an additionalmiss and hence an extra of 200 cycles. Thus, we want 2 conditions to be satisfied: B tomiss at least once AND an η such that B completes last. These conditions are satisfiedonly when η=5 or η=6. MaximumSlowdownofB = 1650

500 = 3.3

(e) What is a simple mechanism (that we discussed in lectures) that will make the slowdowns of bothA and B 1.00?

Memory Channel Partitioning (MCP)

15/26

Initials:

7. Memory Latency Tolerance [60 points]

Assume an in-order processor that employs runahead execution, with the following specifications:

• The processor enters Runahead mode when there is a cache miss.

• There is a 64KB cache. The cache block size is 64 Bytes.

• The cache is 2-way set associative and uses the LRU replacement policy.

• A cache hit is serviced instantaneously.

• A cache miss is serviced after X cycles.

• The cache replacement policy chooses to evict a cache block serviced by Runahead requests overnon-runahead requests. The processor does not evict the request that triggers Runahead modeuntil after Runahead mode is over.

• The victim for cache eviction is picked at the same time a cache miss occurs.

• Whenever there is a cache miss, the processor always generates a new cache request and entersRunahead mode.

• There is no penalty for entering and leaving Runahead mode.

• ALU instructions and Branch instructions take one cycle each and never stall the pipeline.

Consider the following program. Each element of array A is one byte.

for(int i=0;i<100;i++) \\ 2 ALU instructions and 1 branch instruction{

int m = A[i*32*1024]+1; \\ 1 memory instruction followed by 1 ALU instruction26 ALU instructions

}

(a) After running this program, you find that there are 50 cache misses. What are all the possiblevalues of X?

30 < X < 61.

(b) Is it possible that every cache access in the program misses in the cache? If so, what is the valueof X that will make all cache accesses in the program miss in the cache? If not, why? Show yourwork.

A cache latency of less than 31 cycles will not generate any cache requests during therunahead mode. A cache latency of at least 61 cycles will generate two fetches to thecache. The second fetch will evict the first runahead request in the cache, causing everyaccess into a cache miss.

16/26

Initials:

(c) What is the minimum number of cache misses that this program can achieve? Show your work.

50 misses.

(d) Assume that each ALU instruction consumes 1uJ, a cache hit consumes 10uJ, and a cache missconsumes Y uJ. Does there exist a combination of X and Y such that the dynamic energyconsumption of Runahead execution is better than a processor without Runahead execution?Show your work.

No. With Runahead execution, the processor will always have to fetch extra instructions.With runahead execution, the processor will cause at least 100 cache misses (some inRunahead mode) and 30*100 ALU instruction.

17/26

Initials:

(e) Assume the energy parameters in part d. What is the dynamic energy consumption of the pro-cessor with Runahead execution in terms of X and Y when X generates the minimum numberof cache misses? Show your work.

At most, this program will convert 50 cache misses to cache hits while leaving the other50 be cache misses. This case will happen if and only if the cache latency (X) is between31 < X < 60With runahead: misses ∗ Y + ALU ∗ 1 + hits ∗ 10+3 instructions at the end of theloop+100∗ALU in Runahead modeWith runahead: 3 + 30 ∗ 1 ∗ 100 + 50 ∗ 10 + 100 ∗Y + 100X = 3503 + 100Y + 100(X − 1)

(f) Assume the energy parameters in part d. What is the dynamic energy consumption of the pro-cessor with Runahead execution in terms of X and Y when X generates the maximum numberof cache misses? Show your work.

In the worst case, every cache accesses is a miss. This will happen when X <= 30and X > 60 However, in this case, we have to break the calculation for the energyconsumption into two cases, the first case is when X < 31 and the second case is whenRunahead execution generates extra cache requests.When X < 31, the energy consumption is 3003 + 100Y + 100X.When X > 60, there will be bX/30c additional memory access in Runahead mode. So,the energy consumption is 3003 + 100(X − bX/30c) + 100Y ∗ bX/30c.

18/26

Initials:

8. [Bonus] Mystery Instruction Strikes Back [50 points]

That pesky engineer implemented yet another mystery instruction on the LC-3b. It is your job todetermine what the instruction does. The mystery instruction is encoded as:

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 01010 DR SR1 0 0 0 0 0 0

The modifications we make to the LC-3b datapath and the microsequencer are highlighted in theattached figures (see the next three pages after the question). We also provide the original LC-3bstate diagram, in case you need it.

In this instruction, we specify SR2OUT to always output REG[SR1], and SR2MUX to output valuefrom the REGFILE. Each register has a width of 16 bits.

The additional control signals are:

GateTEMP1/1: NO, YES

GateTEMP2/1: NO, YES

LD.TEMP1/1: NO, LOAD

LD.TEMP2/1: NO, LOAD

ALUK/3: OR1 (A |0x1), XOR (A ˆB), LSHF1 (A<<1), PASSA, PASS0 (Pass value 0), PASS16(Pass value 16)

Reg IN MUX/2: BUS (passes value from BUS), EQ0 (passes the value from the ==0? com-parator). BUS is asserted if this signal is not specified.

COND/4:COND0000 ;UnconditionalCOND0001 ;Memory ReadyCOND0010 ;BranchCOND0011 ;Addressing modeCOND0100 ;Mystery 1COND1000 ;Mystery 2 (which is set based on the 0th bit of TEMP1)

The microcode for the instruction is given in the table on the next page.

19/26

Initials:

State Cond J Asserted Signals001010 (10) COND0000 001011 ALUK = PASS0, GateALU, LD.REG,

DRMUX = DR (IR[11:9])001011 (11) COND0000 101000 ALUK = PASSA, GateALU, LD.TEMP1,

SR1MUX = SR1 (IR[8:6])101000 (40) COND0000 100101 ALUK = PASS16, GateALU, LD.TEMP2100101 (37) COND1000 101101 ALUK = LSHF1, GateALU, LD.REG,

SR1MUX = DR, DRMUX = DR (IR[11:9])111101 (61) COND0000 101101 ALUK = OR1, GateALU, LD.REG,

SR1MUX = DR, DRMUX = DR (IR[11:9])101101 (45) COND0000 111111 GateTEMP1, LD.TEMP1111111 (63) COND0100 100101 GateTEMP2, LD.TEMP2110101 (53) COND0000 010010 GateALU, ALUK = XOR,

SR1MUX = DR (IR[11:9])LD.REG, DRMUX = DR (IR[11:9]), Reg IN MUX = EQ0

Describe what this instruction does.

Determines if the 16-bit value stored in SR1 is a Palindrome itself.

Code:

State 10: DR ← 0State 11: TEMP1 ← value(SR1)State 40: TEMP2 ← 16State 37: DR = DR << 1

if (TEMP1[0] == 0)goto State 45

elsegoto State 61

State 61: DR = DR | 0x1State 45: TEMP1 = TEMP1 >> 1State 63: DEC TEMP2

if (TEMP2 == 0)goto State 53

elsegoto State 37

State 53: DR = DR ˆ SR1

20/26

Initials:

MEMORY

OUTPUTINPUT

KBDR

ADDR. CTL.

LOGIC

MDR

INMUX

MAR LD.MAR

LD.MDR

DATA.SIZE

MAR[0]

MAR[0]

DATA.SIZE

R

2

KBSR

MEM.EN

R.W

MIO.EN

GatePCGateMARMUX

16 16 16 16

16 16

16

LD.CC

SR2MUX

SEXT

SEXT[8:0]

[10:0]

SEXT

SEXT[5:0]

16

+2

PCLD.PC

16

+

16

16

16

16 16

ZEXT & LSHF1

[7:0] 2

ADDR2MUX

LSHF1

[4:0]

GateALU

16

SHF

GateSHF

6IR[5:0]

16

1616

16

16

16LOGIC

GateMDR

N Z P

SR2 OUT

SR1 OUT

REG

FILE

MARMUX

16 16

3

0

R

3

3

ALUALUK

2 AB

ADDR1MUX

PCMUX2

SR1

DR

SR2

LD.REG

IRLD.IR

CONTROL

DDR

DSR

MIO.EN

LOGIC

LOGIC

SIZEDATA.

WE1 WE0

[0]

WE

LOGIC

== 0 ?

MysterySIGNAL 1

== 0 ?

16

TEMP216LD.TEMP2

DECBY 1

GateTEMP2

TEMP116

LD.TEMP1

RSHFBY 1

GateTEMP1

[0]

MYSTERY SIGNAL 2

REG_INMUX

21/26

Initials:

22/26

Initials: C.4. THE CONTROL STRUCTURE 11DRIR[11:9]111DRMUX(a) SR1SR1MUXIR[11:9]IR[8:6] (b)Logic BENPZNIR[11:9] (c)Figure C.6: Additional logic required to provide control signalsLC-3b to operate correctly with a memory that takes multiple clock cycles to read orstore a value.Suppose it takes memory five cycles to read a value. That is, once MAR containsthe address to be read and the microinstruction asserts READ, it will take five cyclesbefore the contents of the specified location in memory are available to be loaded intoMDR. (Note that the microinstruction asserts READ by means of three control signals:MIO.EN/YES, R.W/RD, and DATA.SIZE/WORD; see Figure C.3.)Recall our discussion in Section C.2 of the function of state 33, which accessesan instruction from memory during the fetch phase of each instruction cycle. For theLC-3b to operate correctly, state 33 must execute five times before moving on to state35. That is, until MDR contains valid data from the memory location specified by thecontents of MAR, we want state 33 to continue to re-execute. After five clock cycles,th h l t d th “ d ” lti i lid d t i MDR th23/26

Initials:

C.2. THE STATE MACHINE 5R PC<! BaseR To 1812 To 18To 18RRTo 18To 18 To 18 MDR<! SR[7:0]MDR <! MIR <! MDRRDR<! SR1+OP2*set CCDR<! SR1&OP2*set CC [BEN]PC<! MDR 3215 0 01To 18To 18 To 18R R [IR[15:12]]2830R7<! PCMDR<! M[MAR]set CC BEN<! IR[11] & N + IR[10] & Z + IR[9] & P9DR<! SR1 XOR OP2* 4 22To 111011JSR JMP BR1010 To 10 2120 0 1LDBMAR<! B+off6set CCTo 18 MAR<! B+off6DR<! MDRset CCTo 18MDR<! M[MAR]2527 3762 STW STBLEASHFTRAPXORANDADDRTITo 8set CC set CCDR<! PC+LSHF(off9, 1)14 LDWMAR<! B+LSHF(off6,1) MAR<! B+LSHF(off6,1)PC<! PC+LSHF(off9,1)3335DR<! SHF(SR,A,D,amt4)NOTESB+off6 : Base + SEXT[offset6] RMDR<! M[MAR[15:1]’0]DR<! SEXT[BYTE.DATA]R 2931 18, 19 MDR<! SRTo 18R RM[MAR]<! MDR 1623 R R17To 19 24M[MAR]<! MDR**MAR<! LSHF(ZEXT[IR[7:0]],1)15To 18PC+off9 : PC + SEXT[offset9] MAR <! PCPC <! PC + 2*OP2 may be SR2 or SEXT[imm5]** [15:8] or [7:0] depending onMAR[0] [IR[11]]PC<! BaseRPC<! PC+LSHF(off11,1)R7<! PC R7<! PC13 Figure C.2: A state machine for the LC-3b24/26

Initials:

Scratchpad

25/26

Initials:

Scratchpad

26/26

cmu 18-447 introduction to computer architecture, spring...

Documents