

CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104

School of Computing, National University of Singapore

CS1104-P2-6 Processor: Datapath and Control 2

PII Lecture 6: Processor: Datapath and Control

• Datapath: Single-bus Organization; Multiple-bus Organization

• MIPS: Multicycle Datapath and Control: Stages of Instructions; Datapath Walkthroughs

• Processor and Logic Design

Reading: Chapter 9 of the textbook, which is Chapter 7 in “Computer Organization” by Hamacher, Vranesic and Zaky.

Optional reading: Chapter 5 in “Computer Organization & Design” by Patterson and Hennessy.


Datapath

Recap: Organisation

[Figure: organisation of a computer, with blocks labelled Processor (Control, Datapath), Memory, Devices (Input, Output), Cache, Registers and Bus.]


Fundamental Concepts

Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making).

Datapath: portion of the processor which contains hardware necessary to perform all operations required by the computer (the brawn).

Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain).

Fundamental Concepts (2)

Instruction execution cycle: fetch, decode, execute.

• Fetch: fetch next instruction (using PC) from memory into IR.

• Decode: decode the instruction.

• Execute: execute instruction.

[Figure: instruction execution cycle: Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction.]

Fundamental Concepts (3)

Fetch: Fetch next instruction into IR (Instruction Register).

• Assume each word is 4 bytes and each instruction is stored in a word, and that the memory is byte addressable.

• PC (Program Counter) contains address of next instruction.

IR ← [[PC]]

PC ← [PC] + 4
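The fetch step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fetch`, the sample memory contents and the big-endian byte order are not from the slides; only the 4-byte word size, byte addressing and the two register transfers are.

```python
WORD_BYTES = 4  # each instruction occupies one 4-byte word

def fetch(memory: bytes, pc: int) -> tuple[int, int]:
    """Return (IR, new PC), i.e. IR <- [[PC]] and PC <- [PC] + 4."""
    # Byte order is an assumption made for this sketch.
    ir = int.from_bytes(memory[pc:pc + WORD_BYTES], "big")
    return ir, pc + WORD_BYTES

# Two made-up instruction words in a byte-addressable memory.
memory = bytes.fromhex("0123456789abcdef")
ir, pc = fetch(memory, 0)
```

Because memory is byte addressable, consecutive instructions sit at addresses 0, 4, 8, and so on, which is why the PC advances by 4 rather than by 1.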

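Single-bus Organization

[Figure: single-bus organization of the datapath. PC, MAR, MDR, Y, Z, TEMP, IR and the general-purpose registers R0 to R(n-1) connect to the internal processor bus. MAR drives the address lines and MDR connects to the data lines of the memory bus. A MUX (Select) feeds either Y or the constant 4 to ALU input A, ALU input B comes from the bus, and the ALU result goes to Z. The instruction decoder and control logic take IR as input and generate the control signals, including the ALU control lines (Add, Sub, XOR, ...) and Carry-in.]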

Instruction Execution

An instruction can be executed by performing one or more of the following operations in some specified sequence:

• Transfer a word of data from one register to another or to the ALU (Arithmetic Logic Unit).

• Perform an arithmetic or a logic operation and store the result in a register.

• Fetch the contents of a given memory location and load them into a register.

• Store a word of data from a register into a given memory location.

Register Transfer

Register to register transfer: for each register Ri, there are two control signals:

• Riin: used to load the data on the bus into the register.

• Riout: used to place the register’s contents on the bus.

Example: To transfer contents of R1 to R4:

• Set R1out to 1. This places contents of R1 on the bus.

• Set R4in to 1. This loads data from the processor bus into R4.
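The R1 to R4 transfer can be modelled as one bus step in Python. This is a rough sketch, not part of the lecture: the helper `bus_step` and the dictionary of registers are hypothetical names chosen for illustration.

```python
def bus_step(regs: dict, out: str, ins: list) -> None:
    """One clock step on the single bus.

    `out` is the register whose 'out' signal is 1 (it drives the bus);
    `ins` lists the registers whose 'in' signal is 1 (they load from the bus).
    """
    bus = regs[out]
    for name in ins:
        regs[name] = bus

regs = {"R1": 7, "R4": 0}
bus_step(regs, out="R1", ins=["R4"])  # R1out = 1, R4in = 1
```

Only one register may drive the bus in a step, which is why the helper takes a single `out` name but a list of `ins`.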

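Register Transfer (2)

[Figure: input and output gating for the registers. Register Ri connects to the internal processor bus through switches controlled by Riin and Riout. Y is loaded through Yin; a MUX (Select) passes either Y or the constant 4 to ALU input A; ALU input B comes from the bus; Z is gated by Zin and Zout.]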

Arithmetic/Logic Operation

ALU: Performs arithmetic and logic operations on its A and B inputs.

To perform R3 ← [R1] + [R2]:

1. R1out, Yin

2. R2out, SelectY, Add, Zin

3. Zout, R3in

[Figure: register gating around the ALU on the internal processor bus: Ri with Riin and Riout, Y with Yin, the MUX (Select) choosing Y or the constant 4 for input A, and Z with Zin and Zout.]
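The three control steps for R3 ← [R1] + [R2] can be traced with a small Python sketch. The register values and the 32-bit word width are assumptions made for illustration; the order of the steps follows the slide.

```python
MASK = 0xFFFFFFFF  # 32-bit registers are an assumption of this sketch

regs = {"R1": 5, "R2": 9, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin. R1 drives the bus and Y latches it.
bus = regs["R1"]
regs["Y"] = bus

# Step 2: R2out, SelectY, Add, Zin. The MUX passes Y to input A,
# the bus (R2) is input B, and the sum is latched in Z.
bus = regs["R2"]
regs["Z"] = (regs["Y"] + bus) & MASK

# Step 3: Zout, R3in. Z drives the bus and R3 latches it.
bus = regs["Z"]
regs["R3"] = bus
```

Three steps are needed because the single bus can carry only one operand at a time, so one operand must be parked in Y before the ALU can see both.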

Arithmetic/Logic Operation (2)

If there are n operations, do we need n ALU control lines?

We could use encoding, which requires only ⌈log2 n⌉ control lines for n operations. However, this increases complexity and hardware, since an additional decoder is needed to recover the individual operation signals.

[Figure: ALU with inputs A and B, Carry-in, and the ALU control lines (Add, Sub, XOR, ...).]
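The trade-off between one line per operation and encoded lines can be checked numerically. The following Python sketch is illustrative only: the function names and the five-entry operation list are hypothetical, and the decoder is modelled as a simple one-hot expansion.

```python
def encoded_control_lines(n_ops: int) -> int:
    """Lines needed to binary-encode n_ops operations: ceil(log2(n_ops))."""
    return (n_ops - 1).bit_length()

def decode(code: int, n_ops: int) -> list[int]:
    """The extra decoder: expand an operation number into one-hot lines."""
    return [1 if i == code else 0 for i in range(n_ops)]

OPS = ["Add", "Sub", "XOR", "AND", "OR"]       # hypothetical operation list
lines = encoded_control_lines(len(OPS))         # encoded lines instead of 5
one_hot = decode(OPS.index("XOR"), len(OPS))    # what the ALU finally sees
```

With five operations the encoded form needs three lines instead of five, at the cost of the decoder that rebuilds the individual signals.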

Reading a Word from Memory

Move (R1), R2 /* R2 ← [[R1]] */

1. MAR ← [R1]

2. Start a Read operation on the memory bus

3. Wait for the MFC response from the memory

4. Load MDR from the memory bus

5. R2 ← [MDR]

MDR has four control signals: MDRin, MDRout, MDRinE and MDRoutE.

[Figure: MDR connects to the internal processor bus through MDRin and MDRout, and to the memory-bus data lines through MDRinE and MDRoutE.]

Reading a Word from Memory (2)

Move (R1), R2 /* R2 ← [[R1]] */

Sequence of control steps:

1. R1out, MARin, Read

2. MDRinE, WMFC

3. MDRout, R2in

WMFC: Wait for arrival of MFC (Memory-Function-Completed) signal.

MFC: To accommodate variability in response time, the processor waits until it receives an indication that the Read/Write operation has been completed. The addressed device sets MFC to 1 to indicate this.

Storing a Word in Memory

Move R2, (R1) /* [R1] ← [R2] */

Sequence of control steps:

1. R1out, MARin

2. R2out, MDRin, Write

3. MDRoutE, WMFC

Executing a Complete Instruction

Add (R3), R1 /* R1 ← [R1] + [[R3]] */

Adds the contents of the memory location pointed to by R3 to register R1.

Sequence of control steps:

1. PCout, MARin, Read, Select4, Add, Zin

2. Zout, PCin, Yin, WMFC

3. MDRout, IRin

4. R3out, MARin, Read

5. R1out, Yin, WMFC

6. MDRout, SelectY, Add, Zin

7. Zout, R1in, End

Steps 1 to 3: Instruction fetch
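The seven control steps can be replayed by a small interpreter. This Python sketch makes several simplifying assumptions that are not in the slides: memory is a word-addressed dictionary, the memory read is modelled as completing at WMFC, and the initial register and memory values are made up.

```python
STEPS = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    {"Zout", "PCin", "Yin", "WMFC"},
    {"MDRout", "IRin"},
    {"R3out", "MARin", "Read"},
    {"R1out", "Yin", "WMFC"},
    {"MDRout", "SelectY", "Add", "Zin"},
    {"Zout", "R1in", "End"},
]

def run(state: dict, memory: dict) -> dict:
    for signals in STEPS:
        # At most one register may drive the bus in a step.
        drivers = [s[:-3] for s in signals if s.endswith("out")]
        assert len(drivers) <= 1
        bus = state[drivers[0]] if drivers else 0
        alu = None
        if "Add" in signals:
            a = 4 if "Select4" in signals else state["Y"]  # MUX on input A
            alu = a + bus                                  # input B is the bus
        for s in signals:
            if s.endswith("in"):
                reg = s[:-2]
                state[reg] = alu if reg == "Z" else bus    # Z latches the ALU
        if "WMFC" in signals:
            # Assumption: the pending Read completes here and fills MDR.
            state["MDR"] = memory[state["MAR"]]
    return state

memory = {100: 0xABCD, 200: 30}   # made-up instruction word and operand
state = {"PC": 100, "R1": 12, "R3": 200, "MAR": 0, "MDR": 0,
         "IR": 0, "Y": 0, "Z": 0}
run(state, memory)
```

Note how step 1 overlaps the PC increment with the start of the instruction read: the ALU computes [PC] + 4 while the memory is still busy.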

Multiple-Bus Organization

Single-bus structure: Control sequences are long as only one data item can be transferred over the bus in a clock cycle.

Figure on next slide shows a three-bus structure.

• All registers are combined into a single block called the register file, with three ports: 2 outputs allowing 2 registers to be accessed simultaneously and have their contents put on buses A and B, and 1 input allowing data on bus C to be loaded into a third register.

• Buses A and B are used to transfer source operands to the A and B inputs of the ALU, and the result is transferred to the destination over bus C.

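Multiple-Bus Organization (2)

[Figure: three-bus organization of the datapath. The register file, PC (with an Incrementer), IR, MDR and MAR are connected to Bus A, Bus B and Bus C. A MUX selects between the constant 4 and Bus A for ALU input A, Bus B feeds ALU input B, and the ALU output R drives Bus C. MAR drives the address lines, MDR connects to the memory bus data lines, and IR feeds the instruction decoder.]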

Multiple-Bus Organization (3)

For the ALU, R=A (or R=B) means that its A (or B) input is passed unmodified to bus C.

Add R4, R5, R6 /* R6 ← [R4] + [R5] */

Adds the contents of R4 and R5 and places the result in R6.

Sequence of control steps:

1. PCout, R=B, MARin, Read, IncPC

2. WMFC

3. MDRoutB, R=B, IRin

4. R4outA, R5outB, SelectA, Add, R6in, End

Control

Hardwired control or microprogrammed control.

Hardwired control:

[Figure: hardwired control unit. A control step counter driven by the clock (CLK), together with IR, external inputs and condition codes, feeds a decoder/encoder block that generates the control signals.]

Control (2)

Microprogrammed control: Control signals generated by a program.

• Control word (CW) is a microinstruction that contains individual bits that represent the various control signals.

• Vertical organization: highly encoded schemes that use compact codes to specify only a small number of control functions in each microinstruction.

• Horizontal organization: minimally encoded scheme in which many resources can be controlled with a single microinstruction.

• Popular in Complex Instruction Set Computer (CISC) architectures because complex instruction sets require complex controllers that can more easily be implemented as microprograms.

Control (3)

Example of a horizontal organization scheme:

1. PCout, MARin, Read, Select4, Add, Zin

2. Zout, PCin, Yin, WMFC

3. MDRout, IRin

4. R3out, MARin, Read

5. R1out, Yin, WMFC

6. MDRout, SelectY, Add, Zin

7. Zout, R1in, End

Micro-instruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                  0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                  1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                  0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                  0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                  0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                  0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                  0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1

(The figure marks further signal columns with "...".)

Select=0: SelectY

Select=1: Select4
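The control words of the horizontal scheme can be generated directly from the seven control steps. This Python sketch is an illustration, not the lecture's method: the column order in `COLUMNS` is an assumption chosen for this sketch, and `control_word` is a hypothetical helper. It applies the Select encoding given above (Select4 sets the Select bit to 1, SelectY leaves it at 0).

```python
# Assumed column order: one bit per control signal.
COLUMNS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

STEPS = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]

def control_word(signals: list) -> str:
    """Encode one control step as a string of bits, one bit per column."""
    # Select4 sets the Select bit; SelectY is the bit left at 0.
    active = {"Select" if s == "Select4" else s
              for s in signals if s != "SelectY"}
    return "".join("1" if c in active else "0" for c in COLUMNS)

words = [control_word(step) for step in STEPS]
```

Each word is wide (one bit per signal) but needs no decoding, which is the defining property of the horizontal organization; a vertical scheme would pack the same information into fewer, encoded bits.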


MIPS: Multicycle Datapath and Control

Adapted from D. Patterson’s CS61C

http://www.cs.berkeley.edu/~pattrsn/61CF00

Copyright 2000 UCB

Stages of a Datapath

Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient.

Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath.

• Smaller stages are easier to design.

• Easy to optimize (change) one stage without touching the others.

Stages of a Datapath (2)

There is a wide variety of MIPS instructions: so what general steps do they have in common?

Stages:

1. Instruction Fetch

2. Instruction Decode

3. ALU

4. Memory Access

5. Register Write

Stages of a Datapath (3)

Stage 1: Instruction Fetch.

• No matter what the instruction is, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy).

• Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction; byte addressing so + 4).

Stages of a Datapath (4)

Stage 2: Instruction Decode

• Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data).

• First, read the opcode to determine instruction type and field lengths.

• Second, read in data from all necessary registers. For add, read two registers. For addi, read one register. For jal, no read necessary.

Stages of a Datapath (5)

Stage 3: ALU (Arithmetic-Logic Unit)

• The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt).

• What about loads and stores? lw $t0, 40($t1)

• The address we are accessing in memory = the value in $t1 plus the value 40. We do this addition at this stage.
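The address calculation for lw $t0, 40($t1) can be written out as a short Python sketch. The helper names and the sample register value are made up for illustration; the sign extension reflects the fact that the MIPS load/store offset is a 16-bit signed immediate.

```python
def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit field as a signed offset."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(base: int, imm16: int) -> int:
    """Stage 3 for lw/sw: base register value plus the signed offset."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

t1 = 0x1000                        # made-up contents of $t1
addr = effective_address(t1, 40)   # address used by lw $t0, 40($t1)
```

The same adder that serves arithmetic instructions is reused here, which is why loads and stores still pass through the ALU stage.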

Stages of a Datapath (6)

Stage 4: Memory Access

• Only the load and store instructions do anything during this stage; all other instructions remain idle.

• Since loads and stores have this unique step, we need this extra stage to account for them.

• As a result of the cache system, this stage is expected to be just as fast (on average) as the others.

Stages of a Datapath (7)

Stage 5: Register Write

• Most instructions write the result of some computation into a register. Examples: arithmetic, logical, shifts, loads, slt.

• What about stores, branches, jumps? They do not write anything into a register at the end. These remain idle during this fifth stage.

Datapath: Generic Steps

[Figure: generic datapath. PC → instruction memory (with a +4 adder updating PC) → registers (rs, rt, rd fields, plus imm) → ALU → data memory. The five stages are labelled 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write.]

Datapath Walkthroughs: add

add $r3,$r1,$r2 # r3 = r1+r2

• Stage 1: Fetch this instruction, increment PC.

• Stage 2: Decode to find that it is an add instruction, then read registers $r1 and $r2.

• Stage 3: Add the two values retrieved in stage 2.

• Stage 4: Idle (nothing to write to memory).

• Stage 5: Write result of stage 3 into register $r3.

Datapath Walkthroughs: add (2)

[Figure: generic datapath annotated for add r3, r1, r2. The register file reads registers 1 and 2 (reg[1], reg[2]), the ALU computes reg[1]+reg[2], and the result is written to register 3.]

Datapath Walkthroughs: slti

slti $r3,$r1,17

• Stage 1: Fetch this instruction, increment PC.

• Stage 2: Decode to find it is an slti, then read register $r1.

• Stage 3: Compare value retrieved in stage 2 with the integer 17.

• Stage 4: Go idle.

• Stage 5: Write the result of stage 3 in register $r3.

Datapath Walkthroughs: slti (2)

[Figure: generic datapath annotated for slti r3, r1, 17. The register file reads register 1 (reg[1]), the immediate 17 goes to the ALU, the ALU computes reg[1]-17, and the result is written to register 3.]
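What stage 3 produces for slti can be shown in a couple of lines of Python. This is a sketch under the assumption of unbounded Python integers (so overflow in the subtraction is ignored); the function name is hypothetical.

```python
def slti(rs_value: int, imm: int) -> int:
    """Set-on-less-than-immediate: 1 if rs < imm, else 0.

    The comparison is modelled as a subtraction followed by a sign check.
    """
    difference = rs_value - imm
    return 1 if difference < 0 else 0

r3 = slti(5, 17)   # value written to $r3 when $r1 holds 5
```

The value written back in stage 5 is therefore always 0 or 1, not the difference itself.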

Datapath Walkthroughs: sw

sw $r3, 20($r1)

• Stage 1: Fetch this instruction, increment PC.

• Stage 2: Decode to find it is an sw, then read registers $r1 and $r3.

• Stage 3: Add 20 to value in register $r1 (retrieved in stage 2).

• Stage 4: Write value in register $r3 (retrieved in stage 2) into memory address computed in stage 3.

• Stage 5: Go idle (nothing to write into a register).

Datapath Walkthroughs: sw (2)

[Figure: generic datapath annotated for sw r3, 20(r1). The register file reads registers 1 and 3 (reg[1], reg[3]), the ALU computes reg[1]+20 from reg[1] and the immediate 20, and the data memory performs MEM[r1+20] <- r3.]

Why Five Stages?

Could we have a different number of stages? Yes, and other architectures do.

So why does MIPS have five stages, if instructions tend to go idle for at least one stage? There is one instruction that uses all five stages: the load.

Datapath Walkthroughs: lw

lw $r3, 40($r1)

• Stage 1: Fetch this instruction, increment PC.

• Stage 2: Decode to find it is a lw, then read register $r1.

• Stage 3: Add 40 to value in register $r1 (retrieved in stage 2).

• Stage 4: Read value from memory address computed in stage 3.

• Stage 5: Write value found in stage 4 into register $r3.

Datapath Walkthroughs: lw (2)

[Figure: generic datapath annotated for lw r3, 40(r1). The register file reads register 1 (reg[1]), the ALU computes reg[1]+40 from reg[1] and the immediate 40, the data memory is read, and r3 <- MEM[r1+40] is written to register 3.]
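The four walkthroughs (add, slti, sw, lw) can be replayed with a toy five-stage interpreter. This Python sketch is illustrative only: instructions are given as already-decoded tuples rather than 32-bit words, and the register values, memory contents and program order are all made up.

```python
def run_instruction(program, pc, regs, mem):
    """Push one instruction through the five stages; return (new PC, idle stages)."""
    idle = []
    # Stage 1: fetch the instruction and increment the PC.
    op, x, y, z = program[pc]
    pc += 4
    # Stage 2: decode and read the source register(s).
    if op == "add":                 # (op, rd, rs, rt)
        a, b = regs[y], regs[z]
    elif op == "slti":              # (op, rt, rs, imm)
        a, b = regs[y], z
    else:                           # lw/sw: (op, rt, imm, rs)
        a, b = regs[z], y
    # Stage 3: ALU (compare for slti, add otherwise).
    alu = int(a < b) if op == "slti" else a + b
    # Stage 4: memory access (only lw and sw do anything here).
    result = alu
    if op == "lw":
        result = mem[alu]
    elif op == "sw":
        mem[alu] = regs[x]
    else:
        idle.append(4)
    # Stage 5: register write (sw has nothing to write).
    if op == "sw":
        idle.append(5)
    else:
        regs[x] = result
    return pc, idle

program = {0: ("add", 3, 1, 2), 4: ("sw", 3, 20, 1),
           8: ("slti", 3, 1, 17), 12: ("lw", 3, 40, 1)}
regs = [0] * 32
regs[1], regs[2] = 100, 5
mem = {140: 77}
pc, log = 0, []
while pc in program:
    op = program[pc][0]
    pc, idle = run_instruction(program, pc, regs, mem)
    log.append((op, idle, regs[3]))
```

In the resulting log, lw is the only instruction with an empty list of idle stages, which matches the observation that the load is the one instruction using all five.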

What Hardware Is Needed?

PC: a register which keeps track of address of the next instruction.

General Purpose Registers

• Used in stages 2 (read) and 5 (write).

• We are currently working with 32 of these.

Memory

• Used in stages 1 (fetch) and 4 (R/W).

• Cache system makes these two stages as fast as the others, on average.

Datapath: Summary

Construct datapath based on register transfers required to perform instructions.

Control part causes the right transfers to happen.

[Figure: the generic datapath (PC, instruction memory, +4, registers with rs/rt/rd fields, imm, ALU, data memory) with a Controller that takes the opcode and funct fields as inputs.]

Where is Logic Design Used?

• Combinational circuits: for the ALU and other parts of the datapath.

• Sequential circuits: different control signals are needed in different clock cycles and for different instructions, for the ALU, registers and other parts of the datapath.

[Figure: ALU with its ALU Control block.]

Where is Logic Design Used? (2)

High-level view of finite state machine control.

• Sequential logic design can be used to assert the correct control signals at the correct times.

[Figure: finite state machine. Start → Instruction fetch/decode and register fetch → one of: Memory access instructions, R-type instructions, Branch instruction, Jump instruction.]
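The high-level finite state machine can be mimicked with a transition function. This Python sketch is a simplification: the state names and the opcode-to-class table are hypothetical, and each instruction class is collapsed into a single state, whereas a real multicycle controller would use several states per class.

```python
# Hypothetical mapping from opcode to the instruction class in the diagram.
INSTRUCTION_CLASS = {
    "lw": "memory_access", "sw": "memory_access",
    "add": "r_type", "sub": "r_type",
    "beq": "branch",
    "j": "jump",
}

def next_state(state: str, opcode: str) -> str:
    """Sequential logic: the next state depends on the current state and the opcode."""
    if state == "fetch_decode":
        return INSTRUCTION_CLASS[opcode]
    return "fetch_decode"   # every class returns to fetch the next instruction

def trace(opcode: str) -> list[str]:
    """States visited while executing one instruction."""
    states = ["fetch_decode"]
    states.append(next_state(states[-1], opcode))
    states.append(next_state(states[-1], opcode))
    return states

lw_states = trace("lw")
```

The current state must be stored between clock cycles, which is why this part of the controller needs sequential rather than purely combinational logic.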

Summary

Datapath is the hardware that performs operations necessary to execute programs.

Control instructs datapath on what to do next.

Datapath needs:

• access to storage (general purpose registers and memory)

• computational ability (ALU)

• helper hardware (local registers and PC)

Summary (2)

Five stages of datapath (executing an instruction):

1: Instruction Fetch (Increment PC)

2: Instruction Decode (Read Registers)

3: ALU (Computation)

4: Memory Access

5: Write to Registers

• ALL instructions must go through ALL five stages.

• Datapath designed in hardware.


End of file