computer organization chapter 4

Computer Organization Chapter 4 Prof. Qi Tian Fall 2013 1


Page 1: Computer Organization  Chapter 4

1

Computer Organization Chapter 4

Prof. Qi Tian, Fall 2013

Page 2: Computer Organization  Chapter 4

2

Topics

• Dec. 6 (Friday)
  – Final Exam Review
  – Record Check

• Dec. 4 (Wednesday)
  – 5-variable Karnaugh Map
  – Quiz 5

• Dec. 2 (Monday)
  – 3- and 4-variable Karnaugh Maps
  – Reminder:
    • Assignment 6 is due (extended) on Wednesday, Dec. 4.
    • Last quiz on Wednesday, Dec. 4.
    • Final exam review on Friday.
    • Course evaluation on ASAP by Dec. 2.

Page 3: Computer Organization  Chapter 4

3

Topics

• Nov. 27 (Wednesday)
  – Minimum sum-of-product solution
  – 2-variable Karnaugh map

• Nov. 25 (Monday)
  – Truth Table
  – Minterm and Maxterm

• Nov. 22 (Friday)
  – Practice Problems 4.2
  – Digital Logic Design
    • Function Complete

Page 4: Computer Organization  Chapter 4

4

Topics

• Nov. 20 (Wednesday)
  – Midterm Exam Two
  – Practice Problems 4.1

• Nov. 18 (Monday)
  – Guest Lecture by Prof. Dakai Zhu

• Nov. 13 (Wednesday)
  – Y86 Instruction Set
  – Slides 1-13

Page 5: Computer Organization  Chapter 4

5

Section 4.1 The Y86 Instruction Set Architecture

• We will look at an assembly language instruction set, Y86.
  – Simpler than IA32 but similar to it.

• Compared to IA32, Y86 has fewer data types, instructions, and addressing modes.

• Y86 is inspired by the IA32 instruction set, which is colloquially referred to as “x86”.

• Goal: understand how Y86 is encoded, and how you would build hardware to implement it.

Page 6: Computer Organization  Chapter 4

6

Section 4.1.1 Programmer-Visible State

[Figure: Y86 programmer-visible state]
RF: Program registers: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp
CC: Condition codes: ZF, SF, OF
PC
Stat: Program status
DMEM: Memory

The Y86 has
• 8 32-bit registers with the same names as the IA32 32-bit registers
• 3 condition codes: ZF, SF, OF (no carry flag; integers are interpreted as signed)
• A program counter (PC)
• A program status byte: AOK, HLT, ADR, INS
• Memory: up to 4 GB to hold program and data

The Y86 does not have
• A carry flag
• Floating point registers

Page 7: Computer Organization  Chapter 4

7

Section 4.1.1 Programmer-Visible State


• Register %esp is used as the stack pointer by the push, pop, call and return instructions.
• Other registers do not have fixed meanings or values.
• Single-bit condition codes: ZF, SF, OF, storing information about the effect of the most recent arithmetic or logical instruction.
• The program counter (PC) holds the address of the instruction currently being executed.
• Memory is conceptually a large array of bytes, holding both program and data.
• Status code: Stat, indicating the overall state of program execution. It will indicate either normal operation, or that some sort of exception has occurred.

Page 8: Computer Organization  Chapter 4

8

Section 4.1.2 Y86 instruction

Y86 instruction set
• Instruction encodings range between 1 and 6 bytes.
• An instruction consists of
  – a 1-byte instruction specifier
  – possibly a 1-byte register specifier
  – possibly a 4-byte constant word

• Field fn specifies a particular integer operation (OP1), data movement condition (cmovXX), or branch condition (jXX).

• All numeric values are shown in hexadecimal.

Page 9: Computer Organization  Chapter 4

9

Section 4.1.3 Instruction Encoding

• rA or rB represent one of the registers, encoded as follows:

Number  Register Name
0       %eax
1       %ecx
2       %edx
3       %ebx
4       %esp
5       %ebp
6       %esi
7       %edi
F       No register

• Different opcodes for 4 types of moves:
  o (rr) register to register
  o (ir) immediate to register
  o (rm) register to memory
  o (mr) memory to register

Page 10: Computer Organization  Chapter 4

10

Section 4.1.3 Instruction Encoding

• The only memory addressing mode is base register + displacement.
  – No second register and no scaling factor.

• Memory operations always move 4 bytes (no byte or 2-byte word memory operations).

• The source or destination of a memory move must be a register.

• The operations supported (OP1) are:

fn  operation
0   addl
1   subl
2   andl
3   xorl

• Only 32-bit operations; there is no or and no not.
• These take only registers as operands.

Page 11: Computer Organization  Chapter 4

11

Section 4.1.3 Instruction Encoding

• 7 jump instructions:

fn  jump
0   jmp
1   jle
2   jl
3   je
4   jne
5   jge
6   jg

• 6 conditional move instructions with encodings similar to the conditional jump instructions.
  – Similar to IA32.
  – Note that rrmovl is a special case: it is the unconditional move, with fn = 0.

• You can tell the type of instruction and how many bytes it has by looking at the first byte of the instruction.
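The first-byte rule can be sketched as a lookup table. This is a minimal Python sketch, not part of the course materials: the names `Y86_LENGTH` and `instr_length` are hypothetical, and the table assumes the standard 32-bit Y86 instruction codes, including nop (1) and mrmovl (5), which do not appear in the encodings shown on these slides.

```python
# Instruction length in bytes, keyed by icode (the high 4 bits of the first byte).
# Assumption: standard 32-bit Y86 instruction codes; names are illustrative only.
Y86_LENGTH = {
    0x0: 1,  # halt
    0x1: 1,  # nop
    0x2: 2,  # rrmovl / cmovXX: specifier + register byte
    0x3: 6,  # irmovl: specifier + register byte + 4-byte constant
    0x4: 6,  # rmmovl
    0x5: 6,  # mrmovl
    0x6: 2,  # OP1 (addl, subl, andl, xorl)
    0x7: 5,  # jXX: specifier + 4-byte destination
    0x8: 5,  # call
    0x9: 1,  # ret
    0xA: 2,  # pushl
    0xB: 2,  # popl
}


def instr_length(first_byte):
    """Return the instruction length implied by its first byte."""
    icode = first_byte >> 4
    if icode not in Y86_LENGTH:
        raise ValueError(f"invalid instruction byte 0x{first_byte:02x}")
    return Y86_LENGTH[icode]


print(instr_length(0x30))  # irmovl -> 6
print(instr_length(0x73))  # je     -> 5
```

Only the high nibble decides the length; the low nibble (fn) selects the operation or condition within that instruction type.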

Page 12: Computer Organization  Chapter 4

12

Figure 4.3. Function codes for Y86 instruction set

Operations      Branches                  Moves
addl  6 0       jmp  7 0    jne  7 4      rrmovl  2 0    cmovne  2 4
subl  6 1       jle  7 1    jge  7 5      cmovle  2 1    cmovge  2 5
andl  6 2       jl   7 2    jg   7 6      cmovl   2 2    cmovg   2 6
xorl  6 3       je   7 3                  cmove   2 3

• The code specifies a particular integer operation, branch condition, or data transfer condition.

• These instructions are shown as OP1, jXX, and cmovXX in Figure 4.2

Page 13: Computer Organization  Chapter 4

Summary of Section 4.1.2-4.1.3: Y86 instruction set

Program register identifiers

Number  Register Name
0       %eax
1       %ecx
2       %edx
3       %ebx
4       %esp
5       %ebp
6       %esi
7       %edi
F       No register

Operations supported

fn  operation  code
0   addl       6 0
1   subl       6 1
2   andl       6 2
3   xorl       6 3

7 jump functions (Branches)

fn  jump
0   jmp
1   jle
2   jl
3   je
4   jne
5   jge
6   jg

Moves

rrmovl  2 0
cmovle  2 1
cmovl   2 2
cmove   2 3
cmovne  2 4
cmovge  2 5
cmovg   2 6

Page 14: Computer Organization  Chapter 4

14

Section 4.1.2 Y86 instruction

• Y86 is largely a subset of the IA32 instruction set.

• It includes only 4-byte integer operations, has fewer addressing modes, and includes a smaller set of operations.

• Since we only use 4-byte data, we can refer to these as “words” without ambiguity.

Page 15: Computer Organization  Chapter 4

15

Instruction Encoding Examples

1. rrmovl %eax, %ecx
   The encoding is: 2001
   This would be stored in 2 bytes of memory, the first containing 0x20 and the second containing 0x01.

2. rmmovl %ecx, 0x24(%ebp)
   The encoding is: 401524000000
   The first two bytes are 0x40 and 0x15, and the displacement is 0x24. Because the displacement is stored in little-endian order, the next byte is 0x24, followed by 3 bytes of 0.
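The two encodings can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `encode_rrmovl` and `encode_rmmovl` are hypothetical, and only the register numbers and opcode bytes (0x20, 0x40) come from the slides.

```python
import struct

# Register numbers from the Y86 register table.
REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}


def encode_rrmovl(ra, rb):
    """rrmovl rA, rB: byte 0x20, then rA and rB packed into one byte."""
    return bytes([0x20, (REG[ra] << 4) | REG[rb]])


def encode_rmmovl(ra, disp, rb):
    """rmmovl rA, D(rB): byte 0x40, rA:rB, then D as 4 little-endian bytes."""
    return bytes([0x40, (REG[ra] << 4) | REG[rb]]) + struct.pack("<i", disp)


print(encode_rrmovl("%eax", "%ecx").hex())        # 2001
print(encode_rmmovl("%ecx", 0x24, "%ebp").hex())  # 401524000000
print(encode_rmmovl("%ecx", -3, "%ebx").hex())    # 4013fdffffff
```

The `"<i"` format asks `struct.pack` for a signed 32-bit little-endian word, which is why the low-order byte of the displacement comes first and why -3 becomes fd ff ff ff.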

Page 16: Computer Organization  Chapter 4

16

Practice Problem 4.1
Determine the byte encoding of the Y86 instruction sequence that follows. The line “.pos 0x100” indicates that the starting address of the object code should be 0x100.

    .pos 0x100                  # start code at address 0x100
        irmovl $15, %ebx        # load 15 into %ebx
        rrmovl %ebx, %ecx       # copy 15 to %ecx
    loop:                       # loop
        rmmovl %ecx, -3(%ebx)   # save %ecx at address 15-3=12
        addl %ebx, %ecx         # increment %ecx by 15
        jmp loop                # goto loop

Page 17: Computer Organization  Chapter 4

17

Practice Problem 4.1 - Solution
Determine the byte encoding of the Y86 instruction sequence that follows. The line “.pos 0x100” indicates that the starting address of the object code should be 0x100.

                      |   .pos 0x100              # start code at address 0x100
0x100: 30f30f000000   |   irmovl $15, %ebx        # load 15 into %ebx
0x106: 2031           |   rrmovl %ebx, %ecx       # copy 15 to %ecx
0x108:                | loop:                     # loop
0x108: 4013fdffffff   |   rmmovl %ecx, -3(%ebx)   # save %ecx at address 15-3=12
0x10e: 6031           |   addl %ebx, %ecx         # increment %ecx by 15
0x110: 7008010000     |   jmp loop                # goto loop

Page 18: Computer Organization  Chapter 4

18

Practice Problem 4.2
For each byte sequence listed, determine the Y86 instruction sequence it encodes. If there is some invalid byte in the sequence, show the instruction sequence up to that point and indicate where the invalid value occurs. For each sequence, we show the starting address, then a colon, and then the byte sequence.

A. 0x100: 30f3fcffffff40630008000000

B. 0x200: a06f80080200000030f30a00000090

Page 19: Computer Organization  Chapter 4

19

Practice Problem 4.2 - Solution
For each byte sequence listed, determine the Y86 instruction sequence it encodes. If there is some invalid byte in the sequence, show the instruction sequence up to that point and indicate where the invalid value occurs. For each sequence, we show the starting address, then a colon, and then the byte sequence.

A. 0x100: 30f3fcffffff40630008000000

   0x100: irmovl $-4, %ebx
   0x106: rmmovl %esi, 0x800(%ebx)
   0x10c: halt

   Note: -4 = 0xfffffffc, stored in little-endian order as fc ff ff ff.

B. 0x200: a06f80080200000030f30a00000090

   0x200: pushl %esi
   0x202: call proc
   0x207: halt
   0x208: proc:
   0x208: irmovl $10, %ebx
   0x20e: ret
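The decoding procedure used in this solution can be sketched as a small disassembler. This is an illustrative Python sketch, not a course tool: the function name `disassemble` is hypothetical, it handles only the instruction types that occur in Practice Problem 4.2, and it does not validate register fields or check for truncated input.

```python
import struct

REGS = ["%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"]


def disassemble(start, hex_bytes):
    """Decode a byte string into Y86 instructions (subset only)."""
    code = bytes.fromhex(hex_bytes)
    out, pc = [], 0
    while pc < len(code):
        addr = start + pc
        first = code[pc]
        if first == 0x00:                      # halt: 1 byte
            out.append(f"0x{addr:03x}: halt")
            pc += 1
        elif first == 0x90:                    # ret: 1 byte
            out.append(f"0x{addr:03x}: ret")
            pc += 1
        elif first == 0xA0:                    # pushl rA: 2 bytes
            out.append(f"0x{addr:03x}: pushl {REGS[code[pc + 1] >> 4]}")
            pc += 2
        elif first == 0x80:                    # call Dest: 5 bytes
            (dest,) = struct.unpack_from("<I", code, pc + 1)
            out.append(f"0x{addr:03x}: call 0x{dest:03x}")
            pc += 5
        elif first == 0x30:                    # irmovl V, rB: 6 bytes
            rb = code[pc + 1] & 0xF
            (val,) = struct.unpack_from("<i", code, pc + 2)
            out.append(f"0x{addr:03x}: irmovl ${val}, {REGS[rb]}")
            pc += 6
        elif first == 0x40:                    # rmmovl rA, D(rB): 6 bytes
            ra, rb = code[pc + 1] >> 4, code[pc + 1] & 0xF
            (disp,) = struct.unpack_from("<i", code, pc + 2)
            out.append(f"0x{addr:03x}: rmmovl {REGS[ra]}, {hex(disp)}({REGS[rb]})")
            pc += 6
        else:                                  # stop at the first unknown byte
            out.append(f"0x{addr:03x}: invalid byte 0x{first:02x}")
            break
    return out


for line in disassemble(0x100, "30f3fcffffff40630008000000"):
    print(line)
for line in disassemble(0x200, "a06f80080200000030f30a00000090"):
    print(line)
```

The sketch prints the call target as an address (0x208) rather than a label, since labels are not stored in the object code.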

Page 20: Computer Organization  Chapter 4

20

Y86 vs IA32
• Encodings of the Y86 are simpler than those of IA32, but not as compact.
• IA32 is sometimes labeled as CISC and is deemed to be the opposite of RISC.
• RISC and CISC
  – RISC = reduced instruction set computer
  – CISC = complex instruction set computer
  – Basic ideas of RISC
    • Small number of instructions
    • Most instructions have the same length
    • Simple addressing formats
    • Arithmetic and logical operations only work on registers
    • Memory operations only move between register and memory
    • No condition codes: test instructions store results in registers.
  – Long controversy between RISC and CISC since the 1980s (read textbook pp. 342-344)
  – Which is better? Answer: a combination
  – Which is Y86? It includes attributes of both RISC and CISC.
    • On the CISC side, it has condition codes, variable-length instructions, and stack-intensive procedure linkages.
    • On the RISC side, it uses a load-store architecture and a regular encoding.
    • It can be viewed as taking IA32 and simplifying it by applying the principles of RISC.

Page 21: Computer Organization  Chapter 4

21

Section 4.1.4 Y86 Exceptions

• What happens when an invalid assembly instruction is found?
  – This generates an exception.
  – In Y86, an exception halts the machine: it stops executing.
  – What are some possible causes of exceptions?
    • Invalid operation
    • Divide by 0
    • Sqrt of negative number
    • Memory access error (e.g., address too large)
    • Hardware error

Page 22: Computer Organization  Chapter 4

22

Section 4.1.4 Y86 Exceptions

Value  Name  Meaning
1      AOK   Normal operation
2      HLT   Halt instruction encountered
3      ADR   Invalid address encountered
4      INS   Invalid instruction encountered

Y86 status codes. In our design, the processor halts for any code other than AOK.

Page 23: Computer Organization  Chapter 4

23

Y86 Examples
• Example 1:
  – IA32: addl (%ecx), %eax
  – Y86:
    • Cannot be finished in one instruction
    • 2 instructions to implement:
        mrmovl (%ecx), %esi
        addl %esi, %eax

• Example 2:
  – IA32: addl $4, %ecx
  – Y86:
        irmovl $4, %ebx
        addl %ebx, %ecx

• Example 3:
  – IA32: addl (%ebx, %edx, 4), %eax
  – Y86: How many Y86 instructions are needed to do this?

Page 24: Computer Organization  Chapter 4

24

Section 4.2 Logic Design

• Section 4.2.1 Logic gates

– Logic gate:
  • Simplest building block, 1-2 inputs and 1 output
  • Boolean function such as AND, OR, and NOT
  • Hardware Description Language (HDL)
    – Currently, circuits are designed using an HDL.
    – Much like C code: for example, an AND gate is represented by a && b

[Figure: AND, OR, and NOT gate symbols]

Page 25: Computer Organization  Chapter 4

25

Section 4.2.2 Combinational Circuits

• Combinational Circuits
  – No memory, vs. clocked sequential circuits, which have memory
  – Building blocks: logic gates
  – Design an economic circuit
    • Algebraic methods for simplification
    • Karnaugh maps

Alternative way

Page 26: Computer Organization  Chapter 4

26

Section 4.2.2 Combinational Circuits

• Example: bit equal
  1) bool eq = (a && b) || (!a && !b)

Alternative way

Page 27: Computer Organization  Chapter 4

27

Section 4.2.2 Combinational Circuits

• A block diagram:

• We can make a multi-bit equal out of 1-bit equals

• Here is a block diagram
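Building a multi-bit equality test out of 1-bit equality circuits can be sketched in Python. This is only an illustration of the idea: the function names `bit_eq` and `word_eq` and the default 32-bit width are assumptions, and bits are modeled as the integers 0 and 1.

```python
def bit_eq(a, b):
    """1-bit equality: (a && b) || (!a && !b), with bits as 0/1."""
    return int((a and b) or (not a and not b))


def word_eq(x, y, width=32):
    """Word equality: AND together the 1-bit equalities of each bit pair."""
    result = 1
    for i in range(width):
        result &= bit_eq((x >> i) & 1, (y >> i) & 1)
    return result


# Truth table of the 1-bit circuit.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, bit_eq(a, b))

print(word_eq(0x12345678, 0x12345678))  # 1
print(word_eq(5, 4))                    # 0
```

The loop plays the role of the row of 1-bit equality blocks, and the running `&=` plays the role of the final AND gate that combines their outputs.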

Page 28: Computer Organization  Chapter 4

28

Example: 1-bit multiplexer
• It allows you to select one of two one-bit inputs (data selector) and is described by:

    bool out = (s && a) || (!s && b)

    s = 1, out = a;
    s = 0, out = b;

• Here is a block diagram

Page 29: Computer Organization  Chapter 4

29

Example: a multi-bit multiplexer
• We can make a multi-bit (word level) mux out of 1-bit muxes:

HCL description of the mux:

    int Out = [ s: A; 1: B; ];

[…] is a case expression, like a select: if s is true, the result is A. Otherwise, we check the next case. 1 is always true, so we select B.

Page 30: Computer Organization  Chapter 4

30

Example: 4-word MUX

• Here is a 4-word mux (4-way mux). HCL description:

    int Out4 = [
        !s1 && !s0 : A;
        !s1        : B;
        !s0        : C;
        1          : D;
    ];

    s1 s0  out
    0  0   A
    0  1   B
    1  0   C
    1  1   D

Question: How many control inputs would be needed for a 7-way mux?
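The first-match behavior of an HCL case expression can be imitated in Python to check the table. This is a sketch only: `hcl_case`, `mux2`, `mux4`, and `control_bits` are hypothetical helper names, and `control_bits` assumes the select lines are binary-encoded, so n inputs need the smallest k with 2^k >= n.

```python
def hcl_case(*cases):
    """Return the value of the first (selector, value) pair whose selector is true."""
    for select, value in cases:
        if select:
            return value
    raise ValueError("no case selected")


def mux2(s, a, b):
    # int Out = [ s: A; 1: B; ];
    return hcl_case((s, a), (1, b))


def mux4(s1, s0, a, b, c, d):
    # int Out4 = [ !s1 && !s0 : A; !s1 : B; !s0 : C; 1 : D; ];
    return hcl_case((not s1 and not s0, a), (not s1, b), (not s0, c), (1, d))


def control_bits(n_inputs):
    """Select lines needed for n_inputs, assuming binary-encoded selection."""
    return (n_inputs - 1).bit_length()


for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, mux4(s1, s0, "A", "B", "C", "D"))

print(control_bits(4))  # 2
print(control_bits(7))  # 3
```

Because cases are tried in order, the later selectors can be simpler than a full decode: by the time `!s1` is tested, the combination s1 = 0, s0 = 0 has already been handled.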

Page 31: Computer Organization  Chapter 4

31

Other Gates and Basic Building Blocks

• XOR gate:

Out = a ^ b = (!a && b) || (a && !b)

Page 32: Computer Organization  Chapter 4

32

Function Complete
• Function complete:
  – Any circuit that can be made from and, or, and not gates can also be made using just and and not gates, or just or and not gates.
  – Because: a || b = !(!a && !b); a && b = !(!a || !b)
  – The functionally complete sets: (and, or, not), (and, not), (or, not)
  – Can any single gate be used as a functionally complete set? Ans: Yes, the NAND gate and the NOR (|) gate.

Question: Prove that {NAND} and {NOR} are each functionally complete.

Page 33: Computer Organization  Chapter 4

33

Function Complete

• Proof of functional completeness for NAND

• Proof of functional completeness for NOR
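One standard way to argue completeness is to build NOT, AND, and OR from the single gate, since (and, or, not) is already a complete set. The Python sketch below checks such constructions exhaustively over all inputs. It is offered as an illustration, not as the proof given in class, and all function names are hypothetical.

```python
from itertools import product


def nand(a, b):
    return int(not (a and b))


def nor(a, b):
    return int(not (a or b))


# NOT, AND, OR built from NAND only.
def not_n(a):
    return nand(a, a)

def and_n(a, b):
    return nand(nand(a, b), nand(a, b))

def or_n(a, b):
    return nand(nand(a, a), nand(b, b))


# NOT, OR, AND built from NOR only.
def not_r(a):
    return nor(a, a)

def or_r(a, b):
    return nor(nor(a, b), nor(a, b))

def and_r(a, b):
    return nor(nor(a, a), nor(b, b))


# Exhaustive check over every input combination.
for a, b in product((0, 1), repeat=2):
    assert not_n(a) == not_r(a) == int(not a)
    assert and_n(a, b) == and_r(a, b) == (a & b)
    assert or_n(a, b) == or_r(a, b) == (a | b)

print("NAND alone and NOR alone each reproduce NOT, AND, and OR")
```

The OR-from-NAND and AND-from-NOR constructions are the two identities a || b = !(!a && !b) and a && b = !(!a || !b) with each NOT replaced by a one-gate inverter.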

Page 34: Computer Organization  Chapter 4

34

Adders
• 1-bit Half Adder: 2 inputs (A, B) and 2 outputs (S, C)

Truth Table

A  B  C  S  Note
0  0  0  0  0+0=0
0  1  0  1  0+1=1
1  0  0  1  1+0=1
1  1  1  0  1+1=2

S = A ^ B
C = A && B

Page 35: Computer Organization  Chapter 4

35

1-bit Full Adders

• 1-bit Full Adder: 3 inputs (A, B, Cin) and 2 outputs (S, Cout)

Truth Table

A  B  Cin  Cout  S
0  0  0    0     0
0  0  1    0     1
0  1  0    0     1
0  1  1    1     0
1  0  0    0     1
1  0  1    1     0
1  1  0    1     0
1  1  1    1     1

Page 36: Computer Organization  Chapter 4

36

1-bit Full Adder

• Assignment 6

[Figure: circuit diagram with inputs labeled A and B]

Page 37: Computer Organization  Chapter 4

37

Class Notes
Topics:
• Minterm m_i
• Maxterm M_i
• Standard sum-of-product
• Standard product-of-sum
• Karnaugh Map
  – Minimum sum-of-product
  – Minimum product-of-sum

Note: See class notes in the course web page under Resources Link.

Page 38: Computer Organization  Chapter 4

38

Karnaugh Maps

• Design:
  – Start from truth table => Karnaugh map => Boolean expression
• Karnaugh Map
  – Useful tool for simplifying and manipulating switching functions of three or four variables.
  – Similar to a truth table, but in a different representation.
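The first step of the design flow, reading minterms off a truth table, can be sketched in Python. The example function is the carry-out column of the 1-bit full adder truth table. The helper names `minterms` and `standard_sop` are hypothetical, and the minimized form checked at the end (AB + A·Cin + B·Cin) is the well-known majority function, stated here as an assumption rather than derived from a map.

```python
from itertools import product


def minterms(f, n):
    """Row numbers of the n-input truth table where f is 1."""
    return [i for i, bits in enumerate(product((0, 1), repeat=n)) if f(*bits)]


def standard_sop(f, names):
    """Standard sum-of-products: one product term per minterm."""
    terms = []
    for bits in product((0, 1), repeat=len(names)):
        if f(*bits):
            terms.append(" && ".join(n if b else "!" + n for n, b in zip(names, bits)))
    return " || ".join(f"({t})" for t in terms)


def carry(a, b, cin):
    """Carry-out of a 1-bit full adder."""
    return int(a + b + cin >= 2)


def carry_min(a, b, cin):
    """Candidate minimum sum-of-products for the carry-out."""
    return int((a and b) or (a and cin) or (b and cin))


print(minterms(carry, 3))                      # [3, 5, 6, 7]
print(standard_sop(carry, ["A", "B", "Cin"]))

# The minimized expression must agree with the truth table on every row.
assert all(carry(*bits) == carry_min(*bits) for bits in product((0, 1), repeat=3))
```

A Karnaugh map arrives at the shorter expression by grouping adjacent 1-cells; the exhaustive comparison only confirms that the two forms describe the same function.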

Page 39: Computer Organization  Chapter 4

39

4-bit Full Adder

• A 4-bit full adder takes as input two 4-bit numbers and a carry-in, and produces 5 bits of output.
  – Input: 9 bits
  – Output: 5 bits

• How to design it?
  – Using a truth table? How big is it? (9 input bits give 2^9 = 512 rows.)
  – Not efficient for a Karnaugh map.

Page 40: Computer Organization  Chapter 4

40

4-bit Full Adder

• A 4-bit full adder takes as input two 4-bit numbers and a carry-in, and produces 5 bits of output.

• It can be designed as a cascade: four 1-bit full adders, with each carry-out feeding the next carry-in.
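The cascade can be sketched in Python by chaining a 1-bit full adder four times. This is one common realization (a ripple-carry adder), not necessarily the circuit drawn on the slide; the names `full_adder` and `ripple_add4` and the particular gate-level equations are assumptions.

```python
def full_adder(a, b, cin):
    """1-bit full adder on 0/1 inputs; returns (cout, s)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return cout, s


def ripple_add4(x, y, cin=0):
    """Add two 4-bit numbers with four chained full adders; returns a 5-bit result."""
    result, carry = 0, cin
    for i in range(4):
        # Bit i of each operand plus the carry from the previous stage.
        carry, s = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << 4)  # final carry-out becomes bit 4


print(ripple_add4(9, 7))       # 16
print(ripple_add4(15, 15, 1))  # 31

# Check all 512 input combinations against ordinary addition.
assert all(ripple_add4(x, y, c) == x + y + c
           for x in range(16) for y in range(16) for c in (0, 1))
```

The cascade avoids the 512-row truth table: each stage is the 8-row full adder, and only the carry links the stages.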

Page 41: Computer Organization  Chapter 4

41

A little more about Logic Design

• Propagation Delay
  – Real gates are made from transistors; voltages are used to represent the Boolean values true (1) and false (0).
  – A voltage greater than a true-threshold is true, and a voltage less than a false-threshold is false.
  – Voltages between these two thresholds give undefined results.
  – When you change the input from high to low, it takes some time, called the propagation delay or gate delay, for the output voltage to reach its correct value.
  – Propagation delay determines how fast your CPU can run.

Page 42: Computer Organization  Chapter 4

42

ALU (Arithmetic Logic Unit)
• An ALU is a circuit that can produce one of several arithmetic (add, subtract, etc.) or logical (and, or, etc.) functions.

[Figure: ALU design, with a block diagram of this ALU]

Page 43: Computer Organization  Chapter 4

43

Section 4.2.5 Memory and Clocking

• So far, we have talked about combinational circuits

• Clocked Sequential Circuits:
  – Have memory and a clock input
  – Flip-Flops
    • S-R Flip-Flop, D Flip-Flop, J-K Flip-Flop, T Flip-Flop, edge-triggered D Flip-Flop and the building block of a multi-bit register

Page 44: Computer Organization  Chapter 4

44

Section 4.3.1 Organizing Processing Steps into Stages

• SEQ: a “sequential” processor
• Processing an instruction involves a number of operations, and we organize them in a particular sequence of stages, attempting to make all instructions follow a uniform sequence.
  – Design a processor that makes best use of the hardware.

Page 45: Computer Organization  Chapter 4

45

SEQ Hardware Structure

• The computations required to implement all of the Y86 instructions can be organized into six basic stages: fetch, decode, execute, memory, write back, and PC update.

• See Figure 4.22 for a better-quality diagram.

Page 46: Computer Organization  Chapter 4

46

An Informal Description
• Fetch
  – Read the instruction from memory using the address in the PC.

• Decode
  – If needed, read the values from the register file and set valA and valB.
  – The registers are specified by rA and rB, except for push and pop, which use %esp in place of rB.

• Execute
  – What it does depends on the icode.
  – Some instructions feed values into the ALU to obtain valE and possibly set the condition codes, e.g., OP1, rmmovl, mrmovl.
  – Jump instructions check the condition codes to decide whether the next PC is valC or valP.

• Memory
  – May read from or write to memory.

• Write back
  – May write up to two values to the register file.
  – Pop updates both the stack pointer and the register popped into.

• PC Update
  – PC is set to valP, or to valC or valM for control transfers.

Page 47: Computer Organization  Chapter 4

47

Sample Y86 instruction sequence

Stage       OP1 rA, rB              rrmovl rA, rB           irmovl V, rB
Fetch       icode:ifun ← M1[PC]     icode:ifun ← M1[PC]     icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]
                                                            valC ← M4[PC+2]
            valP ← PC+2             valP ← PC+2             valP ← PC+6
Decode      valA ← R[rA]            valA ← R[rA]
            valB ← R[rB]
Execute     valE ← valB OP valA     valE ← 0 + valA         valE ← 0 + valC
            Set CC
Memory
Write back  R[rB] ← valE            R[rB] ← valE            R[rB] ← valE
PC update   PC ← valP               PC ← valP               PC ← valP

Figure 4.18 Computations in sequential implementation of Y86 instructions OP1, rrmovl, irmovl. OP1: integer and logical operations; rrmovl: register-to-register move; irmovl: immediate-to-register move.

Page 48: Computer Organization  Chapter 4

48

Sample Y86 instruction sequence

 1. 0x000: 30f209000000 | irmovl $9, %edx
 2. 0x006: 30f315000000 | irmovl $21, %ebx
 3. 0x00c: 6123         | subl %edx, %ebx
 4. 0x00e: 30f480000000 | irmovl $128, %esp
 5. 0x014: 404364000000 | rmmovl %esp, 100(%ebx)
 6. 0x01a: a02f         | pushl %edx
 7. 0x01c: b00f         | popl %eax
 8. 0x01e: 7328000000   | je done
 9. 0x023: 8029000000   | call proc
10. 0x028:              | done:
11. 0x028: 00           | halt
12. 0x029:              | proc:
13. 0x029: 90           | ret

Questions: We will trace the processing of these instructions.

Page 49: Computer Organization  Chapter 4

49

Practice Problem
• Fill in the right-hand column of the following table to describe the processing of the irmovl instruction on line 4 of the object code in the previous slide.

Stage       Generic: irmovl V, rB      Specific: irmovl $128, %esp
Fetch       icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]
            valC ← M4[PC+2]
            valP ← PC+6
Decode
Execute     valE ← 0 + valC
Memory
Write back  R[rB] ← valE
PC update   PC ← valP

Page 50: Computer Organization  Chapter 4

50

Practice Problem - solution
• Fill in the right-hand column of the following table to describe the processing of the irmovl instruction on line 4 of the object code in the previous slide.

Stage       Generic: irmovl V, rB      Specific: irmovl $128, %esp
Fetch       icode:ifun ← M1[PC]        icode:ifun ← M1[0x00e] = 3:0
            rA:rB ← M1[PC+1]           rA:rB ← M1[0x00f] = f:4
            valC ← M4[PC+2]            valC ← M4[0x010] = 128
            valP ← PC+6                valP ← 0x00e + 6 = 0x014
Decode
Execute     valE ← 0 + valC            valE ← 0 + 128 = 128
Memory
Write back  R[rB] ← valE               R[%esp] ← valE = 128
PC update   PC ← valP                  PC ← valP = 0x014
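The fetch-stage values in this solution can be reproduced from the instruction bytes. The Python sketch below is illustrative only: `fetch_irmovl` is a hypothetical helper that handles just the 6-byte irmovl layout, and the memory image contains nothing but that one instruction placed at address 0x00e.

```python
import struct


def fetch_irmovl(mem, pc):
    """Fetch stage for a 6-byte irmovl-format instruction at address pc."""
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF        # icode:ifun <- M1[PC]
    ra, rb = mem[pc + 1] >> 4, mem[pc + 1] & 0xF     # rA:rB <- M1[PC+1]
    (valc,) = struct.unpack_from("<i", mem, pc + 2)  # valC <- M4[PC+2]
    return {"icode": icode, "ifun": ifun, "rA": ra, "rB": rb,
            "valC": valc, "valP": pc + 6}            # valP <- PC+6


# Place the bytes of line 4 (irmovl $128, %esp) at address 0x00e.
mem = bytearray(0x14)
mem[0x00E:0x014] = bytes.fromhex("30f480000000")

fetched = fetch_irmovl(mem, 0x00E)
print(fetched)
print(hex(fetched["valP"]))  # 0x14
```

Register number 15 (0xF) in the rA field means "no register", and rB = 4 is %esp, matching f:4 in the table.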

Page 51: Computer Organization  Chapter 4

51

Sample Y86 instruction sequence

Stage       rmmovl rA, D(rB)        mrmovl D(rB), rA
Fetch       icode:ifun ← M1[PC]     icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]
            valC ← M4[PC+2]         valC ← M4[PC+2]
            valP ← PC+6             valP ← PC+6
Decode      valA ← R[rA]
            valB ← R[rB]            valB ← R[rB]
Execute     valE ← valB + valC      valE ← valB + valC
Memory      M4[valE] ← valA         valM ← M4[valE]
Write back                          R[rA] ← valM
PC update   PC ← valP               PC ← valP

Figure 4.19 Computations in sequential implementation of Y86 instructions rmmovl, mrmovl. These instructions read or write memory.

Page 52: Computer Organization  Chapter 4

52

Sample Y86 instruction sequence

Stage       pushl rA                popl rA
Fetch       icode:ifun ← M1[PC]     icode:ifun ← M1[PC]
            rA:rB ← M1[PC+1]        rA:rB ← M1[PC+1]
            valP ← PC+2             valP ← PC+2
Decode      valA ← R[rA]            valA ← R[%esp]
            valB ← R[%esp]          valB ← R[%esp]
Execute     valE ← valB + (-4)      valE ← valB + 4
Memory      M4[valE] ← valA         valM ← M4[valA]
Write back  R[%esp] ← valE          R[%esp] ← valE
                                    R[rA] ← valM
PC update   PC ← valP               PC ← valP

Figure 4.20 Computations in sequential implementation of Y86 instructions pushl, popl. These instructions push and pop the stack.

Page 53: Computer Organization  Chapter 4

53

Sample Y86 instruction sequence

Stage       jXX Dest                  call Dest                ret
Fetch       icode:ifun ← M1[PC]       icode:ifun ← M1[PC]      icode:ifun ← M1[PC]
            valC ← M4[PC+1]           valC ← M4[PC+1]
            valP ← PC+5               valP ← PC+5              valP ← PC+1
Decode                                                         valA ← R[%esp]
                                      valB ← R[%esp]           valB ← R[%esp]
Execute     Cnd ← Cond(CC, ifun)      valE ← valB + (-4)       valE ← valB + 4
Memory                                M4[valE] ← valP          valM ← M4[valA]
Write back                            R[%esp] ← valE           R[%esp] ← valE
PC update   PC ← Cnd ? valC : valP    PC ← valC                PC ← valM

Figure 4.21 Computations in sequential implementation of Y86 instructions jXX, call, ret. These instructions cause control transfers.
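The stage tables in Figures 4.18 through 4.21 can be put together into a small interpreter and run on the sample instruction sequence. The Python sketch below is an illustration under stated assumptions, not the SEQ hardware or the course simulator: the names `run` and `s32` are hypothetical, memory is a 256-byte array, the condition codes are assumed to start as ZF=1, SF=0, OF=0, mrmovl (icode 5) and nop (icode 1) follow the standard Y86 codes, and there is no ADR checking or validation of register and function fields.

```python
import struct

ESP, NONE = 4, 0xF


def s32(v):
    """Wrap a Python int to a signed 32-bit value."""
    return (v + 2**31) % 2**32 - 2**31


def run(program, mem_size=256, max_steps=1000):
    """Run Y86 object code from address 0; returns (registers, status)."""
    mem = bytearray(mem_size)
    mem[:len(program)] = program
    R = [0] * 8                  # register file, indexed by register number
    ZF, SF, OF = 1, 0, 0         # condition codes (assumed initial values)
    PC = 0
    for _ in range(max_steps):
        # Fetch
        icode, ifun = mem[PC] >> 4, mem[PC] & 0xF
        if icode > 0xB:
            return R, "INS"
        valP, rA, rB, valC = PC + 1, NONE, NONE, 0
        if icode in (2, 3, 4, 5, 6, 0xA, 0xB):    # has a register byte
            rA, rB = mem[valP] >> 4, mem[valP] & 0xF
            valP += 1
        if icode in (3, 4, 5, 7, 8):              # has a 4-byte constant
            valC = struct.unpack_from("<i", mem, valP)[0]
            valP += 4
        if icode == 0:
            return R, "HLT"
        # Decode
        srcA = ESP if icode in (9, 0xB) else rA
        srcB = ESP if icode in (8, 9, 0xA, 0xB) else rB
        valA = R[srcA] if srcA != NONE else 0
        valB = R[srcB] if srcB != NONE else 0
        # Execute
        valE, Cnd = 0, True
        if icode == 6:                            # OP1: valE <- valB OP valA, set CC
            raw = (valB + valA, valB - valA, valB & valA, valB ^ valA)[ifun]
            valE = s32(raw)
            ZF, SF, OF = int(valE == 0), int(valE < 0), int(raw != valE)
        elif icode in (2, 7):                     # cmovXX / jXX: Cnd <- Cond(CC, ifun)
            lt = SF ^ OF
            Cnd = bool((1, lt | ZF, lt, ZF, 1 - ZF, 1 - lt, (1 - lt) & (1 - ZF))[ifun])
            valE = valA                           # used only by the register moves
        elif icode == 3:
            valE = valC
        elif icode in (4, 5):
            valE = s32(valB + valC)
        elif icode in (8, 0xA):
            valE = valB - 4
        elif icode in (9, 0xB):
            valE = valB + 4
        # Memory
        valM = 0
        if icode in (4, 0xA):
            struct.pack_into("<i", mem, valE, valA)
        elif icode == 8:
            struct.pack_into("<i", mem, valE, valP)
        elif icode == 5:
            valM = struct.unpack_from("<i", mem, valE)[0]
        elif icode in (9, 0xB):
            valM = struct.unpack_from("<i", mem, valA)[0]
        # Write back
        if icode in (3, 6) or (icode == 2 and Cnd):
            R[rB] = valE
        if icode in (8, 9, 0xA, 0xB):
            R[ESP] = valE
        if icode in (5, 0xB):
            R[rA] = valM
        # PC update
        if icode == 7:
            PC = valC if Cnd else valP
        elif icode == 8:
            PC = valC
        elif icode == 9:
            PC = valM
        else:
            PC = valP
    raise RuntimeError("step limit reached")


# Object code of the sample Y86 instruction sequence (addresses 0x000-0x029).
SAMPLE = bytes.fromhex(
    "30f209000000" "30f315000000" "6123" "30f480000000" "404364000000"
    "a02f" "b00f" "7328000000" "8029000000" "00" "90"
)
regs, stat = run(SAMPLE)
print(stat, regs)  # HLT [9, 0, 9, 12, 128, 0, 0, 0]
```

In this run, subl leaves 12 in %ebx and clears ZF, so je is not taken; call then pushes the return address 0x028, ret pops it, and the program halts with %eax = 9 and %esp back at 128.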