computer architecture

(Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)

Computer Arithmetic

COURSE CONTENTS

• Introduction
• Instructions
• Computer Arithmetic
• Performance
• Processor: Datapath
• Processor: Control
• Pipelining Techniques
• Memory
• Input/Output Devices

COMPUTER ARITHMETIC

• Arithmetic Logic Unit (ALU)
• Fast Adder

Foundation Knowledge

• Decimal, Binary, Octal, & Hexadecimal Numbers
• Signed & Unsigned Numbers
• 2’s Complement Representation
• 2’s Complement Negation, Addition, & Subtraction
• Overflow
• Sign Extension
• ASCII vs Binary
• Boolean Algebra
• Logic Design
• Assembly Language

Numbers

• Bits are just bits (no inherent meaning)
• Conventions define the relationship between bits and numbers
• Binary numbers (base 2)
    0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ...
    decimal: 0 ... 2^n – 1
• Of course it gets more complicated:
    - Numbers are finite (overflow)
    - Fractions and real numbers
    - Negative numbers
      E.g., MIPS has no subi instruction; addi can add a negative number
• How do we represent negative numbers?
  I.e., which bit patterns will represent which numbers?

Possible Representations

Three representations:

Sign Magnitude    One's Complement    Two's Complement
000 = +0          000 = +0            000 = +0
001 = +1          001 = +1            001 = +1
010 = +2          010 = +2            010 = +2
011 = +3          011 = +3            011 = +3
100 = -0          100 = -3            100 = -4
101 = -1          101 = -2            101 = -3
110 = -2          110 = -1            110 = -2
111 = -3          111 = -0            111 = -1

Issues: balance, number of zeros, ease of operations

Which one is best? Why?
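The three columns above can be reproduced with a short sketch. The function names and the default width of 3 bits are illustrative assumptions, not part of the slides. Python integers have no negative zero, so both +0 and -0 appear as 0.

```python
def sign_magnitude(bits: int, n: int = 3) -> int:
    """Top bit is the sign; the remaining n-1 bits are the magnitude."""
    magnitude = bits & ((1 << (n - 1)) - 1)
    return -magnitude if bits >> (n - 1) else magnitude


def ones_complement(bits: int, n: int = 3) -> int:
    """A negative value is the bitwise inverse of its magnitude."""
    if bits >> (n - 1):
        return -(~bits & ((1 << n) - 1))
    return bits


def twos_complement(bits: int, n: int = 3) -> int:
    """The top bit carries weight -2^(n-1)."""
    return bits - (1 << n) if bits >> (n - 1) else bits


for pattern in range(8):
    print(f"{pattern:03b}",
          sign_magnitude(pattern),
          ones_complement(pattern),
          twos_complement(pattern))
```

Only two's complement maps the eight patterns to eight distinct values, which is one of the "number of zeros" issues listed above.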

MIPS

32-bit signed numbers:

0000 0000 0000 0000 0000 0000 0000 0000two = 0ten

0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten

0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten

...

0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten

0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten

1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten

1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten

1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten

...

1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten

1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten

1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten

• maxint: + 2,147,483,647ten

• minint: – 2,147,483,648ten
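As a rough illustration of the table above, the following sketch converts between a 32-bit pattern and the signed value it represents in two's complement. The helper names `to_signed32` and `to_word32` are assumptions made for this example.

```python
def to_signed32(word: int) -> int:
    """Interpret a 32-bit pattern as a two's complement number."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def to_word32(value: int) -> int:
    """Return the 32-bit pattern that represents a signed value."""
    return value & 0xFFFFFFFF


MAXINT = to_signed32(0x7FFFFFFF)  # largest positive pattern
MININT = to_signed32(0x80000000)  # sign bit set, all other bits clear

print(MAXINT, MININT)
print(f"{to_word32(-3):032b}")
```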

Two’s Complement Operations

• Negating a two’s complement number: invert all bits and add 1
    - Remember: “negate” and “invert” are quite different!
• Converting n-bit numbers into numbers with more than n bits:
    - A MIPS 16-bit immediate gets converted to 32 bits for arithmetic
    - Copy the most significant bit (the sign bit) into the other bits
        0010 -> 0000 0010
        1010 -> 1111 1010
    - This is “sign extension” (lbu vs. lb: lb sign-extends the loaded byte, lbu fills the upper bits with 0’s)
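The two operations on this slide can be sketched as follows. The function names and the explicit bit-width parameters are assumptions chosen for the example.

```python
def negate(x: int, n: int) -> int:
    """Two's complement negation: invert all n bits, then add 1."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask


def sign_extend(x: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits-wide value into the new upper bits."""
    if (x >> (from_bits - 1)) & 1:
        fill = ((1 << (to_bits - from_bits)) - 1) << from_bits
        return x | fill
    return x


print(f"{negate(0b0010, 4):04b}")
print(f"{sign_extend(0b0010, 4, 8):08b}")
print(f"{sign_extend(0b1010, 4, 8):08b}")
```

Negating twice returns the original pattern, whereas inverting alone would give a value that is off by one from the negation.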

Additional MIPS Instructions

• Character transfer
    lbu $s1, 100($s2)    # $s1 ← memory[$s2+100] (load byte unsigned)
    sb  $s1, 100($s2)    # memory[$s2+100] ← $s1 (store byte)
• Conditions
    sltu $s2, $s3, $s4   # if ($s3) < ($s4) then $s2 ← 1; else $s2 ← 0 (set on less than, unsigned numbers)
    # Note that slt works on 2’s complement numbers
• Arithmetic on unsigned numbers
    addu  $s1, $s2, $s3  # $s1 ← $s2 + $s3 (no overflow detection)
    subu  $s1, $s2, $s3  # $s1 ← $s2 - $s3 (no overflow detection)
    addiu $s1, $s2, 100  # $s1 ← $s2 + 100 (no overflow detection)
• MIPS detects overflow with an exception (interrupt), which is an unscheduled procedure call. MIPS includes a register, called the exception program counter (EPC), to contain the address of the instruction that caused the exception
    mfc0 $s1, $epc       # $s1 ← $epc (move from special registers)
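The difference between the signed and unsigned variants can be modelled in a few lines. This is a behavioural sketch only: operands are assumed to be 32-bit patterns in the range 0 to 2^32 - 1, and a Python `OverflowError` stands in for the MIPS overflow exception.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000


def signed(word: int) -> int:
    """Two's complement value of a 32-bit pattern."""
    return word - (1 << 32) if word & SIGN else word


def addu(a: int, b: int) -> int:
    """Wraps around silently, like addu (no overflow detection)."""
    return (a + b) & MASK


def add(a: int, b: int) -> int:
    """Like add: signals overflow instead of returning a wrapped result."""
    result = (a + b) & MASK
    # Overflow: both operands share a sign and the result's sign differs.
    if ~(a ^ b) & (a ^ result) & SIGN:
        raise OverflowError("signed overflow")
    return result


def slt(a: int, b: int) -> int:
    """Compare as two's complement numbers."""
    return 1 if signed(a) < signed(b) else 0


def sltu(a: int, b: int) -> int:
    """Compare as unsigned numbers."""
    return 1 if a < b else 0


print(hex(addu(0x7FFFFFFF, 1)))
print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))
```

The pattern 0xFFFFFFFF is -1 to slt but the largest unsigned value to sltu, so the two comparisons disagree on it.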

Additional MIPS Instructions

• Shift operations
    sll $t2, $s0, 8      # $t2 ← $s0 << 8 (shift left by constant)
    srl $s1, $s2, 10     # $s1 ← $s2 >> 10 (shift right by constant)
    Fill the emptied bits with 0’s
• Logical operations
    and  $s1, $s2, $s3   # $s1 ← $s2 and $s3 (bit-by-bit and)
    or   $s1, $s2, $s3   # $s1 ← $s2 or $s3 (bit-by-bit or)
    andi $s1, $s2, 100   # $s1 ← $s2 and 100
    ori  $s1, $s2, 100   # $s1 ← $s2 or 100
• Encoding of sll $t2, $s0, 8:
    op=0  rs=0  rt=16  rd=10  shamt=8  funct=0
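The field values listed for sll $t2, $s0, 8 can be packed into a 32-bit instruction word using the standard MIPS R-type layout (6-bit op, 5-bit rs, rt, rd, and shamt, 6-bit funct). The helper name `encode_r_type` is an assumption for this sketch.

```python
def encode_r_type(op: int, rs: int, rt: int, rd: int,
                  shamt: int, funct: int) -> int:
    """Pack R-type fields: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return ((op << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# sll $t2, $s0, 8: $s0 is register 16 and $t2 is register 10
word = encode_r_type(op=0, rs=0, rt=16, rd=10, shamt=8, funct=0)
print(f"{word:032b}")
print(f"0x{word:08x}")
```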

ALU: Arithmetic Logic Unit

Performs arithmetic (e.g. add) & logical operations (e.g. and) in the CPU

Control   Funct   Result
000       and     A and B
001       or      A or B
010       add     A + B
110       sub     A - B
111       slt     1 if A < B

[Figure: ALU symbol. Inputs: 32-bit A, 32-bit B, and the ALU operation control lines. Outputs: 32-bit Result, Zero, Overflow, and CarryOut.]
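A behavioural sketch of the function table may help before looking at the gate-level design. It models only Result and Zero; the Overflow and CarryOut outputs are left out, and the function name `alu` and its return shape are assumptions for this example. Operands are assumed to be 32-bit patterns.

```python
MASK = 0xFFFFFFFF


def signed(word: int) -> int:
    """Two's complement value of a 32-bit pattern."""
    return word - (1 << 32) if word & 0x80000000 else word


def alu(control: int, a: int, b: int):
    """Return (result, zero) for the five control codes in the table."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = (a + b) & MASK
    elif control == 0b110:
        result = (a - b) & MASK
    elif control == 0b111:
        result = 1 if signed(a) < signed(b) else 0
    else:
        raise ValueError("control code not in the table")
    return result, int(result == 0)


print(alu(0b010, 3, 4))
print(alu(0b110, 5, 5))
```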

ALU Building Blocks

• 1-bit adder
• Gates, multiplexor

$c_{out} = a\,b + a\,c_{in} + b\,c_{in}$

$sum = a \oplus b \oplus c_{in}$

Note: $c_{in}$ is the carry in, $c_{out}$ is the carry out

[Figure: 1-bit adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
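The two equations for the 1-bit adder can be checked against ordinary addition over all eight input combinations. The function name `full_adder` is an assumption for this sketch.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """Return (sum, cout) for single-bit inputs."""
    cout = (a & b) | (a & cin) | (b & cin)  # carry when at least two inputs are 1
    s = a ^ b ^ cin                         # sum bit is the XOR of the inputs
    return s, cout


# Exhaustive check: the pair (cout, sum) is the 2-bit binary value of a + b + cin
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == 2 * cout + s
    print(a, b, cin, "->", cout, s)
```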

A Simple ALU

• A 1-bit ALU that performs AND, OR, and addition
  [Figure: 1-bit ALU. Inputs a, b, and CarryIn; a multiplexor controlled by Operation selects among AND (0), OR (1), and the adder sum (2) to produce Result; the adder also produces CarryOut.]
• Building a 32-bit ALU
  [Figure: 32-bit ALU built from 32 1-bit ALUs (ALU0 ... ALU31). Stage i takes ai and bi and produces Resulti; the CarryOut of each stage feeds the CarryIn of the next; Operation is shared by all stages.]
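A minimal sketch of this construction chains 32 copies of a 1-bit ALU, passing each CarryOut to the next CarryIn. The function names and the encoding of Operation as 0 (AND), 1 (OR), 2 (add) follow the multiplexor inputs in the figure but are otherwise assumptions.

```python
def alu_1bit(a: int, b: int, carry_in: int, operation: int):
    """1-bit ALU: the multiplexor picks AND, OR, or the adder's sum bit."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, s)[operation]
    return result, carry_out


def alu_32bit(a: int, b: int, operation: int, carry_in: int = 0):
    """Ripple the carry through 32 1-bit ALUs, least significant bit first."""
    result = 0
    carry = carry_in
    for i in range(32):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry


print(alu_32bit(7, 9, 2))
print(alu_32bit(0xFFFFFFFF, 1, 2))
```

The loop makes the ripple visible: bit i cannot be computed until the carry from bit i-1 is known.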

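ALU: Subtraction

• Two's complement approach: just negate b and add. How do we negate?
• By selecting Binvert = 1 and setting CarryIn = 1 in the least significant bit of the ALU, we get 2’s complement subtraction a - b:

$a - b = a + (-b) = a + (\bar{b} + 1)$

[Figure: 1-bit ALU with a multiplexor controlled by Binvert that selects b (0) or its complement (1) as the second operand; inputs a, b, CarryIn, Binvert, and Operation; outputs Result and CarryOut.]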
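The identity can be tried out directly on 32-bit patterns. In this sketch the function name `subtract` is an assumption; the inverted operand plays the role of Binvert = 1 and the added 1 plays the role of CarryIn = 1 at the least significant bit.

```python
MASK = 0xFFFFFFFF


def subtract(a: int, b: int) -> int:
    """Compute a - b as a + (inverted b) + 1, truncated to 32 bits."""
    b_inverted = ~b & MASK               # Binvert = 1
    return (a + b_inverted + 1) & MASK   # CarryIn = 1 at the lsb


print(subtract(7, 5))
print(hex(subtract(5, 7)))
```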

ALU: Additional Operations

• To support the set-on-less-than instruction (slt)
    - slt is an arithmetic instruction
    - produces a 1 if rs < rt and 0 otherwise
    - use subtraction: (a-b) < 0 implies a < b
    - use a Set & a Less signal to indicate the result
• To support test for equality (beq)
    - use subtraction: (a-b) = 0 implies a = b
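Both tests reduce to one subtraction, as the sketch below illustrates. One step here goes beyond the slide and is an assumption of this example: the sign bit of a - b is XORed with the overflow flag, so the comparison stays correct even when the 32-bit subtraction overflows. The function names are also illustrative.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000


def subtract_with_flags(a: int, b: int):
    """Return (a - b truncated to 32 bits, signed-overflow flag)."""
    diff = (a + (~b & MASK) + 1) & MASK
    # Overflow: a and b differ in sign and the result's sign differs from a's.
    overflow = bool((a ^ b) & (a ^ diff) & SIGN)
    return diff, overflow


def slt(a: int, b: int) -> int:
    """1 if a < b as two's complement numbers, else 0."""
    diff, overflow = subtract_with_flags(a, b)
    sign = bool(diff & SIGN)
    return int(sign != overflow)  # assumption: sign corrected by overflow


def beq_taken(a: int, b: int) -> bool:
    """Equality test: the difference is zero exactly when a equals b."""
    diff, _ = subtract_with_flags(a, b)
    return diff == 0


print(slt(3, 5), slt(5, 3), beq_taken(9, 9))
```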

ALU: Additional Operations

• A 1-bit ALU that performs AND, OR, add, and subtract: Less is used for the slt instruction (see the 32-bit ALU on the next slide)
  [Figure: 1-bit ALU with inputs a, b, Less, CarryIn, Binvert, and Operation; the multiplexor selects among AND (0), OR (1), the adder sum (2), and Less (3); outputs Result and CarryOut.]
• The ALU for the most significant bit: Set is used for the slt instruction; it is connected to Less of the lsb (see the 32-bit ALU on the next slide)
• Overflow detection is needed on the msb
  [Figure: 1-bit ALU for the msb, as above, with an overflow detection unit and two extra outputs: Set (sign) and Overflow.]

A 32-bit ALU

• A 32-bit ALU that performs AND, OR, add, & subtract
• For subtract, set Binvert = 1 and CarryIn = 1 (for add or logical operations, both are set to 0)
• Can combine Binvert & CarryIn into Bnegate
• Set and Less, together with subtraction, can be used for slt

[Figure: 32-bit ALU built from ALU0 ... ALU31 with shared Binvert and Operation lines and carries rippling from each CarryOut to the next CarryIn. The Less inputs of ALU1 ... ALU31 are tied to 0; the Set (sign) output of ALU31 feeds the Less input of ALU0; ALU31 also produces Overflow.]

A Final 32-bit ALU

• Add a zero detector to test for zero results or equality (e.g. in the beq instruction)
• Control lines (Operation) (3-bit):
    000 = and
    001 = or
    010 = add
    110 = subtract
    111 = slt
    - bit1 & bit0 go to the multiplexors in the ALU (2 bits)
    - bit2 goes to Bnegate (1 bit)
• Note: Zero is 1 when the result is zero!

[Figure: final 32-bit ALU built from ALU0 ... ALU31 with a shared Bnegate line and 2-bit Operation lines (3 control bits in total). The Set (sign) output of ALU31 feeds the Less input of ALU0, the other Less inputs are 0, ALU31 produces Overflow, and a zero detector over Result0 ... Result31 produces Zero.]
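The split of the 3-bit control into Bnegate and a 2-bit multiplexor select can be sketched as below. This follows the slide in taking Set straight from the sign bit of the adder output, so slt is only right when a - b does not overflow. The function name `final_alu` and the return shape are assumptions for this example.

```python
MASK = 0xFFFFFFFF


def final_alu(control: int, a: int, b: int):
    """Return (result, zero) using Bnegate (bit2) and Operation (bit1, bit0)."""
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    b_in = (~b & MASK) if bnegate else b   # Binvert multiplexor
    adder = (a + b_in + bnegate) & MASK    # Bnegate also drives CarryIn of bit 0
    set_bit = adder >> 31                  # Set = sign bit from the msb adder
    # Multiplexor inputs: 0 = AND, 1 = OR, 2 = adder, 3 = Less (Set into bit 0)
    result = (a & b_in, a | b_in, adder, set_bit)[operation]
    zero = int(result == 0)                # Zero is 1 when every result bit is 0
    return result, zero


print(final_alu(0b010, 2, 3))
print(final_alu(0b110, 7, 7))
print(final_alu(0b111, 3, 5))
```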

ALU Design: Summary

• Select building blocks: adders, gates
• Use multiplexors to select the output we want
• Perform subtraction using two’s complement
• Replicate a 1-bit ALU to produce a 32-bit ALU --> regularity
• Need circuits to detect conditions, e.g. zero result, overflow, sign, carry out
• Shift instructions: done outside the ALU by a barrel shifter, which can shift from 1 to 31 bits in no more time than it takes to add two 32-bit numbers using carry lookahead adders
• Important points about hardware
    - all of the gates are always working
    - the speed of a gate is affected by the number of inputs to the gate
    - the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus: comprehension; however, clever changes to organization can improve performance (similar to using better algorithms in software)
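The slides do not show the barrel shifter's internals. One common organization, assumed here purely for illustration, uses five multiplexor stages that shift by 1, 2, 4, 8, and 16 bits, each controlled by one bit of the shift amount. The function name is hypothetical.

```python
MASK = 0xFFFFFFFF


def barrel_shift_left(value: int, shamt: int) -> int:
    """Shift a 32-bit value left by 0..31 using five fixed-distance stages."""
    for stage in range(5):                 # stage k shifts by 2**k when enabled
        if (shamt >> stage) & 1:
            value = (value << (1 << stage)) & MASK
    return value


print(hex(barrel_shift_left(1, 31)))
print(hex(barrel_shift_left(0x000000FF, 8)))
```

Any shift amount passes through the same five stages, so the delay does not grow with the distance shifted.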

Carry Lookahead Adder

• Ripple carry adder is just too slow:
    - The sequential chain reaction is too slow for time-critical hardware
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
    - two extremes: ripple carry and sum-of-products
• Can you see the ripple? How could you get rid of it?

Carry Lookahead Adder

• Carry lookahead adder (CLA): an approach in between our two extremes
• Motivation:
    - If we didn't know the value of carry-in, what could we do?
    - When would we always generate a carry? $g_i = a_i b_i$
    - When would we propagate the carry? $p_i = a_i + b_i$
• Did we get rid of the ripple?

$c_{i+1} = g_i + p_i c_i$

$c_1 = g_0 + p_0 c_0$

$c_2 = g_1 + p_1 c_1 = g_1 + p_1 g_0 + p_1 p_0 c_0$

$c_3 = g_2 + p_2 c_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$

$c_4 = g_3 + p_3 c_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$

• Carry lookahead!
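The expanded carry equations can be checked against ordinary addition for every 4-bit input. In this sketch the function name `cla_4bit` is an assumption; each carry is written only in terms of the g, p, and c0 signals, so no carry waits on another.

```python
def cla_4bit(a: int, b: int, c0: int = 0):
    """4-bit carry lookahead adder: return (4-bit sum, c4)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(abit, bbit)]  # generate:  g_i = a_i b_i
    p = [x | y for x, y in zip(abit, bbit)]  # propagate: p_i = a_i + b_i

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    total = 0
    for i, carry in enumerate((c0, c1, c2, c3)):
        total |= (abit[i] ^ bbit[i] ^ carry) << i
    return total, c4


print(cla_4bit(0b0111, 0b0001))
print(cla_4bit(0b1111, 0b0001))
```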

Building Bigger Adders

• Can’t build a 16-bit adder using the gi & pi CLA method --> too big
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again! (see figure)

[Figure: 16-bit adder built from four 4-bit CLA blocks (ALU0 ... ALU3) with inputs a0b0 ... a15b15 and outputs Result0-3, Result4-7, Result8-11, and Result12-15. Each block passes its group signals Pi and Gi to a carry-lookahead unit, which returns the carries C1, C2, C3 to the next blocks and C4 as CarryOut.]
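A sketch of the second level of lookahead follows. The group signals are the standard ones: P is the AND of the four propagate bits of a block, and G collects the ways the block can generate a carry by itself. The function names are assumptions, and Python's `+` stands in for the 4-bit CLA inside each block to keep the example short.

```python
def group_pg(a4: int, b4: int):
    """Group propagate and generate signals for one 4-bit block."""
    g = [((a4 >> i) & 1) & ((b4 >> i) & 1) for i in range(4)]
    p = [((a4 >> i) & 1) | ((b4 >> i) & 1) for i in range(4)]
    P = p[0] & p[1] & p[2] & p[3]
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    return P, G


def add16(a: int, b: int, c0: int = 0):
    """16-bit add with block carries from a carry-lookahead unit."""
    blocks = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    (P0, G0), (P1, G1), (P2, G2), (P3, G3) = [group_pg(x, y) for x, y in blocks]

    # Same form as the bit-level carry equations, one level up
    C1 = G0 | (P0 & c0)
    C2 = G1 | (P1 & G0) | (P1 & P0 & c0)
    C3 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & c0)
    C4 = (G3 | (P3 & G2) | (P3 & P2 & G1)
          | (P3 & P2 & P1 & G0) | (P3 & P2 & P1 & P0 & c0))

    result = 0
    for k, cin in enumerate((c0, C1, C2, C3)):
        x, y = blocks[k]
        result |= ((x + y + cin) & 0xF) << (4 * k)  # stand-in for a 4-bit CLA
    return result, C4


print(add16(0x0FFF, 0x0001))
print(add16(0xFFFF, 0x0001))
```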

Summary

• Review number system
• Additional MIPS instructions
• The design of an ALU
• Carry lookahead adder
