CSE 141 – Computer Architecture Summer Session I, 2005 Lecture 2 ALU Pramod V. Argade


CSE 141 – Computer Architecture
Summer Session I, 2005

Lecture 2
ALU

Pramod V. Argade


2-2Pramod Argade UCSD CSE 141, Summer Session 1, 2005

CSE141: Introduction to Computer Architecture

Instructor: Pramod V. Argade ([email protected])
Office Hour: Wed. 4:30 - 5:30 PM (EBU3 2204)

TAs: Chengmo Yang: [email protected] Rao: [email protected]

Lecture: Mon/Wed. 6:00 - 8:50 PM, HSS 1330

Textbook: Computer Organization & Design: The Hardware/Software Interface, 3rd Edition.
Authors: Patterson and Hennessy

Web-page: http://www.cse.ucsd.edu/classes/su05/cse141

Summer Session I, 2005
CSE141 Course Schedule

Lecture # | Date      | Time           | Room     | Topic                                                      | Quiz topic             | Homework Due
1         | Mon. 6/27 | 6 - 8:50 PM    | HSS 1330 | Introduction, Ch. 1; ISA, Ch. 2                            | -                      | -
2         | Wed. 6/29 | 6 - 8:50 PM    | HSS 1330 | Arithmetic, Ch. 3                                          | ISA, Ch. 2             | #1
-         | Mon. 7/4  | -              | -        | July 4th Holiday (No Class)                                | -                      | -
3         | Wed. 7/6  | 6 - 8:50 PM    | HSS 1330 | Performance, Ch. 4; Single-cycle CPU, Ch. 5                | Arithmetic, Ch. 3      | #2
4         | Mon. 7/11 | 6 - 8:50 PM    | HSS 1330 | Single-cycle CPU, Ch. 5 Cont.; Multi-cycle CPU, Ch. 5      | Performance, Ch. 4     | #3
5         | Tue. 7/12 | 7:30 - 8:50 PM | HSS 1330 | Multi-cycle CPU, Ch. 5 Cont. (July 4th make up class)      | -                      | -
6         | Wed. 7/13 | 6 - 8:50 PM    | HSS 1330 | Single and Multicycle CPU Examples and Review for Midterm  | Single-cycle CPU, Ch. 5 | -
7         | Mon. 7/18 | 6 - 8:50 PM    | HSS 1330 | Mid-term Exam; Exceptions                                  | -                      | #4
8         | Tue. 7/19 | 7:30 - 8:50 PM | HSS 1330 | Pipelining, Ch. 6 (July 4th make up class)                 | -                      | -
9         | Wed. 7/20 | 6 - 8:50 PM    | HSS 1330 | Hazards, Ch. 6                                             | -                      | -
10        | Mon. 7/25 | 6 - 8:50 PM    | HSS 1330 | Memory Hierarchy & Caches, Ch. 7                           | Hazards, Ch. 6         | #5
11        | Wed. 7/27 | 6 - 8:50 PM    | HSS 1330 | Virtual Memory, Ch. 7; Course Review                       | Cache, Ch. 7           | #6
12        | Sat. 7/30 | TBD            | TBD      | Final Exam                                                 | -                      | -

Announcements
Discussion Sections for 141: Thursday, 7:30 - 8:30, Room TBD (check class web page)
Office Hour:
– Wed. 4:30 - 5:30 PM (EBU3 2204)
Reading Assignment
– Chapter 3. Arithmetic for Computers, Sections 3.1 - 3.6, B5
Homework 2: Due Wed., July 6th in class
– 3.2, 3.4, 3.7, 3.9, 3.12, 3.13, 3.19, 3.28, 3.30, 3.36, 3.38
– 4.23 (2nd Edition)
– Multiply -12 x -13 using Booth’s algorithm and 5-bit 2’s complement representation of multiplicand and multiplier.
Quiz
– When: Wed., July 6th, first 10 minutes of the class
– Topic: Arithmetic, Chapter 3
– Need: Paper, pen


Computer Arithmetic and ALU Design

Arithmetic -- The heart of instruction execution

[Figure: ALU with two 32-bit inputs a and b, an operation control input, and a 32-bit result]

[Figure: instruction execution cycle: Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction]

Bits everywhere… What do they mean?

bits (011011011100010 ....01)
  instruction
    R-format, I-format, ...
  data
    number
      integer: signed, unsigned, ...
      floating point: single precision, double precision, ...
    text chars
    ...

Questions About Numbers and ALU
How do you represent
– signed and unsigned numbers
– negative numbers?
– fractions?
– really large numbers?
– really small numbers?
How do you
– do arithmetic?
– identify errors (e.g. overflow)?
What is an ALU and what does it look like?
– ALU = Arithmetic & Logic Unit

Introduction to Binary Numbers

Consider a 4-bit binary number

Decimal  Binary    Decimal  Binary
0        0000      4        0100
1        0001      5        0101
2        0010      6        0110
3        0011      7        0111

Examples of binary arithmetic:

3 + 2 = 5
      1        (carry)
    0 0 1 1
  + 0 0 1 0
    0 1 0 1

3 + 3 = 6
      1 1      (carries)
    0 0 1 1
  + 0 0 1 1
    0 1 1 0

Negative Numbers?

We would like a number system that provides
– obvious representation of positive and negative integers
– uses the same adder for addition and subtraction
– single value of 0
– equal coverage of positive and negative numbers
– easy detection of sign
– easy negation

Possible Representations

Sign Magnitude   One's Complement   Two's Complement
000 = +0         000 = +0           000 = +0
001 = +1         001 = +1           001 = +1
010 = +2         010 = +2           010 = +2
011 = +3         011 = +3           011 = +3
100 = -0         100 = -3           100 = -4
101 = -1         101 = -2           101 = -3
110 = -2         110 = -1           110 = -2
111 = -3         111 = -0           111 = -1

2’s complement representation of negative numbers
– Take the bitwise inverse and add 1

Issues: balance, number of zeros, ease for HW/SW
Which one is best? Why?
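The three encodings in the table can be checked with a minimal Python sketch. The function names and the `width` parameter are illustrative assumptions, not part of the lecture; note that Python integers cannot distinguish -0 from +0, so both zeros decode to 0.

```python
def sign_magnitude(bits: int, width: int = 3) -> int:
    """Top bit is the sign, remaining bits are the magnitude."""
    magnitude = bits & ((1 << (width - 1)) - 1)
    return -magnitude if bits >> (width - 1) else magnitude


def ones_complement(bits: int, width: int = 3) -> int:
    """A negative value is the bitwise inverse of its magnitude."""
    if bits >> (width - 1):
        return -(~bits & ((1 << width) - 1))
    return bits


def twos_complement(bits: int, width: int = 3) -> int:
    """The top bit carries a weight of -2^(width-1)."""
    if bits >> (width - 1):
        return bits - (1 << width)
    return bits


def negate_twos(bits: int, width: int = 3) -> int:
    """Two's complement negation: take the bitwise inverse and add 1."""
    return (~bits + 1) & ((1 << width) - 1)


if __name__ == "__main__":
    for pattern in range(8):
        print(f"{pattern:03b}", sign_magnitude(pattern),
              ones_complement(pattern), twos_complement(pattern))
```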

Two’s Complement Arithmetic

Decimal  2’s Complement Binary    Decimal  2’s Complement Binary
0        0000                     -1       1111
1        0001                     -2       1110
2        0010                     -3       1101
3        0011                     -4       1100
4        0100                     -5       1011
5        0101                     -6       1010
6        0110                     -7       1001
7        0111                     -8       1000

Examples:

7 - 6 = 7 + (-6) = 1
    0 1 1 1
  + 1 0 1 0
    0 0 0 1    (carry out of the MSB is discarded)

3 - 5 = 3 + (-5) = -2
    0 0 1 1
  + 1 0 1 1
    1 1 1 0
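The two examples above can be reproduced with a short Python sketch of 4-bit two's complement addition. The helper names and the fixed `WIDTH` are assumptions made for illustration.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def to_bits(value: int) -> int:
    """Encode a signed integer as a 4-bit two's complement pattern."""
    return value & MASK


def from_bits(bits: int) -> int:
    """Decode a 4-bit two's complement pattern to a signed integer."""
    return bits - (1 << WIDTH) if bits & (1 << (WIDTH - 1)) else bits


def add4(a: int, b: int) -> int:
    """Add with the ordinary unsigned adder and drop the carry out of the MSB."""
    return from_bits((to_bits(a) + to_bits(b)) & MASK)


if __name__ == "__main__":
    print(f"{to_bits(7):04b} + {to_bits(-6):04b} -> {add4(7, -6)}")
    print(f"{to_bits(3):04b} + {to_bits(-5):04b} -> {add4(3, -5)}")
```

The same adder serves for both addition and subtraction, which is one of the properties asked for on the "Negative Numbers?" slide.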

Immediate Field in I-Format

I-Format Instruction Example:
  addi $s3, $t0, 4

I-format fields:
  op (6 bits) | rs (5 bits) | rt (5 bits) | 16 bit address

For LW, SW, BEQ, BNE, ADDI
– 16-bit Immediate field is signed
– It is sign extended to 32 bits before use
  (copy sign bit to all upper 16 bits)
– Sign extension:
  0010 in 4 bits is the same as 0000 0010 in 8 bits (+2)
  1110 in 4 bits is the same as 1111 1110 in 8 bits (-2)
– Why is there no SUBI instruction for MIPS?
For ANDI, ORI
– 16-bit Immediate field is zero-extended to 32 bits before use
  (copy zero to all upper 16 bits)
– Why?
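Sign extension and zero extension of the 16-bit immediate can be sketched in Python as follows. The function names are hypothetical; the values are treated as raw 32-bit patterns.

```python
def sign_extend16(imm: int) -> int:
    """Copy bit 15 into the upper 16 bits (as for LW, SW, BEQ, BNE, ADDI)."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def zero_extend16(imm: int) -> int:
    """Fill the upper 16 bits with zero (as for ANDI, ORI)."""
    return imm & 0xFFFF


if __name__ == "__main__":
    print(f"{sign_extend16(0xFFFE):08x}")  # -2 stays -2 as a 32-bit pattern
    print(f"{zero_extend16(0xFFFE):08x}")  # treated as a plain bit mask
```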

Foundation of ICs: Field Effect Transistor (FET)

• N-FET: Conducts when Gate is high
• P-FET: Conducts when Gate is low

[Figure: N-FET and P-FET symbols, each with Source, Drain, and Gate terminals]

• CMOS Inverter
  • Input high, output low
  • Input low, output high
  • Power consumed only during transition

[Figure: CMOS inverter connected to VDD, with In and Out]

• Current processing technology 0.09 µ, next 0.065 µ

Some basics of digital logic

1. AND gate (c = a . b)
   a  b | c
   0  0 | 0
   0  1 | 0
   1  0 | 0
   1  1 | 1

2. OR gate (c = a + b)
   a  b | c
   0  0 | 0
   0  1 | 1
   1  0 | 1
   1  1 | 1

3. Inverter (c = !a)
   a | c
   0 | 1
   1 | 0

4. Multiplexor (if d == 0, c = a; else c = b)

[Figure: gate symbols for the AND gate, OR gate, inverter, and a 2-input multiplexor with inputs a (0) and b (1), select d, and output c]
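The four building blocks can be modeled on single bits (0 or 1) in Python. This is only a behavioral sketch with assumed function names; it says nothing about how the gates are built from transistors.

```python
def and_gate(a: int, b: int) -> int:
    return a & b


def or_gate(a: int, b: int) -> int:
    return a | b


def inverter(a: int) -> int:
    return a ^ 1


def mux(a: int, b: int, d: int) -> int:
    """2-input multiplexor: select a when d == 0, otherwise b."""
    return a if d == 0 else b


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b), or_gate(a, b), mux(a, b, 0), mux(a, b, 1))
```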

1-bit ALU

[Figure: 1-bit ALU with inputs a and b feeding an AND gate and an OR gate; a multiplexor (inputs 0 and 1) controlled by Operation selects Result]

• ALU Control Lines (ALUop) Function
– 000 And
– 001 Or

[Figure: 1-bit ALU extended with an adder (CarryIn, CarryOut) as multiplexor input 2]

• ALU Control Lines (ALUop) Function
– 000 And
– 001 Or
– 010 Add

But how do we make the adder?

A One-bit Full Adder

This is also called a (3, 2) adder

[Figure: 1-bit full adder block with inputs A, B, CarryIn and outputs C (sum), CarryOut]

Truth Table:

Inputs          | Outputs        |
A  B  CarryIn   | CarryOut  Sum  | Comments
0  0  0         | 0         0    | 0 + 0 + 0 = 00
0  0  1         | 0         1    | 0 + 0 + 1 = 01
0  1  0         | 0         1    | 0 + 1 + 0 = 01
0  1  1         | 1         0    | 0 + 1 + 1 = 10
1  0  0         | 0         1    | 1 + 0 + 0 = 01
1  0  1         | 1         0    | 1 + 0 + 1 = 10
1  1  0         | 1         0    | 1 + 1 + 0 = 10
1  1  1         | 1         1    | 1 + 1 + 1 = 11

Logic Equation for CarryOut

CarryOut = (!A & B & CarryIn) | (A & !B & CarryIn) | (A & B & !CarryIn) | (A & B & CarryIn)
         = (B & CarryIn) | (A & CarryIn) | (A & B)

Logic Equation for Sum

Sum = (!A & !B & CarryIn) | (!A & B & !CarryIn) | (A & !B & !CarryIn) | (A & B & CarryIn)
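As a sanity check, the Sum and CarryOut equations can be evaluated for all eight input combinations and compared with ordinary addition. The Python sketch below uses assumed names (`full_adder`, `check_truth_table`) and models each signal as 0 or 1.

```python
from itertools import product


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Return (sum, carry_out) using the sum-of-products equations."""
    na, nb, nc = a ^ 1, b ^ 1, carry_in ^ 1  # !A, !B, !CarryIn
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)
    total = ((na & nb & carry_in) | (na & b & nc)
             | (a & nb & nc) | (a & b & carry_in))
    return total, carry_out


def check_truth_table() -> bool:
    """True if CarryOut:Sum equals A + B + CarryIn for every input row."""
    for a, b, c in product((0, 1), repeat=3):
        total, carry_out = full_adder(a, b, c)
        if 2 * carry_out + total != a + b + c:
            return False
    return True


if __name__ == "__main__":
    print(check_truth_table())
```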

32-bit ALU

[Figure: 1-bit ALU with inputs a, b, CarryIn; Operation selects multiplexor input 0, 1, or 2; outputs Result and CarryOut]

[Figure: The 32-bit ALU: ALU0, ALU1, ALU2, ..., ALU31 with inputs a0/b0 ... a31/b31 and outputs Result0 ... Result31; the CarryOut of each 1-bit ALU feeds the CarryIn of the next; Operation is shared by all bits]

• Implements functions:
– AND
– OR
– ADD

• What about SUB and SLT?

32-bit ALU: Subtraction

• Keep in mind the following:
– (A - B) is the same as: A + (-B)

• Bit-wise inverse of B is !B:
– A - B = A + (-B) = A + (!B + 1) = A + !B + 1

[Figure: 1-bit ALU with a Binvert-controlled multiplexor (inputs 0 and 1) that selects b or !b ahead of the adder; Operation selects multiplexor input 0, 1, or 2; CarryIn, CarryOut, Result]

• Binvert provides the negation
• How about “+1”?
• For SUB
  • set Binvert = 1
  • CarryIn = 1 for LSB
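A behavioral Python sketch of the ripple-carry adder with a Binvert input follows. The names `ripple_add_sub` and `subtract` are assumptions; the 1-bit sum is written with XOR, which is logically equivalent to the sum-of-products form given earlier.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ carry_in, (b & carry_in) | (a & carry_in) | (a & b)


def ripple_add_sub(a: int, b: int, subtract: bool = False,
                   width: int = 32) -> tuple[int, int]:
    """Chain `width` 1-bit adders; for SUB invert b and set CarryIn of the LSB to 1."""
    binvert = 1 if subtract else 0
    carry = binvert  # CarryIn of bit 0 supplies the "+1"
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ binvert  # Binvert selects b or !b
        bit, carry = full_adder(ai, bi, carry)
        result |= bit << i
    return result, carry  # carry is the CarryOut of the MSB


if __name__ == "__main__":
    print(ripple_add_sub(7, 6, subtract=True))
    print(f"{ripple_add_sub(3, 5, subtract=True)[0]:08x}")
```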

Detecting Overflow

No overflow when adding a positive and a negative number
No overflow when signs are the same for subtraction
Overflow occurs when the value affects the sign:
– overflow when adding two positives yields a negative
– or, adding two negatives gives a positive
– or, subtract a negative from a positive and get a negative
– or, subtract a positive from a negative and get a positive

Consider the operations A + B, and A – B
– Can overflow occur if B is 0 ?
– Can overflow occur if A is 0 ?

Response of MIPS to overflow will be covered later in the course

Overflow Detection

Overflow:

   7 + 3 = -6 (wrong)                 -4 + (-5) = 7 (wrong)
     0 1 1 1                            1 1 0 0
   + 0 0 1 1                          + 1 0 1 1
     1 0 1 0                            0 1 1 1
   carry into MSB = 1, out = 0        carry into MSB = 0, out = 1

No overflow:

   2 + 3 = 5                          -4 + (-2) = -6
     0 0 1 0                            1 1 0 0
   + 0 0 1 1                          + 1 1 1 0
     0 1 0 1                            1 0 1 0
   carry into MSB = 0, out = 0        carry into MSB = 1, out = 1

So how do we detect overflow?

Overflow Detection Logic

Carry into MSB != Carry out of MSB
– For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]

[Figure: four chained 1-bit ALUs with inputs A0/B0 ... A3/B3, CarryIn0 ... CarryIn3, CarryOut0 ... CarryOut3, and outputs Result0 ... Result3; an XOR gate on CarryIn3 and CarryOut3 produces Overflow]

X  Y | X XOR Y
0  0 | 0
0  1 | 1
1  0 | 1
1  1 | 0
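The XOR rule can be tried on the four 4-bit examples from the previous slide. This Python sketch assumes the hypothetical name `add_with_overflow` and works on raw bit patterns.

```python
def add_with_overflow(a: int, b: int, width: int = 4) -> tuple[int, int]:
    """Ripple-carry add; Overflow = CarryIn[N-1] XOR CarryOut[N-1]."""
    carry = 0
    carry_into_msb = 0
    result = 0
    for i in range(width):
        if i == width - 1:
            carry_into_msb = carry
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry_into_msb ^ carry


if __name__ == "__main__":
    for a, b in ((0b0111, 0b0011), (0b1100, 0b1011),
                 (0b0010, 0b0011), (0b1100, 0b1110)):
        result, overflow = add_with_overflow(a, b)
        print(f"{a:04b} + {b:04b} = {result:04b} overflow={overflow}")
```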

BEQ/BNE: Zero Detection Logic

Zero Detection Logic is just one BIG NOR gate
– Any non-zero input to the NOR gate will cause its output to be zero

[Figure: four chained 1-bit ALUs with inputs A0/B0 ... A3/B3, CarryIn0 ... CarryIn3, CarryOut0 ... CarryOut3; Result0 ... Result3 feed a single NOR gate whose output is Zero]

SLT: Set-on-less-than Logic

SLT $1, $2, $3
– if ($2 < $3)
      $1 = 1;
  else
      $1 = 0;

To test A < B, do a subtraction (A - B)
– (A < B) if (A - B) < 0, i.e. negative

Use sign bit
– Route the sign bit to bit 0 of result
– Set bits 1 - 31 to zero

There is a complication due to overflow
– Work out solution in Homework problem 4.23 (2nd Edition)

A Complete 32-bit ALU

[Figure: ALU0, ALU1, ALU2, ..., ALU31 with inputs a0/b0 ... a31/b31 and a Less input per bit; CarryOut of each bit feeds CarryIn of the next; Bnegate and Operation are shared; outputs Result0 ... Result31, Zero, Set (from ALU31), and Overflow]

Functionality
• Arithmetic Operations:
  • ADD, SUB
• Logical Operations:
  • AND, OR
• Compare
  • SLT
• Support for branch
  • BEQ, BNE
• Exception detection
  • Overflow

Note: “Less” is connected to “Set” input for bit 0.
For all other bits, Less is connected to zero.

Designing an Arithmetic Logic Unit

ALU Control Lines (ALUop) Function
– 000 And
– 001 Or
– 010 Add
– 110 Subtract
– 111 Set-on-less-than

[Figure: ALU block with N-bit inputs A and B, 3-bit ALUop control, N-bit Result, and outputs Zero, Overflow, CarryOut]
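The ALUop table can be turned into a small behavioral model. The Python sketch below is an illustration under stated assumptions, not the lecture's gate-level design: the function name `alu` and its return order are invented here, the adder is modeled with integer addition, and the set-on-less-than result is taken as sign XOR overflow, which is one common way to handle the overflow complication mentioned earlier.

```python
def alu(a: int, b: int, aluop: int, width: int = 32) -> tuple[int, int, int, int]:
    """Return (result, zero, overflow, carry_out) for one ALU operation."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    carry_out = overflow = 0
    if aluop == 0b000:                     # And
        result = a & b
    elif aluop == 0b001:                   # Or
        result = a | b
    elif aluop in (0b010, 0b110, 0b111):   # Add, Subtract, Set-on-less-than
        subtract = aluop != 0b010
        operand = (~b & mask) if subtract else b       # Binvert
        total = a + operand + (1 if subtract else 0)   # CarryIn of bit 0
        result = total & mask
        carry_out = total >> width
        # Overflow: both adder inputs share a sign that differs from the result's.
        overflow = 1 if (~(a ^ operand) & (a ^ result) & sign) else 0
        if aluop == 0b111:
            # ASSUMPTION: Less = sign bit XOR overflow, all other bits zero.
            result = ((result >> (width - 1)) & 1) ^ overflow
    else:
        raise ValueError(f"unsupported ALUop {aluop:03b}")
    zero = 1 if result == 0 else 0
    return result, zero, overflow, carry_out


if __name__ == "__main__":
    print(alu(7, 6, 0b110))           # subtract
    print(alu(0x7FFFFFFF, 1, 0b010))  # add with overflow
    print(alu(2, 3, 0b111))           # set-on-less-than
```

A BEQ would use the Subtract operation and test the Zero output, as on the zero detection slide.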

Conclusion

We can build an ALU to support the MIPS instruction set
– Key idea: use multiplexor to select the output we want
– Efficiently perform subtraction using two’s complement
– Replicate a 1-bit ALU to produce a 32-bit ALU

Important points about hardware
– All of the gates are always working
– The speed of a gate is affected by the number of inputs to the gate
– The speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)

For computer hardware, “Speed is it!”

Our ALU has functionality but lacks speed...

• Cycle Time = CLK-to-Q + Longest Gate Delay + Setup Time + Clock Skew

[Figure: two banks of registers clocked by CLK1 and CLK2 with combinational logic between them; timing diagram of Clk1 and Clk2 showing Clock Skew]

CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

Adder in our ALU is in timing critical path

[Figure: the complete 32-bit ALU (ALU0 ... ALU31) with the carry chain from CarryIn of ALU0 to CarryOut of ALU31 highlighted; inset of a 1-bit adder with inputs A, B, CarryIn and output CarryOut]

• The adder we just built is called a “Ripple Carry Adder”
– The carry bit may have to propagate from LSB to MSB
– Worst case delay for an N-bit RC adder: 2N gate delays

• Single gate delay = 0.02 ns (inverter “speed” of 50 GHz)
• 32 bit adder => 64 gate delays => 1.28 ns delay
• Accounting for CLK2Q, set up time and clock skew, the ALU will run at << 781 MHz (1 / 1.28 ns)
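The delay estimate is simple arithmetic; a short Python sketch makes the numbers explicit. The function name is hypothetical, and the result is only an upper bound on clock rate, since CLK-to-Q, setup time, and clock skew are ignored.

```python
def ripple_adder_timing(bits: int = 32, gate_delay_ns: float = 0.02) -> tuple[float, float]:
    """Return (worst-case adder delay in ns, corresponding frequency in MHz)."""
    delay_ns = 2 * bits * gate_delay_ns  # 2N gate delays for an N-bit ripple adder
    return delay_ns, 1000.0 / delay_ns   # 1 / ns = GHz, so scale by 1000 for MHz


if __name__ == "__main__":
    delay, mhz = ripple_adder_timing()
    print(f"{delay:.2f} ns -> {mhz:.2f} MHz")
```

With the slide's figures this gives 1.28 ns, so 1 / 1.28 ns works out to about 781 MHz before any clocking overhead is added.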

Problem: ripple carry adder is slow

Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products

Can you see the ripple? How could you get rid of it?

c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1
c3 = b2c2 + a2c2 + a2b2
c4 = b3c3 + a3c3 + a3b3

The Theory Behind Carry Look-ahead

[Figure: two chained 1-bit ALUs; the first has inputs A0, B0, Cin0 and output Cout0 = Cin1; the second has inputs A1, B1, Cin1 and output Cout1 = Cin2]

Recall: CarryOut = (B & CarryIn) | (A & CarryIn) | (A & B)
– Cin1 = Cout0 = (B0 & Cin0) | (A0 & Cin0) | (A0 & B0)
– Cin2 = Cout1 = (B1 & Cin1) | (A1 & Cin1) | (A1 & B1)

Substituting Cin1 into Cin2:
– Cin2 = (A1 & A0 & B0) | (A1 & A0 & Cin0) | (A1 & B0 & Cin0) |
         (B1 & A0 & B0) | (B1 & A0 & Cin0) | (B1 & B0 & Cin0) | (A1 & B1)

Now define two new terms:
– Generate Carry at Bit i: gi = Ai & Bi
– Propagate Carry via Bit i: pi = Ai | Bi
– Cin1 = (Cin0 & (A0 | B0)) | (A0 & B0) = (Cin0 & p0) | g0
– Cin2 = (Cin1 & (A1 | B1)) | (A1 & B1) = (Cin1 & p1) | g1

Carry Lookahead: 1st Level Abstraction

Using the two new terms we just defined:
– Generate Carry at Bit i: gi = Ai & Bi
– Propagate Carry via Bit i: pi = Ai | Bi

We can rewrite:
– Cin1 = g0 | (p0 & Cin0)
– Cin2 = g1 | (p1 & g0) | (p1 & p0 & Cin0)
– Cin3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & Cin0)

Carry going into bit 3 is 1 if
– We generate a carry at bit 2 (g2)
– Or we generate a carry at bit 1 (g1) and bit 2 allows it to propagate (p2 & g1)
– Or we generate a carry at bit 0 (g0) and bit 1 as well as bit 2 allows it to propagate (p2 & p1 & g0)
– Or we have a carry input at bit 0 (Cin0) and bit 0, 1, and 2 all allow it to propagate (p2 & p1 & p0 & Cin0)
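The rewritten carry equations can be checked exhaustively against the rippled carries. In this Python sketch the function names are assumptions, and the bit tuples are ordered LSB first (index 0 is bit 0).

```python
from itertools import product


def lookahead_carries(a_bits, b_bits, cin0):
    """Compute Cin1, Cin2, Cin3 directly from g, p, and Cin0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate
    cin1 = g[0] | (p[0] & cin0)
    cin2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin0)
    cin3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & cin0))
    return cin1, cin2, cin3


def ripple_carries(a_bits, b_bits, cin0):
    """Compute the same carries one bit at a time, as the ripple adder does."""
    carries, carry = [], cin0
    for a, b in zip(a_bits, b_bits):
        carry = (b & carry) | (a & carry) | (a & b)
        carries.append(carry)
    return tuple(carries)


def equations_agree() -> bool:
    """Compare both methods for every 3-bit A, 3-bit B, and Cin0."""
    for bits in product((0, 1), repeat=7):
        a, b, cin0 = bits[0:3], bits[3:6], bits[6]
        if lookahead_carries(a, b, cin0) != ripple_carries(a, b, cin0):
            return False
    return True


if __name__ == "__main__":
    print(equations_agree())
```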

Carry Lookahead: 2nd Level of Abstraction

Propagate signals for 4-bit adders (1 gate delay to combine p’s)
– P0 = p3.p2.p1.p0
– P1 = p7.p6.p5.p4
– P2 = p11.p10.p9.p8
– P3 = p15.p14.p13.p12

Generate signals for 4-bit adders (2 gate delays to combine p’s and g’s)
– G0 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0)
– G1 = g7 + (p7.g6) + (p7.p6.g5) + (p7.p6.p5.g4)
– G2 = g11 + (p11.g10) + (p11.p10.g9) + (p11.p10.p9.g8)
– G3 = g15 + (p15.g14) + (p15.p14.g13) + (p15.p14.p13.g12)

4-bit adder carry computation (2 gate delays to combine G’s, P’s and c0)
– C1 = G0 + (P0.c0)
– C2 = G1 + (P1.G0) + (P1.P0.c0)
– C3 = G2 + (P2.G1) + (P2.P1.G0) + (P2.P1.P0.c0)
– C4 = G3 + (P3.G2) + (P3.P2.G1) + (P3.P2.P1.G0) + (P3.P2.P1.P0.c0)

Total 2 + 2 + 1 = 5 levels of logic to compute c16 (= C4)
– C4 has 2 levels of logic with Gi, Pi
– Gi has 2 levels of logic with gi, pi
– gi & pi have 1 level of logic with inputs Ai, Bi

16-bit Adder from Four 4-bit ALUs

[Figure: four 4-bit ALUs (ALU0 ... ALU3) with inputs a0/b0 ... a15/b15 and outputs Result0--3, Result4--7, Result8--11, Result12--15; each ALU sends its Pi, Gi (built from pi, gi ... pi+3, gi+3) to a carry-lookahead unit, which takes CarryIn and returns C1, C2, C3 as the CarryIn of ALU1 ... ALU3 and C4 as CarryOut]

16-bit Ripple Carry (RC) adder has 16 x 2 = 32 gate delays
16-bit Carry-Look-Ahead adder (CLA) has 5 gate delays
CLA adder is faster than RC by a factor of 32/5 ~= 6
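The group propagate and generate equations can be exercised in a behavioral Python sketch of the 16-bit adder. The name `cla16` is hypothetical. As a simplification, the carries inside each 4-bit group are rippled here with c = g | (p & c); only the group carries C1 to C4 come from the lookahead equations.

```python
def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit add using second-level carry lookahead; returns (sum, C4)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(16)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(16)]

    # Group propagate and generate for bits 0-3, 4-7, 8-11, 12-15.
    P, G = [], []
    for k in range(0, 16, 4):
        P.append(p[k + 3] & p[k + 2] & p[k + 1] & p[k])
        G.append(g[k + 3] | (p[k + 3] & g[k + 2])
                 | (p[k + 3] & p[k + 2] & g[k + 1])
                 | (p[k + 3] & p[k + 2] & p[k + 1] & g[k]))

    # Carry-lookahead unit: C[k] is the carry into group k, C[4] the carry out.
    C = [c0,
         G[0] | (P[0] & c0),
         G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0),
         G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0),
         G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
         | (P[3] & P[2] & P[1] & P[0] & c0)]

    total = 0
    for k in range(4):
        carry = C[k]
        for i in range(4 * k, 4 * k + 4):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            total |= (ai ^ bi ^ carry) << i
            carry = g[i] | (p[i] & carry)  # simplification: ripple inside the group
    return total, C[4]


if __name__ == "__main__":
    print(cla16(0x1234, 0x0FCD))
    print(cla16(0xFFFF, 0x0001))
```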

Grade school Multiplication algorithm

• In general (ignoring sign bits):
  • m bits x n bits = (m+n) bit product
• Binary makes it easy:
  • 0 => place 0 (0 x multiplicand)
  • 1 => place multiplicand (1 x multiplicand)
• Paper and pencil example of binary multiplication: (8*10 = 80, 0x8 * 0xa = 0x50)

         1000   (multiplicand)
       x 1010   (multiplier)
         0000
        1000x
       0000xx
      1000xxx
      1010000   (Result)

Observations about Multiplication

More complicated than addition
Simple algorithm:
– Accomplished via shift and add
More time delay and more gates (=> silicon area)
Let's look at 3 versions based on grade school algorithm

Multiplication: First Version

[Figure: 64-bit Multiplicand register (shift left), 64-bit ALU, 64-bit Product register (write, initialized to 0), 32-bit Multiplier register (shift right), and control test]

Algorithm:
Start
1. Test Multiplier0
   – Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
   – Multiplier0 = 0: skip step 1a
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   – No: < 32 repetitions, go back to step 1
   – Yes: 32 repetitions, Done

Initialization:
• Load 32-bit multiplicand and zero extend to 64 bits
• Load 64-bit product register with zero
Need a state machine to control operation
32 iterations are required
• Each iteration takes 3 clocks
• Total 96 + 3 = 99 clocks

Observations?
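The first version can be sketched in Python by treating the registers as integers. The name `multiply_v1` and the `bits` parameter are assumptions; operands are unsigned, and the masks stand in for the fixed register widths.

```python
def multiply_v1(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Shift-and-add multiply with a double-width multiplicand register."""
    wide_mask = (1 << (2 * bits)) - 1
    mcand = multiplicand & ((1 << bits) - 1)  # zero extended to 2*bits
    mplier = multiplier & ((1 << bits) - 1)
    product = 0                               # product register starts at zero
    for _ in range(bits):
        if mplier & 1:                             # 1. test Multiplier0
            product = (product + mcand) & wide_mask  # 1a. add multiplicand
        mcand = (mcand << 1) & wide_mask           # 2. shift multiplicand left
        mplier >>= 1                               # 3. shift multiplier right
    return product


if __name__ == "__main__":
    print(multiply_v1(8, 10))
```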

Multiplication: Second Version

[Figure: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), 32-bit Multiplier register (shift right), and control test]

Algorithm:
Start
1. Test Multiplier0
   – Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   – Multiplier0 = 0: skip step 1a
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   – No: < 32 repetitions, go back to step 1
   – Yes: 32 repetitions, Done

Initialization:
• Load 32-bit multiplicand to 32-bit register
• Load 32-bit multiplier to 32-bit register
• Load 64-bit product register with zero
Need a state machine to control operation

Observations?

Multiplication: Third Version

[Figure: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), and control test]

Algorithm:
Start
1. Test Product0
   – Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   – Product0 = 0: skip step 1a
2. Shift the Product register right 1 bit
32nd repetition?
   – No: < 32 repetitions, go back to step 1
   – Yes: 32 repetitions, Done

Initialization:
• Load 32-bit multiplicand to 32-bit register
• Load upper 32 bits of product register with zero
• Load lower 32 bits of product register with multiplier
Need a state machine to control operation

Observations?
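A Python sketch of the third version shows the multiplier sharing the product register. The name `multiply_v3` is an assumption, operands are unsigned, and the sketch relies on Python's unbounded integers to keep the adder's carry out so that it is shifted back into the product; real hardware would need to retain that carry bit explicitly.

```python
def multiply_v3(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Shift-and-add multiply with the multiplier in the low half of Product."""
    mask = (1 << bits) - 1
    mcand = multiplicand & mask
    product = multiplier & mask  # upper half zero, lower half multiplier
    for _ in range(bits):
        if product & 1:               # 1. test Product0
            product += mcand << bits  # 1a. add to the left half (carry kept)
        product >>= 1                 # 2. shift Product right 1 bit
    return product


if __name__ == "__main__":
    print(multiply_v3(8, 10, bits=4))
```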

Multiplying Signed Numbers

Convert all operands to positive
Determine sign of the product
– Sign of the product = sign(op1) ^ sign(op2)
Multiply positive operands (only 31 bits)
If the sign of the result is negative, negate the result
Previous approach will work:
– Must extend sign of the product when shifting

Is there a better way?

Booth’s Algorithm

An elegant approach to multiplying signed numbers
With ability to add, subtract and shift
– There are multiple ways to do multiply

Consider signed operands A and B
A = (A31 * -2^31) + (A30 * 2^30) + (A29 * 2^29) + … + (A1 * 2^1) + (A0 * 2^0)
  = (-A31 * 2^31) + (2*A30 - A30) * 2^30 + (2*A29 - A29) * 2^29 + … + (2*A0 - A0) * 2^0
  = (A30 - A31) * 2^31 + (A29 - A30) * 2^30 + … + (A0 - A1) * 2^1 + (A(-1) - A0) * 2^0
where A(-1) = 0 is an extra bit to the right of the LSB.

A*B = [(A30 - A31) * 2^31 + (A29 - A30) * 2^30 + … + (A0 - A1) * 2^1 + (A(-1) - A0) * 2^0] * B
    = (A30 - A31) * 2^31 * B + (A29 - A30) * 2^30 * B + … + (A0 - A1) * 2^1 * B + (A(-1) - A0) * 2^0 * B

Recipe:
Evaluate (Ai-1 - Ai)
   0: Do nothing
   1: Add B
  -1: Subtract B

Booth’s algorithm: Signed multiplication

Current Bit | Bit to the Right | Explanation         | Example    | Op
1           | 0                | Begins run of 1s    | 0001111000 | sub
1           | 1                | Middle of run of 1s | 0001111000 | none
0           | 1                | End of run of 1s    | 0001111000 | add
0           | 0                | Middle of run of 0s | 0001111000 | none

[Figure: bit string 0 1 1 1 1 0 annotated with end of run (left), middle of run, and beginning of run (right)]

Originally for Speed (when shift was faster than add)
• Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one
• Potential speed up recognizing that string of 0’s and 1’s requires no operation!

A*B = (A30 - A31) * 2^31 * B + (A29 - A30) * 2^30 * B + … + (A0 - A1) * 2^1 * B + (A(-1) - A0) * 2^0 * B

Booth’s Algorithm

• Example: Use Booth’s Algorithm for following multiplication
  2 * (-6) = 0010 * 1010 = -12 = 1111 0100

• Recipe: for A*B
  Add A(-1) = 0 (an extra bit to the right of the LSB)
  Evaluate (Ai-1 - Ai)
     0: Do nothing
     1: Add B
    -1: Subtract B
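The recipe can be followed step by step in Python. This is a sketch of the arithmetic identity rather than of the shifting hardware: the name `booth_multiply` is an assumption, A is scanned from its LSB with A(-1) = 0, and B is kept as a signed Python integer that is shifted left by the bit position.

```python
def booth_multiply(a: int, b: int, bits: int = 4) -> int:
    """Multiply signed a (scanned bit by bit) by signed b using Booth's recipe."""
    a_bits = a & ((1 << bits) - 1)  # two's complement pattern of A
    product = 0
    previous = 0                    # A(-1) = 0
    for i in range(bits):
        current = (a_bits >> i) & 1
        step = previous - current   # evaluate (Ai-1 - Ai)
        if step == 1:
            product += b << i       # end of a run of 1s: add B * 2^i
        elif step == -1:
            product -= b << i       # beginning of a run of 1s: subtract B * 2^i
        previous = current
    return product


if __name__ == "__main__":
    result = booth_multiply(2, -6)
    print(result, f"{result & 0xFF:08b}")
```

For 2 * (-6) with A = 0010, the scan subtracts 2B at bit 1 (+12) and adds 4B at bit 2 (-24), giving -12, which is 1111 0100 in 8 bits.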

Division

                  1001       Quotient
Divisor 1000 | 1001010       Dividend
              –1000
                  10
                  101
                  1010
                 –1000
                    10       Remainder (or Modulo result)

See how big a number can be subtracted, creating quotient bit on each step
Binary => 1 * divisor or 0 * divisor

Dividend = Quotient x Divisor + Remainder
=> sizeof(Dividend) = sizeof(Quotient) + sizeof(Divisor)

3 versions of divide, successive refinement

Division 1.0

• Initialization:
  • 32-bit quotient register = 0
  • 64-bit remainder = dividend
  • 64-bit Divisor = (32-bit divisor << 32)

[Figure: 64-bit Divisor register (shift right), 64-bit ALU, 64-bit Remainder register (write), 32-bit Quotient register (shift left), and control]

Division 1.0

Start
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder
   – Remainder >= 0: 2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
   – Remainder < 0: 2b. Restore the original value by adding the Divisor register to the Remainder register, and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
33rd repetition?
   – No: < 33 repetitions, go back to step 1
   – Yes: 33 repetitions, Done
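The restoring divide steps map directly onto a short Python loop. The name `divide_v1` is hypothetical; operands are unsigned, the registers are plain integers, and the loop runs bits + 1 times to match the 33 repetitions for 32-bit operands.

```python
def divide_v1(dividend: int, divisor: int, bits: int = 32) -> tuple[int, int]:
    """Restoring division; returns (quotient, remainder) for unsigned operands."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << bits) - 1
    remainder = dividend & mask          # double-width remainder = dividend
    div = (divisor & mask) << bits       # divisor placed in the upper half
    quotient = 0
    for _ in range(bits + 1):
        remainder -= div                 # 1. subtract divisor from remainder
        if remainder >= 0:
            quotient = ((quotient << 1) | 1) & mask  # 2a. shift in a 1
        else:
            remainder += div             # 2b. restore, then shift in a 0
            quotient = (quotient << 1) & mask
        div >>= 1                        # 3. shift divisor right 1 bit
    return quotient, remainder


if __name__ == "__main__":
    print(divide_v1(0b1001010, 0b1000))
```

With the dividend and divisor from the paper-and-pencil example (74 and 8), the loop yields quotient 1001 and remainder 10.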


Divide Algorithm

Optimizations similar to those for the multiply algorithm can be done
– 32-bit Divisor register
– 32-bit ALU
– Quotient bits are left shifted into the remainder register

In case the result of subtraction is negative, the remainder register has to be restored
– Takes one extra clock cycle

Non-restoring divide algorithm removes this step

Divide overflow case
– 0x80000000/-1 (the most negative 32-bit two's complement value divided by -1 gives 2^31, which does not fit in 32 bits)
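
One way to picture the optimized version is the sketch below, in which a single 2n-bit register holds the remainder in its upper half while quotient bits are shifted into its lower half. The name `divide_shared` and the parameter `n` are hypothetical, operands are assumed unsigned with a nonzero divisor, and the restore step is modeled by simply discarding the trial subtraction.

```python
def divide_shared(dividend, divisor, n=32):
    """Restoring division with an n-bit divisor and one shared 2n-bit register.

    Upper half: running remainder. Lower half: dividend bits on the way in,
    quotient bits on the way out. Assumes unsigned operands, divisor > 0.
    """
    rem = dividend << 1                      # initial left shift
    for _ in range(n):                       # n repetitions
        trial = rem - (divisor << n)         # subtract from the upper half only
        if trial >= 0:
            rem = (trial << 1) | 1           # keep result, shift in a 1
        else:
            rem = rem << 1                   # restore (drop trial), shift in a 0
    quotient = rem & ((1 << n) - 1)          # lower half
    remainder = (rem >> n) >> 1              # upper half, shifted right 1 bit
    return quotient, remainder


print(divide_shared(7, 2, n=4))              # (3, 1)
```

The final right shift of the upper half undoes the extra left shift the remainder picks up in the last iteration.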


Floating Point: Introduction

We need a way to represent real numbers– Numbers with fractions, e.g., 3.14159265… (recognize me?)

– Very small numbers, e.g., 0.0000000000000000000000013621

– Very large numbers, e.g., 9,349,398,989,787,762,244,859,087,678

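Binary Fractions:
$1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$

so...
$101.011_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}$

e.g.,
$0.75 = 0.5 + 0.25 = 1/2 + 1/4 = 0.11_2$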
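
The positional expansion above can be checked with a short Python sketch. The helper name `binary_to_value` is hypothetical; it takes a string of binary digits with an optional binary point and sums each digit times its power of two.

```python
def binary_to_value(text):
    """Evaluate a binary numeral such as '101.011' digit by digit."""
    whole, _, frac = text.partition(".")
    value = 0.0
    for i, bit in enumerate(reversed(whole)):     # weights 2^0, 2^1, ...
        value += int(bit) * 2 ** i
    for i, bit in enumerate(frac, start=1):       # weights 2^-1, 2^-2, ...
        value += int(bit) * 2 ** -i
    return value


print(binary_to_value("1011"))       # 11.0
print(binary_to_value("101.011"))    # 5.375
print(binary_to_value(".11"))        # 0.75
```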


Recall Scientific Notation

$6.02 \times 10^{23}$
– Mantissa: 6.02 (with its decimal point)
– Radix (base): 10
– Exponent: 23

IEEE Single Precision F.P.: $\pm 1.M \times 2^{e - 127}$


IEEE 754 Single-precision Floating-Point

$N = (-1)^S \times (1.M) \times 2^{E-127}$

Fields (total 32 bits):
• S (1 bit): sign
• E (8 bits): exponent, excess-127 binary integer
• M (23 bits): mantissa, normalized binary significand w/ hidden integer bit: 1.M

• Example: Convert -325.75 to IEEE Single Precision Floating Point Representation
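
A worked version of the -325.75 example: $325.75 = 101000101.11_2 = 1.0100010111_2 \times 2^8$, so S = 1, E = 8 + 127 = 135, and M is 0100010111 padded with zeros to 23 bits. The Python sketch below confirms this by packing the value with the standard `struct` module and slicing out the fields; the helper name `single_fields` is hypothetical.

```python
import struct


def single_fields(x):
    """Return (S, E, M) of x encoded as an IEEE 754 single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, excess 127
    mantissa = bits & 0x7FFFFF         # 23 bits, hidden leading 1 not stored
    return sign, exponent, mantissa


s, e, m = single_fields(-325.75)
print(s, format(e, "08b"), format(m, "023b"))
# 1 10000111 01000101110000000000000
```

As a single 32-bit word this pattern is 0xC3A2E000.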


IEEE 754 Double-precision Floating-Point

$N = (-1)^S \times (1.M) \times 2^{E-1023}$

Fields (total 64 bits):
• S (1 bit): sign
• E (11 bits): exponent, excess-1023 binary integer
• M (52 bits: 20 bits in the first 32-bit word, 32 bits in the second): mantissa, normalized binary significand w/ hidden integer bit: 1.M

• Example: Convert -325.75 to IEEE Double Precision Floating Point Representation
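
For the double-precision example the significand is unchanged ($1.0100010111_2 \times 2^8$); only the bias and field widths differ, giving E = 8 + 1023 = 1031. A minimal Python sketch using the standard `struct` module (the helper name `double_fields` is hypothetical):

```python
import struct


def double_fields(x):
    """Return (S, E, M) of x encoded as an IEEE 754 double-precision value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, excess 1023
    mantissa = bits & ((1 << 52) - 1)      # 52 bits, hidden leading 1 not stored
    return sign, exponent, mantissa


s, e, m = double_fields(-325.75)
print(s, format(e, "011b"), format(m >> 32, "020b"), format(m & 0xFFFFFFFF, "032b"))
# 1 10000000111 01000101110000000000 00000000000000000000000000000000
```

The two mantissa groups printed correspond to the 20-bit and 32-bit pieces of the 64-bit layout.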


IEEE 754 Single Precision FP

• If E=255 and F is nonzero, then V = NaN ("Not a number")
• If E=255 and F is zero and S is 1, then V = -Infinity
• If E=255 and F is zero and S is 0, then V = Infinity
• If 0 < E < 255 then V = (-1)**S * 2**(E-127) * (1.F)
• If E=0 and F is zero and S is 1, then V = -0
• If E=0 and F is zero and S is 0, then V = 0

In particular,
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2

Fields (total 32 bits): S (1 bit) sign; E (8 bits) exponent, excess-127 binary integer; M (23 bits) mantissa, normalized binary significand w/ hidden integer bit: 1.M
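
The case analysis above translates almost line for line into a decoder. The Python sketch below is illustrative only; `decode_single` is a hypothetical name. One case goes beyond the rules listed: when E = 0 and F is nonzero, IEEE 754 defines a denormalized value 0.F x 2^-126, which the code handles in the same branch as zero.

```python
import math


def decode_single(bits):
    """Decode a 32-bit IEEE 754 single-precision pattern given as an int."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        # F nonzero: NaN. F zero: +Infinity or -Infinity.
        return math.nan if f else sign * math.inf
    if e == 0:
        # F zero gives +0 or -0. F nonzero is a denormalized number
        # (not in the rule list above): 0.F * 2**-126.
        return sign * (f / 2**23) * 2.0**-126
    return sign * (1 + f / 2**23) * 2.0 ** (e - 127)   # normalized: 1.F


print(decode_single(0b0_10000000_00000000000000000000000))   # 2.0
print(decode_single(0b1_11111111_00000000000000000000000))   # -inf
print(decode_single(0b0_11111111_00000100000000000000000))   # nan
```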


Floating Point Addition

Start

1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent.

2. Add the significands.

3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent.

Overflow or underflow?
Yes: Exception
No: continue with step 4

4. Round the significand to the appropriate number of bits.

Still normalized?
No: go back to step 3
Yes: Done


Floating Point Addition

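[Figure: Floating-point adder datapath. Two inputs, each with Sign, Exponent, Significand. A Small ALU computes the Exponent difference; Control drives the multiplexers; the smaller number's significand goes through Shift right; the Big ALU adds; the sum goes through Shift left or right with exponent Increment or decrement, then Rounding hardware, producing the output Sign, Exponent, Significand. Stage labels: Compare exponents, Shift smaller number right, Add, Normalize, Round.]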

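Example: 0.5 + (-0.4375)
$L = 1.000 \times 2^{-1}$, $R = -1.110 \times 2^{-2}$

Compare exponents: $L_e - R_e = -1 - (-2) = 1$

Shift smaller number right: $R = -0.111 \times 2^{-1}$

Add: $1.000 - 0.111 = 0.001$ (exponent $-1$)

Normalize: $0.001 \times 2^{-1} = 1.000 \times 2^{-4}$

Round: $1.000$ (exponent $-4$)

Result = $1.000 \times 2^{-4}$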
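
The same example can be traced with a toy model of the adder. The sketch below is a simplification under stated assumptions, not the IEEE 754 algorithm in full: the names `fp_add` and `to_float` are hypothetical, numbers are (sign, significand, exponent) triples with a 4-bit significand, bits shifted out are simply dropped (no guard or round bits, so rounding is truncation), and the overflow/underflow check is omitted.

```python
def fp_add(a, b, p=4):
    """Add two toy floating-point numbers.

    Each number is (sign, sig, exp) with value sign * sig * 2**(exp - (p - 1)),
    where sign is +1 or -1 and sig is a p-bit integer with its top bit set.
    """
    (sa, ma, ea), (sb, mb, eb) = a, b
    # Step 1: compare exponents, shift the smaller number right.
    if ea < eb:
        (sa, ma, ea), (sb, mb, eb) = (sb, mb, eb), (sa, ma, ea)
    mb >>= ea - eb                       # shifted-out bits are dropped
    # Step 2: add the significands.
    total = sa * ma + sb * mb
    if total == 0:
        return (1, 0, 0)
    sign = 1 if total > 0 else -1
    mag, exp = abs(total), ea
    # Step 3: normalize.
    while mag >= (1 << p):               # too wide: shift right, increment
        mag >>= 1                        # Step 4 simplified to truncation
        exp += 1
    while mag < (1 << (p - 1)):          # leading zeros: shift left, decrement
        mag <<= 1
        exp -= 1
    return (sign, mag, exp)


def to_float(x, p=4):
    sign, mag, exp = x
    return sign * mag * 2.0 ** (exp - (p - 1))


L = (+1, 0b1000, -1)                     # 1.000 x 2^-1 = 0.5
R = (-1, 0b1110, -2)                     # -1.110 x 2^-2 = -0.4375
result = fp_add(L, R)
print(result, to_float(result))          # (1, 8, -4) 0.0625
```

The significand 8 is 1.000 in binary, so the result reads 1.000 x 2^-4, matching the example.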


IEEE 754 Floating Point

• Increasing the size of significand enhances accuracy
• Increasing the size of exponent increases the range of the numbers that can be represented
• Special representation of 0 (E = 00000000)
• Overflow or underflow can happen
• Can do integer compare for greater-than, sign: Fast comparison!
• Single Precision
– Range of about $2 \times 10^{-38}$ to $2 \times 10^{38}$
• Double Precision
– Range of about $2 \times 10^{-308}$ to $2 \times 10^{308}$
• Infinite variety of real numbers exist between, say, 0 and 1
– Not more than $2^{53}$ can be represented exactly in double precision
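
The quoted ranges are round figures; the exact limits for normalized numbers follow from the field widths. The Python sketch below derives them, assuming the smallest normalized value uses E = 1 with a zero fraction and the largest uses the biggest non-reserved exponent with an all-ones fraction. The helper name `normalized_range` is hypothetical.

```python
import sys


def normalized_range(exp_bits, frac_bits):
    """Smallest and largest positive normalized values for a given format."""
    bias = 2 ** (exp_bits - 1) - 1                       # 127 or 1023
    smallest = 2.0 ** (1 - bias)                         # E = 1, fraction = 0
    largest = (2.0 - 2.0 ** -frac_bits) * 2.0 ** bias    # max E, fraction all 1s
    return smallest, largest


single = normalized_range(8, 23)
double = normalized_range(11, 52)
print(single)     # roughly 1.18e-38 and 3.40e+38
print(double == (sys.float_info.min, sys.float_info.max))   # True
```

Python floats are double precision, so the double-precision limits can be compared directly against `sys.float_info`.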


Floating Point Complexities

Operations are somewhat more complicated

In addition to overflow we can have “underflow”

Accuracy can be a big problem

– IEEE 754 keeps two extra bits, guard and round

– four rounding modes

– positive divided by zero yields “infinity”

– zero divided by zero yields “not a number”

Implementing the standard can be tricky

Not using the standard can be even worse
– See text for description of 80x86 and Pentium bug!
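
A few of these effects can be observed directly in Python, whose floats are IEEE 754 double precision on common platforms. This is only a small demonstration; note that Python itself raises `ZeroDivisionError` on float division by zero rather than returning infinity, so the sketch reaches infinity through overflow instead.

```python
import math

big = 1e308 * 10              # overflow: result too large, becomes infinity
tiny = 5e-324 / 2             # underflow: result too small, becomes zero
not_a_number = big - big      # infinity minus infinity is not a number

print(big, tiny, not_a_number)        # inf 0.0 nan
print(not_a_number == not_a_number)   # False: NaN compares unequal to itself
print(0.1 + 0.2 == 0.3)               # False: accuracy is limited
```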


Summary

• Multiplication and division take much longer than addition, requiring multiple addition steps.

• Floating Point extends the range of numbers that can be represented, at the expense of precision (accuracy).

• FP operations are very similar to integer, but with pre- and post-processing.