computer architecture
1
(Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)
Computer Arithmetic
2
COURSE CONTENTS
• Introduction
• Instructions
• Computer Arithmetic
• Performance
• Processor: Datapath
• Processor: Control
• Pipelining Techniques
• Memory
• Input/Output Devices
3
COMPUTER ARITHMETIC
• Arithmetic Logic Unit (ALU)
• Fast Adder
4
Foundation Knowledge
• Decimal, Binary, Octal, & Hexadecimal Numbers
• Signed & Unsigned Numbers
• 2’s Complement Representation
• 2’s Complement Negation, Addition, & Subtraction
• Overflow
• Sign Extension
• ASCII vs Binary
• Boolean Algebra
• Logic Design
• Assembly Language
5
Numbers
• Bits are just bits (no inherent meaning)
  - Conventions define the relationship between bits and numbers
• Binary numbers (base 2)
    0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ...
    decimal: 0 ... 2^n - 1
• Of course it gets more complicated:
  - Numbers are finite (overflow)
  - Fractions and real numbers
  - Negative numbers, e.g., there is no MIPS subi instruction; addi can add a negative number
• How do we represent negative numbers?
  I.e., which bit patterns will represent which numbers?
6
Possible Representations
Three representations:

Sign Magnitude    One's Complement    Two's Complement
000 = +0          000 = +0            000 = +0
001 = +1          001 = +1            001 = +1
010 = +2          010 = +2            010 = +2
011 = +3          011 = +3            011 = +3
100 = -0          100 = -3            100 = -4
101 = -1          101 = -2            101 = -3
110 = -2          110 = -1            110 = -2
111 = -3          111 = -0            111 = -1

• Issues: balance, number of zeros, ease of operations
• Which one is best? Why?
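As a rough illustration of the three conventions listed above, the following sketch decodes one bit pattern three ways. The function name `decode` and its return format are assumptions made for this example, not part of the slides.

```python
def decode(bits: str) -> dict:
    """Interpret a bit string under the three signed-number conventions."""
    n = len(bits)
    raw = int(bits, 2)                      # value as an unsigned number
    sign = raw >> (n - 1)                   # most significant bit
    magnitude = raw & ((1 << (n - 1)) - 1)  # remaining n-1 bits

    # Sign magnitude: the sign bit only flips the sign of the magnitude.
    sign_magnitude = -magnitude if sign else magnitude
    # One's complement: a negative value is the bitwise inverse of its magnitude.
    ones_complement = raw - ((1 << n) - 1) if sign else raw
    # Two's complement: the sign bit carries weight -2^(n-1).
    twos_complement = raw - (1 << n) if sign else raw

    return {
        "sign_magnitude": sign_magnitude,
        "ones_complement": ones_complement,
        "twos_complement": twos_complement,
    }


for pattern in ("011", "100", "111"):
    print(pattern, decode(pattern))
```

Python has no negative zero for integers, so the patterns that the table lists as -0 decode to 0 here. That is the "number of zeros" issue: sign magnitude and one's complement each spend two patterns on zero, while two's complement spends one and gains -4.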
7
MIPS
32-bit signed numbers:
0000 0000 0000 0000 0000 0000 0000 0000_two = 0_ten
0000 0000 0000 0000 0000 0000 0000 0001_two = + 1_ten
0000 0000 0000 0000 0000 0000 0000 0010_two = + 2_ten
...
0111 1111 1111 1111 1111 1111 1111 1110_two = + 2,147,483,646_ten
0111 1111 1111 1111 1111 1111 1111 1111_two = + 2,147,483,647_ten
1000 0000 0000 0000 0000 0000 0000 0000_two = – 2,147,483,648_ten
1000 0000 0000 0000 0000 0000 0000 0001_two = – 2,147,483,647_ten
1000 0000 0000 0000 0000 0000 0000 0010_two = – 2,147,483,646_ten
...
1111 1111 1111 1111 1111 1111 1111 1101_two = – 3_ten
1111 1111 1111 1111 1111 1111 1111 1110_two = – 2_ten
1111 1111 1111 1111 1111 1111 1111 1111_two = – 1_ten
• maxint: + 2,147,483,647_ten
• minint: – 2,147,483,648_ten
8
Two’s Complement Operations
• Negating a two’s complement number: invert all bits and add 1
  - Remember: “negate” and “invert” are quite different!
• Converting n-bit numbers into numbers with more than n bits:
  - MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - Copy the most significant bit (the sign bit) into the other bits
      0010 -> 0000 0010
      1010 -> 1111 1010
  - “sign extension” (lbu vs. lb)
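The two operations on this slide can be sketched on plain integers that hold bit patterns. The helper names `negate` and `sign_extend` and their parameters are assumptions chosen for this example.

```python
def negate(x: int, bits: int = 32) -> int:
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << bits) - 1
    inverted = x ^ mask            # "invert": flip every bit
    return (inverted + 1) & mask   # "negate": invert and add 1


def sign_extend(x: int, from_bits: int, to_bits: int) -> int:
    """Widen a pattern by copying its sign bit into the new upper bits."""
    sign = (x >> (from_bits - 1)) & 1
    if sign:
        upper = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        x |= upper                 # fill the new bits with 1s
    return x                       # a 0 sign bit leaves the new bits at 0


print(format(negate(0b0010, 4), "04b"))              # 1110
print(format(sign_extend(0b0010, 4, 8), "08b"))      # 00000010
print(format(sign_extend(0b1010, 4, 8), "08b"))      # 11111010
```

The last two lines reproduce the slide's 4-bit to 8-bit examples. The lb instruction sign-extends the loaded byte in this way, whereas lbu fills the upper bits with 0s.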
9
Additional MIPS Instructions
• Character transfer
    lbu $s1, 100($s2)     # $s1 <- memory[$s2+100] (load byte unsigned)
    sb $s1, 100($s2)      # memory[$s2+100] <- $s1 (store byte)
• Conditions
    sltu $s2, $s3, $s4    # if ($s3) < ($s4) then $s2 <- 1; else $s2 <- 0
                          # (set on less than, unsigned numbers)
                          # Note that slt works on 2’s complement numbers
• Arithmetic on unsigned numbers
    addu $s1, $s2, $s3    # $s1 <- $s2 + $s3 (no overflow detection)
    subu $s1, $s2, $s3    # $s1 <- $s2 - $s3 (no overflow detection)
    addiu $s1, $s2, 100   # $s1 <- $s2 + 100 (no overflow detection)
• MIPS detects overflow with an exception (interrupt), which is an unscheduled procedure call
• MIPS includes a register, called the exception program counter (EPC), to contain the address of the instruction that caused the exception
    mfc0 $s1, $epc        # $s1 <- $epc (move from special registers)
10
Additional MIPS Instructions
• Shift operations
    sll $t2, $s0, 8       # $t2 <- $s0 << 8 (shift left by constant)
    srl $s1, $s2, 10      # $s1 <- $s2 >> 10 (shift right by constant)
  - Fill the emptied bits with 0’s
  - Encoding of sll $t2, $s0, 8:
      op=0  rs=0  rt=16  rd=10  shamt=8  funct=0
• Logical operations
    and $s1, $s2, $s3     # $s1 <- $s2 and $s3 (bit-by-bit and)
    or $s1, $s2, $s3      # $s1 <- $s2 or $s3 (bit-by-bit or)
    andi $s1, $s2, 100    # $s1 <- $s2 and 100
    ori $s1, $s2, 100     # $s1 <- $s2 or 100
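To make the sll encoding and the zero-fill rule concrete, here is a small sketch. It assumes the standard MIPS R-format field widths (6, 5, 5, 5, 5, 6 bits); the function names `encode_r_type`, `sll`, and `srl` are chosen for this example only.

```python
MASK32 = 0xFFFFFFFF


def encode_r_type(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def sll(value: int, shamt: int) -> int:
    """Shift left logical: bits shifted past bit 31 are dropped, 0s enter on the right."""
    return (value << shamt) & MASK32


def srl(value: int, shamt: int) -> int:
    """Shift right logical: 0s enter on the left."""
    return (value & MASK32) >> shamt


# sll $t2, $s0, 8  ->  $s0 is register 16, $t2 is register 10
word = encode_r_type(op=0, rs=0, rt=16, rd=10, shamt=8, funct=0)
print(hex(word))                 # 0x105200
print(hex(sll(0x00FF00FF, 8)))   # 0xff00ff00
print(hex(srl(0x80000000, 10)))  # 0x200000
```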
11
ALU: Arithmetic Logic Unit
Performs arithmetic (e.g. add) & logical operations (e.g. and) in CPU
Control Funct Result
000 and A and B
001 or A or B
010 add A + B
110 sub A - B
111 slt 1 if A<B
[Figure: ALU block symbol. Inputs: A and B (32 bits each) and ALU operation. Outputs: Result (32 bits), Zero, Overflow, CarryOut.]
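The control table above can be read as a behavioral specification. The sketch below models it directly on Python integers; it is not a gate-level design. The names `alu` and `to_signed` are assumptions, and the only status output modeled here is Zero.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Read a 32-bit pattern as a two's complement number."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int):
    """Return (result, zero) for the five control codes in the table."""
    a &= MASK32
    b &= MASK32
    if control == 0b000:        # and
        result = a & b
    elif control == 0b001:      # or
        result = a | b
    elif control == 0b010:      # add
        result = (a + b) & MASK32
    elif control == 0b110:      # sub
        result = (a - b) & MASK32
    elif control == 0b111:      # slt: 1 if A < B as signed numbers
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("control code not listed in the table")
    zero = int(result == 0)
    return result, zero


print(alu(0b010, 5, 7))           # (12, 0)
print(alu(0b110, 5, 5))           # (0, 1)
print(alu(0b111, 0xFFFFFFFF, 1))  # (1, 0): -1 < 1
```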
12
ALU Building Blocks
• 1-bit adder
• Gates, multiplexor

For the 1-bit adder:
    cout = a·b + a·cin + b·cin
    sum = a ⊕ b ⊕ cin
Note: cin is carry-in, cout is carry-out

[Figure: 1-bit adder with inputs a, b, CarryIn and outputs Sum, CarryOut]
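The 1-bit adder is small enough to check exhaustively. The sketch below assumes the usual full-adder equations (carry-out is the majority of the three inputs, sum is their exclusive OR); the function name `full_adder` is chosen for this example.

```python
def full_adder(a: int, b: int, cin: int):
    """1-bit full adder on bits 0/1; returns (sum, cout)."""
    cout = (a & b) | (a & cin) | (b & cin)   # carry when at least two inputs are 1
    s = a ^ b ^ cin                          # sum is 1 when an odd number of inputs are 1
    return s, cout


# Print the full truth table.
print("a b cin | sum cout")
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            print(f"{a} {b}  {cin}  |  {s}    {cout}")
```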
13
A Simple ALU
• A 1-bit ALU that performs AND, OR, and addition
  [Figure: 1-bit ALU. Inputs a, b, and CarryIn; the Operation lines select among multiplexor inputs 0, 1, and 2; outputs Result and CarryOut.]
• Building a 32-bit ALU
  [Figure: 32-bit ALU made of 1-bit ALUs ALU0, ALU1, ALU2, ..., ALU31. ALUi takes ai, bi, and CarryIn and produces Resulti and CarryOut; each CarryOut feeds the next CarryIn; Operation is shared by all stages.]
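A minimal simulation of the structure on this slide: one 1-bit ALU, replicated 32 times, with each CarryOut wired to the next CarryIn. The multiplexor numbering (0 = AND, 1 = OR, 2 = add) is assumed to follow the textbook figure, and the function names are chosen for this example.

```python
def alu_1bit(a: int, b: int, carry_in: int, operation: int):
    """1-bit ALU: all three units compute in parallel, the multiplexor picks one."""
    and_out = a & b
    or_out = a | b
    sum_out = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (and_out, or_out, sum_out)[operation]   # multiplexor
    return result, carry_out


def alu_32bit(a: int, b: int, operation: int, carry_in: int = 0):
    """Ripple-carry arrangement: the carry passes through ALU0, ALU1, ..., ALU31 in turn."""
    result = 0
    carry = carry_in
    for i in range(32):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry


print(alu_32bit(5, 7, operation=2))            # (12, 0)
print(alu_32bit(0xFFFFFFFF, 1, operation=2))   # (0, 1): the carry ripples through all 32 stages
```

The loop makes the cost visible: bit 31 cannot settle until the carry has passed through every lower stage.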
14
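ALU: Subtraction
• Two's complement approach: just negate b and add. How do we negate?
• By selecting Binvert = 1 and setting CarryIn = 1 in the least significant bit of the ALU, we get 2’s complement subtraction a - b:
    a - b = a + (-b) = a + (b' + 1), where b' is b with every bit inverted

[Figure: 1-bit ALU with a Binvert-controlled multiplexor (inputs 0, 1) that selects b or its inverse; inputs a, b, CarryIn; the Operation lines select among multiplexor inputs 0, 1, 2; outputs Result and CarryOut.]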
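The subtraction trick can be checked numerically: inverting b (Binvert = 1) and adding 1 through the carry-in of the least significant bit gives a - b. The function name `subtract` is an assumption for this sketch, which works on whole 32-bit words rather than individual bit slices.

```python
MASK32 = 0xFFFFFFFF


def subtract(a: int, b: int) -> int:
    """Compute a - b as a + (inverted b) + 1, truncated to 32 bits."""
    inverted_b = ~b & MASK32                   # Binvert = 1
    total = (a & MASK32) + inverted_b + 1      # CarryIn = 1 at the least significant bit
    return total & MASK32                      # drop the carry out of bit 31


print(subtract(7, 5))        # 2
print(hex(subtract(5, 7)))   # 0xfffffffe, i.e. -2 in two's complement
```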
15
ALU: Additional Operations
• To support the set-on-less-than instruction (slt)
  - slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a-b) < 0 implies a < b
  - use a Set & a Less signal to indicate the result
• To support test for equality (beq)
  - use subtraction: (a-b) = 0 implies a = b
16
ALU: Additional Operations
• A 1-bit ALU that performs AND, OR, add, subtract: Less is used for the slt instruction (see 32-bit ALU next slide)
  [Figure: 1-bit ALU with inputs a, b, Less, CarryIn, and Binvert; the Operation lines select among multiplexor inputs 0, 1, 2, and 3, where input 3 is Less; outputs Result and CarryOut.]
• The ALU for the most significant bit: Set is used for the slt instruction; it is connected to Less of the lsb (see 32-bit ALU next slide)
  - Overflow detection needed on msb
  [Figure: 1-bit ALU for the most significant bit, with the same inputs plus an overflow detection unit; outputs Result, Set (sign), and Overflow.]
17
A 32-bit ALU
• A 32-bit ALU that performs AND, OR, add, & subtract
• For subtract, set Binvert = 1 and CarryIn = 1 (for add or logical operations, both set to 0)
• Can combine Binvert & CarryIn to Bnegate
• Set and Less, together with subtraction, can be used for slt

[Figure: 32-bit ALU built from ALU0, ALU1, ALU2, ..., ALU31. ALUi takes ai, bi, Less, and CarryIn and produces Resulti and CarryOut; Binvert, CarryIn, and Operation are the control inputs; the Less inputs of ALU1 to ALU31 are tied to 0; ALU31 also produces Set (sign) and Overflow, and Set feeds the Less input of ALU0.]
18
A Final 32-bit ALU
• Add a zero detector to test for zero results or equality (e.g. in the beq instruction)
• Control lines (Operation) (3-bit):
    000 = and
    001 = or
    010 = add
    110 = subtract
    111 = slt
  - bit1 & bit0 to multiplexors in ALU
  - bit2 to Bnegate
• Note: Zero is a 1 when the result is zero!

[Figure: final 32-bit ALU built from ALU0, ALU1, ALU2, ..., ALU31. The 3-bit Operation input splits into 1 bit for Bnegate and 2 bits for the multiplexors; the Less inputs of ALU1 to ALU31 are tied to 0; ALU31 produces Set (sign) and Overflow, and Set feeds the Less input of ALU0; Result0 to Result31 feed the Zero detector.]
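The final ALU can be sketched bit by bit. This simulation makes some assumptions beyond the slide: Set is taken as the sum bit of ALU31 (so slt ignores overflow, as in the basic textbook design), and Overflow is computed as carry-into-msb XOR carry-out-of-msb, which is a standard detection rule but is not spelled out here. The function name `alu32` is chosen for this example.

```python
def alu32(control: int, a: int, b: int):
    """Bit-slice model of the final ALU; returns (result, zero, overflow)."""
    bnegate = (control >> 2) & 1   # bit2: Binvert for every slice and CarryIn of ALU0
    op = control & 0b11            # bit1 & bit0: multiplexor select in every slice

    # Pass 1: the AND, OR, and adder units of all 32 slices.
    and_bits, or_bits, sum_bits = [], [], []
    carry = bnegate
    carry_into_msb = 0
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ bnegate      # invert b when Bnegate = 1
        if i == 31:
            carry_into_msb = carry
        and_bits.append(ai & bi)
        or_bits.append(ai | bi)
        sum_bits.append(ai ^ bi ^ carry)
        carry = (ai & bi) | (ai & carry) | (bi & carry)

    set_bit = sum_bits[31]                 # Set: sign of the adder output (assumption)
    overflow = carry_into_msb ^ carry      # assumed overflow rule, meaningful for add/sub

    # Pass 2: the multiplexors. Less of ALU0 comes from Set; all other Less inputs are 0.
    result = 0
    for i in range(32):
        less = set_bit if i == 0 else 0
        bit = (and_bits[i], or_bits[i], sum_bits[i], less)[op]
        result |= bit << i

    zero = int(result == 0)                # Zero is 1 when the result is zero
    return result, zero, overflow


print(alu32(0b110, 5, 5))            # (0, 1, 0): equal operands, as tested by beq
print(alu32(0b111, 3, 5))            # (1, 0, 0): 3 < 5
print(alu32(0b010, 0x7FFFFFFF, 1))   # (2147483648, 0, 1): maxint + 1 overflows
```

The two passes exist only because software runs sequentially. In hardware the adder output of ALU31 does not depend on any Less input, so feeding Set back to ALU0 creates no loop.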
19
ALU Design: Summary
• Select building blocks: adders, gates
• Use multiplexors to select the output we want
• Perform subtraction using two’s complement
• Replicate a 1-bit ALU to produce a 32-bit ALU --> regularity
• Need circuit to detect conditions, e.g. zero result, overflow, sign, carry out
• Shift instructions: done outside the ALU by a barrel shifter, which can shift from 1 to 31 bits in no more time than it takes to add two 32-bit numbers using carry lookahead adders
• Important points about hardware
  - all of the gates are always working
  - the speed of a gate is affected by the number of inputs to the gate
  - the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus is comprehension; however, clever changes to organization can improve performance (similar to using better algorithms in software)
20
Carry Lookahead Adder
• Ripple carry adder is just too slow: the sequential chain reaction is too slow for time-critical hardware
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
  - two extremes: ripple carry and sum-of-products
• Can you see the ripple? How could you get rid of it?

[Figure: row of four adder (+) blocks]
21
Carry Lookahead Adder
• Carry lookahead adder (CLA): an approach in-between our two extremes
• Motivation: if we didn't know the value of carry-in, what could we do?
  - When would we always generate a carry? g_i = a_i·b_i
  - When would we propagate the carry? p_i = a_i + b_i
• Did we get rid of the ripple?
    c_{i+1} = g_i + p_i·c_i
    c_1 = g_0 + p_0·c_0
    c_2 = g_1 + p_1·c_1 = g_1 + p_1·g_0 + p_1·p_0·c_0
    c_3 = g_2 + p_2·c_2 = g_2 + p_2·g_1 + p_2·p_1·g_0 + p_2·p_1·p_0·c_0
    c_4 = g_3 + p_3·c_3 = g_3 + p_3·g_2 + p_3·p_2·g_1 + p_3·p_2·p_1·g_0 + p_3·p_2·p_1·p_0·c_0
  Carry lookahead!
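The expanded carry equations can be written out directly for a 4-bit adder. In the sketch below every carry is a two-level expression of the generate and propagate bits and c0, so no carry waits on the previous one. The function name `cla_4bit` is an assumption for this example.

```python
def cla_4bit(a: int, b: int, c0: int = 0):
    """4-bit carry lookahead adder; returns (sum, c4)."""
    bits_a = [(a >> i) & 1 for i in range(4)]
    bits_b = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(bits_a, bits_b)]   # generate:  g_i = a_i AND b_i
    p = [x | y for x, y in zip(bits_a, bits_b)]   # propagate: p_i = a_i OR b_i

    # Each carry depends only on g, p, and c0 (the flattened equations).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (bits_a[i] ^ bits_b[i] ^ carries[i]) << i   # sum_i = a_i xor b_i xor c_i
    return total, c4


print(cla_4bit(0b0111, 0b0001))   # (8, 0)
print(cla_4bit(0b1111, 0b0001))   # (0, 1)
```

The price of removing the ripple shows in the last equation: the expression for c4 already needs a 5-input OR and a 5-input AND, and the terms keep growing with each additional bit.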
22
Building Bigger Adders
• Can’t build a 16-bit adder using the g_i & p_i CLA method --> too big
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again! (see figure)

[Figure: 16-bit adder built from four 4-bit ALUs. ALU0 takes a0-a3, b0-b3 and produces Result0-3; ALU1 takes a4-a7, b4-b7 and produces Result4-7; ALU2 takes a8-a11, b8-b11 and produces Result8-11; ALU3 takes a12-a15, b12-b15 and produces Result12-15. Each block sends its block-level signals (P0 G0, P1 G1, P2 G2, P3 G3) to a carry-lookahead unit, which takes CarryIn and produces the block carries C1, C2, C3, and C4 (CarryOut).]
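Applying the CLA principle a second time can be sketched as follows. The block-level propagate and generate definitions used here are the standard textbook ones and are an assumption, since the slide only names the P and G signals. The function names are chosen for this example, and the sum inside each 4-bit block is computed with Python's `+` as a stand-in for the block's own adder.

```python
def block_pg(a4: int, b4: int):
    """Block propagate P and generate G for one 4-bit group (assumed textbook definitions)."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]                      # a carry-in passes through all 4 bits
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G


def cla_16bit(a: int, b: int, c0: int = 0):
    """16-bit adder from four 4-bit blocks and one carry-lookahead unit; returns (sum, C4)."""
    groups = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    pg = [block_pg(a4, b4) for a4, b4 in groups]
    P = [x[0] for x in pg]
    G = [x[1] for x in pg]

    # Carry-lookahead unit: the same equations as for single bits, one level up.
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))

    block_carries = [c0, C1, C2, C3]
    total = 0
    for k, (a4, b4) in enumerate(groups):
        block_sum = (a4 + b4 + block_carries[k]) & 0xF   # stand-in for the 4-bit block adder
        total |= block_sum << (4 * k)
    return total, C4


print(cla_16bit(0x1234, 0x0FED))   # (8737, 0), i.e. 0x2221
print(cla_16bit(0xFFFF, 0x0001))   # (0, 1)
```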
23
Summary
• Review number system
• Additional MIPS instructions
• The design of an ALU
• Carry lookahead adder