compuer organizaion processing unit

Processing Unit

Deepak John ,Department Computer Applications, SJCET-Pala

Fundamental Concepts

Processor fetches one instruction at a time and perform theoperation specified.

Instructions are fetched from successive memory locationsuntil a branch or a jump instruction is encountered.

Processor keeps track of the address of the memorylocation containing the next instruction to be fetched usingProgram Counter (PC).

Instruction Register (IR)

Executing an Instruction

Fetch the contents of the memory location pointed to bythe PC. The contents of this location are loaded into theIR (fetch phase).

IR ← [[PC]] Assuming that the memory is byte addressable,

increment the contents of the PC by 4 (fetch phase).PC ← [PC] + 4

Carry out the actions specified by the instruction in theIR (execution phase).

Processor Organization

Executing an Instruction Transfer a word of data from one processor register to

another or to the ALU. Perform an arithmetic or a logic operation and store the

result in a processor register. Fetch the contents of a given memory location and load

them into a processor register. Store a word of data from a processor register into a given

memory location.

Register Transfers For each register two control signals are used1. To place the contents of that register on the bus2. To load the data on the bus into register The input and output of register are connected to the bus via

switches controlled by the signals Rin and Rout

Performing an Arithmetic or Logic Operation The ALU is a combinational circuit that has no

internal storage. ALU gets the two operands from MUX and bus. The

result is temporarily stored in register Z. What is the sequence of operations to add the

contents of register R1 to those of R2 and store theresult in R3?

1. R1out, Yin2. R2out, SelectY, Add, Zin3. Zout, R3in

Fetching a Word from Memory Address into MAR; issue Read operation; data into MDR.

Memory-Function-Completed (MFC)

Execution of Branch Instructions A branch instruction replaces the contents of PC with the

branch target address, which is usually obtained by addingan offset X given in the branch instruction.

The offset X is usually the difference between the branchtarget address and the address immediately following thebranch instruction.

Conditional branchStep Action

1 PCout

, MARin

, Read,

Select4,Add, Zin

2 Zout

, PCin

, Yin

, WMF

C

3 MDRout

, IRin4 Offset-field-of-IR

out, Add

,Z

in

5 Zout

, PCin

, End

Figure 7.7. Control sequence for an unconditional branch instruction.

Multiple-Bus OrganizationRegister file:all general purpose registers arecombinedBuses A and B:used to transfer the source operandsto the ALU

Step Action

1 PCout, R=B, MAR in, Read, IncPC

2 WMFC

3 MDRoutB, R=B, IR in

4 R4outA, R5outB, SelectA, Add, R6in, End

Figure 7.9. Control sequence for the instruction. Add R4,R5,R6,for the three-bus organization

Hardwired Control

To execute instructions, the processor must have some means ofgenerating the control signals needed in the proper sequence.

Two categories: hardwired control and microprogrammed control Hardwired system can operate at high speed; but with little

flexibility.

Control Unit Organization

CLKClock Control step

IR encoderDecoder/

Control signals

codes

counter

inputs

Condition

External

Detailed Block DescriptionStep Decoder :providesseperate signal line foreach step or time slot inthe ccontrol sequence.Instruction decoder :o/pconsists of a seperateline for each machineinstruction.For any instruction in IR,one of the output linesINS1 through INSm isset to 1,all other lines areset to 0Input signals to theencoder block arecombined to generatethe individual controlsignalsYin,Pcout,ADD,End etc

Generating Zin

Zin = T1 + T6 • ADD + T4 • BR + …

T1

AddBranch

T4 T6

ADVANTAGES Hardwired Control Unit is fast because control signals are

generated by combinational circuits. The delay in generation of control signals depends upon the

number of gates.DISADVANTAGES more complex will be the design of control unit. Modifications in control signal are very difficult. That means it

requires rearranging of wires in the hardware circuit. It is difficult to correct mistake in original design or adding new

feature in existing design of control unit.

A Complete Processor

Microprogrammed Control

Control signals are generated by a program similar tomachine language programs.

Control Word (CW):is a word whose individual bits representthe various control signals; Each step of the instructionexecution is represented by a control word with all of the bitscorresponding to the control signals needed for the step setto one.

Microroutine:a sequence of CW’s corresponding to thecontrol sequence of a machine instruction

microinstruction:individual control words in microroutine,consists of:

One or more micro-operations to be executed.Address of next microinstruction to be executed.

Control store: micro routines forall instructions in the instructionset of a computer are stored

The previous organization cannot handle the situation when thecontrol unit is required to check the status of the conditioncodes or external inputs to choose between alternative coursesof action.

Use conditional branch microinstruction.

Address Microinstruction

0 PCout , MARin , Read,Select4,Add, Zin

1 Zout , PCin , Yin , WMFC

2 MDRout , IRin

3 Branch to startingaddress of appropriate microroutine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

25 If N=0, then branchto microinstruction 0

26 Offset-field-of-IRout , SelectY, Add, Zin

27 Zout , PCin , End

Figure 7.17. Microroutine for the instruction Branch<0.

Figure 7.18. Organization of the control unit to allow conditional branching in the microprogram.

Controlstore

Clock

generator

Starting andbranch address Condition

codes

inputsExternal

CW

IR

mPC

Microinstructions A straightforward way to structure microinstructions is to

assign one bit position to each control signal. However, this is very inefficient. The length can be reduced: most signals are not needed

simultaneously, and many signals are mutually exclusive. All mutually exclusive signals are placed in the same group

in binary coding.

Partial Format for the Microinstructions

Further Improvement

Vertical organization:1. each micro instruction specify only a small number of control

functions.2. Slower operating speed due to the need of more micro

instructions to perform the desired function.3. less hardware is needed to handle the execution of

microinstructions. Horizontal organization:1. each micro instruction specify many control signals.2. its useful when higher operating speed is desired and the

machine structure allows parallel use of resources

Microprogram sequencingSeveral operations execute with varying addressing modes• For example, consider ADD r1, r2, r3; ADD (r1), r2,r3; ADD x(r1), r2,

r3; and ADD (r1)+, r2, r3• two disadvantages:Ø Having a separate microroutine for each machine instruction results in

a large total number of microinstructions and a large control store.Ø Longer execution time because it takes more time to carry out the

required branches.§ a separate microroutine for each combination of instruction would

produce considerable duplication of common parts.§ Organize the micro program so that the micro routines share as many

common parts as possible.

Prefetching microinstructions Microprogrammed control leads to a slower operating speed

because of the time it take to fetch microinstructions fromcontrol store.

Avoid this by prefetching the next microinstruction while thecurrent one is executed.

prefetching the microinstruction had some difficulties like thestatus flags and the result of the current microinstruction areneeded to determine the address of next microinstruction.Thus a straight forward prefetching occasionally prefetcheswrong microinstruction.

In this case the fetch must be repeated with correct address.

emulation Allows us to replace obsolete equipment with more up

to date machines. If the replacement computer fully emulates the original

one then no software changes have have to be made torun existing programs.

Facilitates transition to new computer system withminimal disruption.

ADVANTAGES The design of micro-program control unit is less complex

because micro-programs are implemented using softwareroutines.

The micro-programmed control unit is more flexible becausedesign modifications, correction and enhancement is easilypossible.

The new or modified instruction set of CPU can be easilyimplemented by simply rewriting or modifying the contents ofcontrol memory.

The fault can be easily diagnosed in the micro-programcontrol unit using diagnostics tools by maintaining thecontents of flags, registers and counters.

DISADVANTAGES The micro-program control unit is slower than hardwired

control unit. That means to execute an instruction in micro-program

control unit requires more time. The micro-program control unit is expensive than hardwired

control unit in case of limited hardware resources. The design duration of micro-program control unit is more

than hardwired control unit for smaller CPU.

Arithmetic

Signed Numbers Left most bit is sign bit 0 means positive 1 means negative +18 = 00010010 -18 = 10010010

•To compute representation of a negative number faster, find therepresentation of its absolute value in n bits.•invert all bits of this number and add 1 to it.•For example, −10 in 8 bits

10 in 8 bits is 00001010.Invert all bits to get: 11110101.Add 1 to it to get: 11110110.

2’s complement number system

Addition/subtraction of signed numbers

si =ci +1 =

13

7+ Y

1

00010

1

1

00110

11

0

01101001

0

0

0

0

1

1

1

1

00001111

Example:

10= = 0

01 1

11 1 0 0

1

1 1 10

xi yi Carry-in ci Sumsi Carry-outci+1

X

Z+ 6 0+

xiyi

si

Carry-outci+1

Carry-inci

xi yi ci xi yi ci xi yi ci xi yi ci xi yi ciÅ Å=+ + +yi ci xi ci xi yi+ +

At the ith stage:Input:ci is the carry-inOutput:si is the sumci+1 carry-out to (i+1)st

state

Addition logic for a single stage

Full Adder (FA): Symbol for the complete circuit for a single stage of addition.

Full adder(FA)

cici 1+

si

Sum Carry

yixi

ci

yi

xic

i

yi

xi

xi

ci

yi sic

i 1+

n-bit adder•Cascade n full adder (FA) blocks to form a n-bit adder.•Carries propagate or ripple through this cascade, n-bit ripple carry adder.

FA c0

y1x1

s1

FAc 1

y0x0

s0

FAcn 1-

y n 1-xn 1-

cn

sn 1-

Most significant bit(MSB) position

Least significant bit(LSB) position

Carry-in c0 into the LSB position provides a convenient way to perform subtraction.K n-bit adderK n-bit numbers can be added by cascading k n-bit adders.

n-bit c0

ynxn

sn

cn

y0xn 1-

s0

ckn

sk 1-( )n

x0yn 1-y2n 1-x2n 1-ykn 1-

sn 1-

s2n 1-

skn 1-

xkn 1-

addern-bitadder

n-bitadder

Each n-bit adder forms a block, so this is cascading of blocks.Carries ripple or propagate through blocks, Blocked Ripple Carry Adder

n-bit subtractor

FA 1

y1

x1

s1

FA

c1

y0

x0

s0

FA

cn 1-

yn 1-

xn 1-

cn

sn 1-

Most significant bit(MSB) position

Least significant bit(LSB) position

•Recall X – Y is equivalent to adding 2’s complement of Y to X.•2’s complement is equivalent to 1’s complement + 1.•X – Y = X + Y + 1•2’s complement of positive and negative numbers is computedsimilarly.

n-bit adder/subtractor

•Add/sub control = 0, addition.•Add/sub control = 1, subtraction.

Add/Subcontrol

n-bit adder

xn 1- x1 x0

cn

sn 1- s1 s0

c0

yn 1- y1 y0

Detecting overflows Overflows can only occur when the sign of the two operands

is the same. Overflow occurs if the sign of the result is different from the

sign of the operands. Recall that the MSB represents the sign.

xn-1, yn-1, sn-1 represent the sign of operand x, operand y and result srespectively.

Circuit to detect overflow can be implemented by thefollowing logic expressions:

111111 −−−−−− += nnnnnn syxsyxOverflow

1−⊕= nn ccOverflow

Computing the add time

Consider 0th stage:x0 y0

c0c1

s0

FA

•c1 is available after 2 gate delays.•s1 is available after 1 gate delay.

ci

yi

xic

i

yi

xi

xi

ci

yi sic

i 1+

Sum Carry

Computing the add time (contd..)

x0 y0

s2

FA

x0 y0x0 y0

s1

FAc2

s0

FAc1c3c0

x0 y0

s3

FAc4

Cascade of 4 Full Adders, or a 4-bit adder

•s0 available after 1 gate delays, c1 available after 2 gate delays.•s1 available after 3 gate delays, c2 available after 4 gate delays.•s2 available after 5 gate delays, c3 available after 6 gate delays.•s3 available after 7 gate delays, c4 available after 8 gate delays.

For an n-bit adder, sn-1 is available after 2n-1 gate delays ,cn isavailable after 2n gate delays.

Fast additionRecall the equations:

iiiiiii

iiii

cycxyxccyxs++=

⊕⊕=

+1

Second equation can be written as:

iiiiii cyxyxc )(1 ++=+

We can write:

iiiiii

iiii

yxPandyxGwherecPGc

+==+=+1

•Gi is called generate function and Pi is called propagate function•Gi and Pi are computed only from xi and yi and not ci, thus they can becomputed in one gate delay after X and Y are applied to the inputs of an n-bit adder.

Carry lookaheadci+1 = Gi + Pici

ci = Gi−1 + Pi−1ci−1

⇒ ci+1 = Gi + Pi(Gi−1 + Pi−1ci−1)continuing⇒ ci+1 = Gi + Pi(Gi−1 + Pi−1(Gi− 2 + Pi− 2ci−2 ))untilci+1 = Gi + PiGi−1 + PiPi−1Gi−2 + .. + PiPi−1..P1G0 + PiPi−1...P0c0

•All carries can be obtained 3 gate delays after X, Y and c0 are applied.-One gate delay for Pi and Gi-Two gate delays in the AND-OR circuit for ci+1

•All sums can be obtained 1 gate delay after the carries are computed.•Independent of n, n-bit addition requires only 4 gate delays.•This is called Carry Lookahead adder.

Carry-lookahead adder

Carry-lookahead logic

B cell B cell B cell B cell

s3

P3G3

c3

P2G2

c2

s2

G1

c1

P1

s1

G0

c0

P0

s0

.c4

x1

y1

x3

y3

x2

y2 x

0y

0

G i

ci

...

P i s i

x iy i

B cell

4-bit carry-lookahead adder

B-cell for a single stage

C1=G0+P0C0C2=G1+P1G0+P1P0C0C3=G2+P2G1+P2P1G0+P2P1P0C0C4=G3+P3G2+P3P2G1+P3P2P1G0+P3P2P1P0C0

Blocked Carry-Lookahead adder

c4 = G3 + P3G2 + P3P2G1 + P3P2 P1G0 + P3P2 P1P0c0

Carry-out from a 4-bit block can be given as:

Rewrite this as:

P0I = P3P2 P1P0

G0I = G3 + P3G2 + P3P2G1 + P3P2P1G0

Subscript I denotes the blocked carry lookahead and identifies the block.

Cascade 4 4-bit adders, c16 can be expressed as:

c16 = G3I + P3

IG2I + P3

I P2IG1

I + P3IP2

I P10G0

I + P3I P2

I P10P0

0c0

In order to add operands longer than 4 bits, we can cascade 4-bit Carry-Lookahead adders. Cascade of Carry-Lookaheadadders is called Blocked Carry-Lookahead adder.

Blocked Carry-Lookahead adder

Carry-lookahead logic

4-bit adder 4-bit adder 4-bit adder 4-bit adder

s15-12

P3IG3

I

c12

P2IG2

I

c8

s11-8

G1I

c4

P1I

s7-4

G0I

c0

P0I

s3-0

c16

x15-12 y15-12 x11-8 y11-8 x7-4 y7-4 x3-0 y3-0

.

After xi, yi and c0 are applied as inputs:- Gi and Pi for each stage are available after 1 gate delay.- PI is available after 2 and GI after 3 gate delays.- All carries are available after 5 gate delays.- c16 is available after 5 gate delays.- s15 which depends on c12 is available after 8 (5+3)gate delays (Recall that

for a 4-bit carry lookahead adder, the last sum bit is available 3 gate delaysafter all inputs are available)

Multiplication

Multiplication of unsigned numbers

Product of 2 n-bit numbers is at most a 2n-bit number.

Unsigned multiplication can be viewed as addition of shifted versions of the multiplicand.

Multiplication of unsigned numbers (contd..) Rules to implement multiplication are:

If the 0th bit of multiplier is 1, then add the multiplicand with 0 If the 0th bit of multiplier is 0 ,then current partial product is 0. If the ith bit (except 0th bit) of the multiplier is 1, shift the

multiplicand and add the shifted multiplicand to the currentvalue of the partial product.

Hand over the partial product to the next stage If the ith bit of the multiplier is 0, shift the multiplicand and add

the current partial product to the 0.Value of the partial product at the start stage is 0.

Multiplication of unsigned numbers

ith multiplier bit

carry incarry out

jth multiplicand bitBit of incoming partial product (PPi)

Bit of outgoing partial product (PP(i+1))

FA

Typical multiplication cell

Combinational array multiplierMultiplicand

m 3 m 2 m 1 m 00 0 0 0

q 3

q 2

q 1

q0

0

p 2

p 1

p 0

0

0

0

p 3p 4p 5p 6p 7

PP1

PP2

PP3

(PP0)

,

Product is: p7,p6,..p0

Multiplicand is shifted by displacing it through an array of adders.

Combinatorial array multipliers are: Extremely inefficient. Have a high gate count for multiplying numbers of practical

size such as 32-bit or 64-bit numbers. Perform only one function, namely, unsigned integer

product. Improve gate efficiency by using a mixture of combinatorial

array techniques and sequential techniques requiring lesscombinational logic.

Sequential Circuit Multiplier

qn 1-

mn 1-

n-bitAdder

Multiplicand M

Controlsequencer

Multiplier Q

0

C

Shift right

Register A (initially 0)

Add/Noaddcontrol

an 1-

a0

q0

m0

0

MUX

1 1 1 1

1 0 1 1

1 1 1 1

1 1 1 0

1 1 1 0

1 1 0 1

1 1 0 1

Initial configuration

Add

M

1 1 0 1

C

First cycle

Second cycle

Third cycle

Fourth cycle

No add

Shift

ShiftAdd

Shift

ShiftAdd

1 1 1 1

0

0

0

1

0

0

0

1

0

0 0 0 0

0 1 1 0

1 1 0 1

0 0 1 1

1 0 0 1

0 1 0 0

0 0 0 1

1 0 0 0

1 0 0 1

1 0 1 1

QA

Product

Signed Multiplication

Signed Multiplication Considering 2’s-complement signed operands, Extend the sign bit value of the multiplicand to the left as far as the

product will extend (-13)(+11)

1

0

11 11 1 1 0 0 1 1

110

110

1

0

1000111011

000000

1100111

00000000

110011111

13-( )

143-( )

11+( )

Signed Multiplication For a negative multiplier, a straightforward solution is

to form the 2’s-complement of both the multiplier andthe multiplicand and proceed as in the case of apositive multiplier.

This is possible because complementation of bothoperands does not change the value or the sign ofthe product.

A technique that works equally well for both negativeand positive multipliers – Booth algorithm.

Booth AlgorithmMultiplier

Bit i Bit i 1-

V ersion of multiplicandselected by bit

0

1

0

0

01

1 1

0 M

1+ M

1 M

0 M

Booth multiplier recoding table.

X

X

X

X

Booth multiplication with a negative multiplier.

010

1 1 1 1 0 1 10 0 0 0 0 0 0 0 0

000110

0 0 0 0 1 1 01100111

0 0 0 0 0 001000 11111

1

1

0 1 1 0 11 1 0 1 0 6-( )

13+( )X

78-( )

+11- 1-

2's complement ofthe multiplicand

Fast Multiplication

Bit-Pair Recoding of Multipliers Bit-pair recoding halves the maximum number of

summands (versions of the multiplicand).

1+1

(a) Example of bit-pair recoding derived from Booth recoding

0

000

1 1 0 1 0Implied 0 to right of LSB

1

0

Sign extension

1

21

1-

00001 1 1 1 1 00 0 0 0 111 1 1 1 10 00 0 0 0 0 0

0000 111111

0 1 1 0 10

1 0100111111 1 1 1 0 0 1 10 0 0 0 0 0

1 1 1 0 1 1 0 0 1 0

0

1

0 0

1 0

1

0 0

00 1

0

0 1

10

0

0100 1 1 0 1

11

1-

6-( )13+( )

1+

78-( )

1- 2-

´

Figure 6.15. Multiplication requiring only n/2 summands.

i 1+ i 1

(b) Table of multiplicand selection decisions

selected at positioniMultiplicandMultiplier bit-pair

i

0

01

1

1010

1111000

1100

100

1

Multiplier bit on the right

0 0 X M1+

1

1+

01

22+

X MX MX MX MX MX MX M

Carry-Save Addition of Summands CSA speeds up the addition process.

P7 P6 P5 P4 P3 P2 P1 P0

P1 P0P5P7 P6

Consider the addition of many summands, we can:Ø Group the summands in threes and perform carry-save addition

on each of these groups in parallel to generate a set of S and Cvectors in one full-adder delay

Ø Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a further set of S and Cvectors in one more full-adder delay

Ø Continue with this process until there are only two vectorsremaining

Ø They can be added in a RCA or CLA to produce the desiredproduct

Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.

100 1 11

100 1 11

100 1 11

11111 1

100 1 11 M

Q

A

B

C

D

E

F

(2,835)

X

(45)

(63)

100 1 11

100 1 11

100 1 11

000 1 11 111 0 00 Product

Figure 6.18. The multiplication example from Figure 6.17 performed using carry-save addition.

00000101 0 10

10010000 1 11 1

+

1000011 1

10010111 0 10 1

0110 1 10 0

00011010 0 00

10001011 1 0 1

110001 1 0

00111100

00110 1 10

11001 0 01

100 1 11

100 1 11

100 1 11

00110 1 10

11001 0 01

100 1 11

100 1 11

100 1 11

11111 1

100 1 11 M

Q

A

B

C

S1C1

D

E

F

S2

C2

S1

C1

S2S3

C3

C2

S4

C4Product

x

Integer Division

Longhand Division Steps Position the divisor appropriately with respect to the

dividend and performs a subtraction. If the remainder is zero or positive, a quotient bit of 1 is

determined, the remainder is extended by another bit ofthe dividend, the divisor is repositioned, and anothersubtraction is performed.

If the remainder is negative, a quotient bit of 0 isdetermined, the dividend is restored by adding back thedivisor, and the divisor is repositioned for anothersubtraction.

001111

Division of Unsigned Binary Integers

1011

00001101

10010011

1011

0011101011

1011

100

Quotient

Dividend

Remainder

PartialRemainders

Divisor

Circuit Arrangement

Figure 6.21. Circuit arrangement for binary division.

qn-1

Divisor M

ControlSequencer

Dividend Q

Shift left

N+1 bit adder

q0

Add/Subtract

QuotientSetting

A

m00 mn-1

a0an an-1

Restoring Division Shift A and Q left one binary position Subtract M from A, and place the answer back in A If the sign of A is 1, set q0 to 0 and add M back to A

(restore A); otherwise, set q0 to 1 Repeat these steps n times

Examples

10111

Figure 6.22. A restoring-division example.

1 1 1 1 1

01111

0

0

0

1

000

0

0

00

0

00

00

0

1

00

0

00

1 01

111 1

010001

SubtractShift

Restore

1 00001 00001 1

Initially

SubtractShift

10111100001100000000

SubtractShift

Restore

101110100010000

1 1

QuotientRemainder

Shift10111

1 0000Subtract

Second cycle

First cycle

Third cycle

Fourth cycle

00

0

0

00

1

01

10000

1 11 0000

11111Restore

q0Set

q0Set

q0Set

q0Set

Nonrestoring Division Avoid the need for restoring A after an

unsuccessful subtraction. Step 1: (Repeat n times)Ø If the sign of A is 0, shift A and Q left one bit position and

subtract M from A; otherwise, shift A and Q left and addM to A.

Ø Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0. Step2: If the sign of A is 1, add M to A

Examples

A nonrestoring-division example.

Add

Restoreremainder

Remainder

0 0 0 01

1 1 1 1 10 0 0 1 1

1

Quotient

0 0 1 01 1 1 1 1

0 0 0 01 1 1 1 1

Shift 0 0 01100001111

Add

0 0 0 1 10 0 0 0 1 0 0 01 1 1 0 1

ShiftSubtract

Initially 0 0 0 0 0 1 0 0 0

1 1 1 0 0000

1 1 1 0 00 0 0 1 1

0 0 0ShiftAdd

0 0 10 0 0 011 1 1 0 1

ShiftSubtract

0 0 0 110000

Fourth cycle

Third cycle

Second cycle

First cycle

q0Set

q0Set

q0Set

q0Set

Floating-Point Numbersand

Operations

FractionsIf b is a binary vector, then we have seen that it can be interpreted as an unsignedinteger by:

V(b) = b31.231 + b30.230 + bn-3.229 + .... + b1.21 + b0.20

This vector has an implicit binary point to its immediate right:

b31b30b29....................b1b0. implicit binary point

Suppose if the binary vector is interpreted with the implicit binary point is just left ofthe sign bit:

implicit binary point .b31b30b29....................b1b0

The value of b is then given by:

V(b) = b31.2-1 + b30.2-2 + b29.2-3 + .... + b1.2-31 + b0.2-32

Range of fractionsThe value of the unsigned binary fraction is:

V(b) = b31.2-1 + b30.2-2 + b29.2-3 + .... + b1.2-31 + b0.2-32

The range of the numbers represented in this format is:

In general for a n-bit binary fraction (a number with an assumed binary point atthe immediate left of the vector), then the range of values is:

9999999998.021)(0 32 ≈−≤≤ −bV

nbV −−≤≤ 21)(0

Scientific notation•Previous representations have a fixed point. Either the point is to the immediateright or it is to the immediate left. This is called Fixed point representation.•Fixed point representation suffers from a drawback that the representation can onlyrepresent a finite range (and quite small) range of numbers.

A more convenient representation is the scientific representation, wherethe numbers are represented in the form:

x = m1.m2m3m4 × b±e

Components of these numbers are:

Mantissa (m), implied base (b), and exponent (e)

Significant digitsA number such as the following is said to have 7 significant digits

x = ±0.m1m2m3m4m5m6m7 × b±e

Fractions in the range 0.0 to 0.9999999 need about 24 bits of precision (in binary). For example the binary fraction with 24 1’s:

111111111111111111111111 = 0.9999999404

Not every real number between 0 and 0.9999999404 can be represented by a 24-bitfractional number.The smallest non-zero number that can be represented is:

000000000000000000000001 = 5.96046 x 10-8

Every other non-zero number is constructed in increments of this value.

Sign and exponent digits•In a 32-bit number, suppose we allocate 24 bits to represent a fractionalmantissa.•Assume that the mantissa is represented in sign and magnitude format,and we have allocated one bit to represent the sign.•We allocate 7 bits to represent the exponent, and assume that theexponent is represented as a 2’s complement integer.•There are no bits allocated to represent the base, we assume that thebase is implied for now, that is the base is 2.•Since a 7-bit 2’s complement number can represent values in the range64 to 63, the range of numbers that can be represented is:

0.0000001 x 2-64 < = | x | <= 0.9999999 x 263

•In decimal representation this range is:

0.5421 x 10-20 < = | x | <= 9.2237 x 1018

A sample representation

Sign Exponent Fractional mantissabit

1 7 24

•24-bit mantissa with an implied binary point to the immediate left•7-bit exponent in 2’s complement form, and implied base is 2.

Normalization

If the number is to be represented using only 7 significant mantissa digits,the representation ignoring rounding is:

Consider the number: x = 0.0004056781 x 1012

x = 0.0004056 x 1012

If the number is shifted so that as many significant digits are brought into 7 available slots: x = 0.4056781 x 109 = 0.0004056 x 1012

Exponent of x was decreased by 1 for every left shift of x.

A number which is brought into a form so that all of the available mantissa digitsare optimally used (this is different from all occupied which may not hold), iscalled a normalized number.

Same methodology holds in the case of binary mantissas

0001101000(10110) x 28 = 1101000101(10) x 25

Normalization (contd..)•A floating point number is in normalized form if the most significant 1 in themantissa is in the most significant bit of the mantissa.•All normalized floating point numbers in this system will be of the form:

0.1xxxxx.......xx

Range of numbers representable in this system, if every number must benormalized is:

0.5 x 2-64 <= | x | < 1 x 263

Normalization, overflow and underflowThe procedure for normalizing a floating point number is:

Do (until MSB of mantissa = = 1)Shift the mantissa left (or right)Decrement (increment) the exponent by 1

end do

Applying the normalization procedure to: .000111001110....0010 x 2-62

gives: .111001110........ x 2-65

But we cannot represent an exponent of –65, in trying to normalize the number we have underflowed our representation.

Applying the normalization procedure to: 1.00111000............x 263

gives: 0.100111..............x 264

This overflows the representation.

Changing the implied baseSo far we have assumed an implied base of 2, that is our floating point numbers are of the form:

x = m 2e

If we choose an implied base of 16, then:

x = m 16e

Then:

y = (m.16) .16e-1 (m.24) .16e-1 = m . 16e = x

•Thus, every four left shifts of a binary mantissa results in a decrease of 1 in a base16 exponent.•Normalization in this case means shifting the mantissa until there is a 1 in the firstfour bits of the mantissa.

Excess notation•Rather than representing an exponent in 2’s complement form, it turns out to bemore beneficial to represent the exponent in excess notation.•If 7 bits are allocated to the exponent, exponents can be represented in the rangeof -64 to +63, that is:

-64 <= e <= 63Exponent can also be represented using the following coding called as excess-64:

E’ = Etrue + 64

In general, excess-p coding is represented as:

E’ = Etrue + p

True exponent of -64 is represented as 00 is represented as 64

63 is represented as 127

This enables efficient comparison of the relative sizes of two floating point numbers.

IEEE notationIEEE Floating Point notation is the standard representation in use. Thereare two representations:

- Single precision.- Double precision.

Both have an implied base of 2.Single precision:

- 32 bits (23-bit mantissa, 8-bit exponent in excess-127 representation)Double precision:

- 64 bits (52-bit mantissa, 11-bit exponent in excess-1023representation)Fractional mantissa, with an implied binary point at immediate left.

Sign Exponent Mantissa1 8 or 11 23 or 52

Peculiarities of IEEE notation•Floating point numbers have to be represented in a normalized form tomaximize the use of available mantissa digits.•In a base-2 representation, this implies that the MSB of the mantissa isalways equal to 1.•If every number is normalized, then the MSB of the mantissa is always 1.

We can do away without storing the MSB.•IEEE notation assumes that all numbers are normalized so that the MSB ofthe mantissa is a 1 and does not store this bit.•So the real MSB of a number in the IEEE notation is either a 0 or a 1.•The values of the numbers represented in the IEEE single precisionnotation are of the form:

(+,-) 1.M x 2(E - 127)

•The hidden 1 forms the integer part of the mantissa.•Note that excess-127 and excess-1023 (not excess-128 or excess-1024)are used to represent the exponent.

Exponent fieldIn the IEEE representation, the exponent is in excess-127 (excess-1023) notation. The actual exponents represented are:

-126 <= E <= 127 and -1022 <= E <= 1023not-127 <= E <= 128 and -1023 <= E <= 1024

This is because the IEEE uses the exponents -127 and 128 (and -1023 and 1024), that is the actual values 0 and 255 to represent special conditions:

- Exact zero- Infinity

Floating point arithmeticAddition:

3.1415 x 108 + 1.19 x 106 = 3.1415 x 108 + 0.0119 x 108 = 3.1534 x 108

Multiplication:3.1415 x 108 x 1.19 x 106 = (3.1415 x 1.19 ) x 10(8+6)

Division:3.1415 x 108 / 1.19 x 106 = (3.1415 / 1.19 ) x 10(8-6)

Biased exponent problem:If a true exponent e is represented in excess-p notation, that is as e+p.Then consider what happens under multiplication:

a. 10(x + p) * b. 10(y + p) = (a.b). 10(x + p + y +p) = (a.b). 10(x +y + 2p)

Representing the result in excess-p notation implies that the exponentshould be x+y+p. Instead it is x+y+2p.Biases should be handled in floating point arithmetic.

Floating point arithmetic: ADD/SUB rule Choose the number with the smaller exponent. Shift its mantissa right until the exponents of both the

numbers are equal. Add or subtract the mantissas. Determine the sign of the result. Normalize the result if necessary and truncate/round

to the number of mantissa bits.

Note: This does not consider the possibility of overflow/underflow.

Floating point arithmetic: MUL rule Add the exponents. Subtract the bias. Multiply the mantissas and determine the sign of the

result. Normalize the result (if necessary). Truncate/round the mantissa of the result.

Floating point arithmetic: DIV rule Subtract the exponents Add the bias. Divide the mantissas and determine the sign of the

result. Normalize the result if necessary. Truncate/round the mantissa of the result.

Note: Multiplication and division does not require alignment of the mantissasthe way addition and subtraction does.

Guard bitsWhile adding two floating point numbers with 24-bit mantissas, we shiftthe mantissa of the number with the smaller exponent to the right untilthe two exponents are equalized.This implies that mantissa bits may be lost during the right shift (that is,bits of precision may be shifted out of the mantissa being shifted).To prevent this, floating point operations are implemented by keepingguard bits, that is, extra bits of precision at the least significant endof the mantissa.The arithmetic on the mantissas is performed with these extra bits ofprecision.After an arithmetic operation, the guarded mantissas are:

- Normalized (if necessary)- Converted back by a process called truncation/rounding to a 24-bitmantissa.

Truncation/rounding Straight chopping:

The guard bits (excess bits of precision) are dropped.

Von Neumann rounding: If the guard bits are all 0, they are dropped. However, if any bit of the guard bit is a 1, then the LSB of the retained bit

is set to 1.

Rounding: If there is a 1 in the MSB of the guard bit then a 1 is added to the LSB of

the retained bits.

Rounding Rounding is evidently the most accurate truncation

method. However,

Rounding requires an addition operation. Rounding may require a renormalization, if the addition operation de-

normalizes the truncated number.

IEEE uses the rounding method.

0.111111100000 rounds to 0.111111 + 0.000001=1.000000 which must be renormalized to 0.100000

Thank You

compuer organizaion processing unit

Education