chapter 8 problems prof. sin-min lee department of mathematics and computer science

Chapter 8 Problems

Prof. Sin-Min Lee

Department of Mathematics and Computer Science

Addition And Subtraction

• For both the non-negative and two’s complement notations, addition and subtraction are fairly straightforward.

• However, when the result cannot be represented as an 8-bit value, a problem arises. For the non-negative notation, consider the addition 255+1, 1111 1111 + 0000 0001. Straight binary addition yields the result 1 0000 0000, a 9-bit value, which cannot be stored in an 8-bit register.

Overflow

• Arithmetic overflow: The extra bit generates a carry out from the parallel adder. In non-negative notation, this carry bit can set an overflow flag, signaling the rest of the system that an overflow has occurred, and that the result generated is not entirely correct. The rest of the system can either fix the result of handle the error appropriately.

Overflow

• In two’s complement notation an overflow can occur at either end of the numeric range. At the positive end, adding 127+1, 0111 1111 + 0000 0001, yields a result of 1 0000 0000. However, in two’s complement notation, this is -128, not the desired value of +128. In this notation, the key to recognizing overflow is to check not only the carry out, but also the carry in to the most significant bit of the result. If two carries are equal, then there is no overflow.

Overflow

• Overflow only occurs when two numbers with the same sign are added. Adding two numbers with different signs always produces a valid result.

Overflow generation in unsigned two’s complement addition

• 126 01111110

• +1 00000001

• 01111111

• 00

• 127 01111111

• +1 00000001

• 10000000

• 01

• -127 10000001

• +(-1) 11111111

• 10000000

• 11

• -128 10000000

• +(-1) 11111111

• 01111111

• 10

Whether there is an overflow or not depends on the interpretation (signed or unsigned number).

each partial product is the product of one bit of the multiplier times the multiplicand. There are as many partial products as there are bits in the multiplier, and there are as many bits in each partial product as there are bits in the multiplicand.

In looking for an algorithmic statement of this approach to binary multiplication, it was found that a group of Russian peasants used precisely this method to multiply decimal numbers, and as a result, the binary multiplication algorithm given here is commonly known as the Russian Peasant Algorithm:

The best of these fast multiplication algorithms was developed by HP for the HP PA RISC architecture. In developing the first generation of this architecture, the HP designers concluded that a hardware multiply instruction was not justified, because most multiplies are multiplies by constants and can be replaced by add-and-shift instructions (as on the Hawk) and because the rare multiply of one variable by another could be done quickly enough in software.

The HP PA RISC multiply algorithm is based on the following notions:

1. First, do the multiply in base 16, so each step involves multiplying the multiplicand by the least significant 4 bits of the multiplier and then adding the partial product to the result.

2. To multiply the multiplicand by one hex digit of the multiplier, use an efficient case/select control structure to select one of 16 different blocks of code for multiplying by a constant. Each of these blocks will be only a few instructions long because of the use of add-and-shift instructions.

3. To avoid the cost of loop control instructions, note that the fixed word size implies that each multiply can be done in a fixed number of iterations (4 hex digits in a 16 bit multiplier, or 8 hex digits in a 32 bit multiplier). So, just write out the body of the loop 4 or 8 times in series and eliminate any need for loop counters.

What are BCD Numbers?

• Binary Coded Decimal numbers are actually binary numbers that are "coded" to represent decimal numbers

• Coded numbers are "not real numbering systems". In fact, they are just what they say they are, "codes that represent actual numbers". Although the actual numbers can be mathematically manipulated, codes follow no such rules.

Example 1 BCD (365)_10 -------------> 0011 | 0110 | 0101

Example 2.

Addition

• Addition is analogous to decimal addition with normal binary addition taking place from right to left. For example,

Where the result of any addition exceeds 9(1001) then six (0110) must be added to the sum to account for the six invalid BCD codes that are available with a 4-bit number. This is illustrated in the example below

• When one adds two BCD digits,

• if the binary sum is less than 1010, the corresponding BCD sum digit is correct.

• if the binary sum is greater than or equal to 1010, add (0110)2 to the corresponding

BCD sum digit and produce a carry.

8.4.1 Pipelining1. Data enters a stage of the pipeline, and it

will through go through different stages

of arithmetic operation till the final

computation is completed.

Note: Each stage only perform its specific

function.

2. Pipeline improves the performance by

overlapping computation; that is, each stage

operate on different data simultaneously.

Compare process time between Pipeline and Non-Pipeline

Consider this code:

For i = 1 to 100 Do {A[i] (B[i] * C[i]) + D[i]}

(assume multiplication & Addition require 10ns)

Non-pipeline: 10ns * 2 * 100 = 2000ns

10ns Stage 1: B[1] * C[1]10ns Stage 2: Add above

value and D[1] toA[1]Stage 1: B[2] * C[2]

10ns Stage 2: CalculateA[2]Stage 1: B [3] * C[3]

An Example on Arithmetic Pipeline

• Consider the following snippet code:

FOR I = 1 to 100 DO {A[I](B[I] . C[I]) + D[I]}

• Assume that each operation, multiplication and addition, requires 10 ns to complete. A non-pipelined uniprocessor take 20 ns to calculate A[I], and 2000 ns to execute the code.

• A pipeline unit could break this computation into two stages as shown in the next slide.

A two-stage pipeline to implement the program loop

B[i]

C[i]

CLK

D[i] CLK

Stage 1 (*) Stage 2 (+)

* Latch

Latch

+ Latch

Example: A two-stage pipeline to implement the program loop

During the first 10 ns, the first stage calculates B[1].C[1]. In the next 10 ns, stage 2 adds this value to D[1]and stores the result in A[1]. At the same time, stage 1 multiplies B[2] and C[2]. During the following 10 ns, stage 1 forms B[3].C[3] and stage 2 calculate the final value A[2]. Instead of 2000 ns which a non-pipelined uniprocessor would take, this pipeline executes the code in 1010 ns.

Speedup for PipelineA pipeline’s speedup S is the time needed to process n pieces of data using non-pipeline

(T1), divided by the time needed to process same data using k-stage pipeline (Tk).

Speedup of pipeline can expressed as:

n

Sn =nT1

(k + n -1) Tk

Continue:

Apply Speedup expression using previous example:

S100 =100 * 20ns

(2 + 100 - 1) * 10ns

= 1.98

T1, is the time toto process n piecesdata using non-pipeline (* and +).

n

Tk

k = Two stages (* and +)

Speed up

• The most popular metrics used to measure the performance of a pipeline are throughput (the number of results generated per time unit) and speedupspeedup.

• A pipeline’s speedup, Sn, is the amount of time needed to process n pieces of data using a non-pipelined arithmetic unit, divided by the time need to process the same data using a k-stage pipeline.

Sn = nT1 / (k + n - 1) x Tk

Calculating Speedup

Sn = nT1 / (k + n -1) x Tk

T1 : the time required to calculate using non-pipeline

Tk : is the the clock period of the k-stage pipeline

The pipeline unit requires k time units, each of duration Tk, to move the first piece of data to the pipeline. Using the last example, the speedup can be calculated as followed:

S100 = 100x20ns / (2 + 100 - 1) x 10ns

= 1.98

As n approaches infinity, n / (k + n - 1) approaches 1, so S = T1 / Tk

The maximum speedup occurs when each stage has the same delay.

8.4.2 Lookup Tables1. Theoretically, any combinatorial circuit can

be implemented by a ROM configured as a

lookup table.

2. The ROM is programmed with data such

that the correct values are output for any

possible input values.

3. A 4x1 ROM is like a two-input AND gate.

The input of the AND gate serves as the

address input for ROM, and the output of

ROM corresponds to the AND gate output.

Lookup ROM equivalent to AND gate

A1

ROM D0

A0

Address Data 0 0 0 0 1 0 1 0 0 1 1 1

x & y

x

y

AND x & yx

y

4 x 1

All possible And gate inputs are stored in ROM

chapter 8 problems prof. sin-min lee department of mathematics and computer science

Documents

overflow generation

valid result

result generated

overflowarithmetic overflow

overflow flag

nonnegative notation

unsigned twos

straight binary addition