kim - a carry skip adder with logic level optimization

A Carry Skip Adder with Logic Level Optimization

Kim, Kwang Yoal

University of Rostock

Electrical Engineering and Information Technology

Rostock, Germany

1. Abstract

Addition is the most commonly used arithmetic operation, and the adder is often the speed-limiting element. Careful optimization of the adder is therefore of the utmost importance. This optimization can be done at either the logic level or the circuit level. Circuit-level optimization manipulates transistor sizes and circuit topology to optimize speed. Logic-level optimization, on the other hand, rearranges the Boolean equations to obtain a circuit that is faster, smaller, or consumes less power. We take the carry skip adder as our example, summarize the basic definition of the adder circuit, and consider its optimization, with particular attention to high performance.

2. Introduction

Several adder implementations, including ripple carry, Manchester carry chain, carry skip, carry look-ahead, carry select, conditional sum, and various parallel prefix adders, are available to satisfy different area, delay, and power requirements. According to many studies, ripple carry and Manchester carry chain adders are the simplest but slowest adders, with O(n) area and O(n) delay, where n is the operand size in bits. Carry look-ahead, conditional sum, and parallel prefix adders have O(n·log(n)) area and O(log(n)) delay, but typically suffer from irregular layout. The carry skip adder, which has O(n) area and O(√n) delay, provides a good compromise in terms of area and delay, along with a simple and regular layout. Carry skip adders also dissipate less power than other adders due to their low transistor counts and short wire lengths.

3. Carry Skip Adder

A ripple-carry adder, as mentioned above, is the simplest adder and is easy to design, but it is practical only for additions with a relatively small word length. Because the carry bit "ripples" from one stage to the next, the delay through the circuit depends on the number of logic stages that must be traversed and is a function of the applied input signals. In the worst case the delay grows linearly with the number of bits, which makes the ripple-carry adder impractical for long words.

In a ripple-carry adder, every full adder cell has to wait for the incoming carry before an outgoing carry can be generated. This dependency can be shortened by introducing an additional bypass (skip) path. An incoming carry Ci,0 = 1 propagates through the complete adder chain and causes an outgoing carry Co,3 = 1 provided that all propagate signals are 1. This information can be used to speed up the operation of the adder, as shown in Fig.2. When BP = P0P1P2P3 = 1, the incoming carry is forwarded immediately to the next block through the bypass; otherwise, the carry is obtained via the normal route.

if (P0P1P2P3 = 1) then Co,3 = Ci,0 else either Delete or Generate occurred. (Eq.1)

Fig.1 Carry skip adder structure – basic concept

Within each block, as illustrated above, a simple 4-bit ripple structure built from full adders is realized, where the propagate and generate signals of the respective input bits are used to form the output sum bits and the output carries.
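The behaviour of one block and its bypass multiplexer can be sketched in software. The following Python model is only an illustration under our own assumptions: the function names carry_skip_block and carry_skip_add are hypothetical, bits are stored least significant first, and the word length is assumed to be a multiple of the block size. It models the logic function only, not the timing.

```python
def carry_skip_block(a_bits, b_bits, c_in):
    """Add one block (bit lists, LSB first); return (sum_bits, c_out)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate signals
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate signals
    carry = c_in
    sum_bits = []
    for p_i, g_i in zip(p, g):
        sum_bits.append(p_i ^ carry)   # sum bit of this position
        carry = g_i | (p_i & carry)    # carry rippled to the next position
    bypass = all(p)                    # BP = P0 P1 P2 P3 for a 4-bit block
    c_out = c_in if bypass else carry  # multiplexer at the end of the block
    return sum_bits, c_out


def carry_skip_add(a, b, n_bits=16, block=4, c_in=0):
    """Add two n_bits-wide unsigned integers block by block."""
    if n_bits % block != 0:
        raise ValueError("n_bits must be a multiple of block")
    total, carry = 0, c_in
    for lo in range(0, n_bits, block):
        a_bits = [(a >> (lo + i)) & 1 for i in range(block)]
        b_bits = [(b >> (lo + i)) & 1 for i in range(block)]
        sum_bits, carry = carry_skip_block(a_bits, b_bits, carry)
        for i, bit in enumerate(sum_bits):
            total |= bit << (lo + i)
    return total, carry


# Blocks 1 and 2 have all propagate signals set, so the carry is bypassed.
print(carry_skip_add(0x0FFF, 0x0001))  # (4096, 0)
```

Because the rippled carry equals the incoming carry whenever all propagate signals are 1, the multiplexer never changes the result; in hardware it only shortens the path the carry has to take.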

The multiplexer at the end of a block allows the input carry to bypass the block when all of the propagate signals in that block are asserted. After the carry-generate delay of the first block, bypassing the carry through subsequent blocks contributes the carry-propagate delay. If any of the propagate signals in a block is unasserted, the carry out of that block does not depend on the input carries from the previous blocks, and the block therefore computes in parallel with the other blocks. However, if all the propagate bits are asserted in the last block, sum generation must wait until the carry has propagated in from the previous blocks, resulting in the carry-absorb delay. The critical path of the carry skip adder, illustrated in gray in Fig.2, is therefore the one in which the carry is generated at the first bit position, ripples through the first block, skips around (N/M-2) bypass stages, and is consumed at the last bit position without generating an output carry.

Fig.2 16-bit Carry skip adder and worst-case delay path

t_p = t_setup + M·t_carry + (N/M - 2)·t_bypass + (M - 1)·t_carry + t_sum (Eq.2)

where
t_p: total delay
t_setup: the fixed overhead time to create the generate and propagate signals
t_carry: the propagation delay through a single bit
t_bypass: the propagation delay through the bypass multiplexer of a single stage
t_sum: the time to generate the sum of the final stage
N: total number of bits
M: number of bits per block
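As a small illustration, Eq.2 can be written as a function. This is a sketch under our own assumptions: the name carry_skip_delay is hypothetical, N is taken to be a multiple of M, and at least two blocks are required because the formula counts a separate first and last block.

```python
def carry_skip_delay(n_bits, block, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay of a fixed-block carry skip adder according to Eq.2."""
    if n_bits % block != 0 or n_bits // block < 2:
        raise ValueError("need at least two equally sized blocks")
    n_blocks = n_bits // block
    return (t_setup
            + block * t_carry           # ripple through the first block
            + (n_blocks - 2) * t_bypass  # skip the blocks in between
            + (block - 1) * t_carry      # ripple into the last block
            + t_sum)


# 16-bit adder with 4-bit blocks and all delays set to one unit
print(carry_skip_delay(16, 4, 1, 1, 1, 1))  # 11
```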

The delay of the carry skip adder is unfortunately still linear in the number of bits N. The slope of the delay function is smaller than for the ripple-carry adder, however, so the delay grows more gradually. For small values of N the ripple-carry adder is actually faster. Today's word lengths, 32 bits in most desktop computers and even longer in servers and multimedia processors, nevertheless make the carry skip structure more attractive. The crossover point between the ripple-carry adder and the carry skip adder depends on technology considerations and normally lies between 4 and 8 bits.

Fig.3 Propagation delay of ripple carry adder versus carry skip adder

3.1 Block Size Optimization

As mentioned above, the critical path delay of the carry skip adder comprises the carry-generate, carry-propagate and carry-absorb delays. The optimal number of bits per skip block is determined by various technology parameters, such as the extra delay of the bypass multiplexer, the buffering requirements in the carry chain, and the ratio of the delays through the ripple and bypass paths. With simplifying assumptions, however, we can compute the optimal block size analytically.

If we assume that one ripple stage (t_carry) has the same delay as one skip stage (t_bypass), and that these delays, t_setup and t_sum are each one unit, then (Eq.2) becomes

t_p = 1 + M + (N/M - 2) + (M - 1) + 1 = N/M + 2M - 1 (Eq.3)
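Eq.3 can also be evaluated directly for candidate block sizes. The sketch below is only illustrative: the names unit_delay and best_block_size are hypothetical, and only block sizes that divide N and leave at least two blocks are considered.

```python
def unit_delay(n_bits, block):
    """Delay of Eq.3 in unit delays: N/M + 2M - 1."""
    return n_bits // block + 2 * block - 1


def best_block_size(n_bits):
    """Block size that minimizes Eq.3 among the divisors of n_bits."""
    candidates = [m for m in range(1, n_bits // 2 + 1) if n_bits % m == 0]
    return min(candidates, key=lambda m: unit_delay(n_bits, m))


for m in (1, 2, 4, 8, 16):
    print(m, unit_delay(32, m))
print("best:", best_block_size(32))
```

For N = 32 the delay falls from 33 unit delays at M = 1 to 15 at M = 4 and rises again for larger blocks.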

We can derive the optimal block size M by setting the derivative dt_p/dM to zero.

dt_p/dM = -N/M^2 + 2 = 0, so M_opt = √(N/2) (Eq.4)

The optimal delay of the fixed-block carry skip adder is then

t_p(opt) = 2√(2N) - 1 (Eq.5)

3.2 Variable Block-Size Carry Skip Adder

Although the carry skip adder is faster than a ripple-carry adder, closer inspection shows that if all the skip blocks are the same size, the later blocks finish switching quickly and then sit idle, waiting for the carry signal to pass through all the bypass multiplexers. For example, in the diagram of a 32-bit carry skip adder below, the carry-out for bits 4-7 is ready at the same time as the carry-out for bits 0-3. The second block then waits while the first multiplexer does its job. Logic optimization is therefore crucial.

The carry skip adder can be optimized by varying the sizes of the bypass groups, that is, the sizes of the skip blocks. Intuitively, we should be able to reduce the size of the first skip block and make each subsequent block increasingly larger. Because the critical path includes the last skip block, we must also taper down the size of the blocks as we approach the end.

For ease of analysis, we assume that the setup (creation of the propagate and generate signals) takes t_setup, each bit of carry propagation takes t_prop (i.e. a skip block of m bits has a delay of m·t_prop), a multiplexer has a delay of t_mux = 2·t_prop, and the sum generation has a delay of t_sum.

a) fixed size carry skip adder

b) variable block size carry skip adder

Fig.4 Fixed size carry skip adder versus variable block size carry skip adder

We may estimate the worst-case delay of the simple 32-bit fixed-size carry skip adder and then estimate the delay improvement obtained with the variable-block scheme of Fig.4.

t_p_fixed = t_setup + 2·M·t_prop + 7·t_mux + t_sum = t_setup + 8·t_prop + 7·t_mux + t_sum (Eq.6)

t_p_variable = t_setup + 2·(Mmax·t_prop) + 7·t_mux + t_sum = t_setup + 2·t_prop + 7·t_mux + t_sum (Eq.7)

In short, the variable block size carry skip adder is faster than the fixed-size one (by 6·t_prop in this estimate) because a block does not need to wait until the previous block has finished its work. The block sizes cannot simply keep growing, however: because the last block is also on the critical path, a progressive increase in block size eventually increases the delay. The block sizes of a variable-size carry skip adder should therefore be tapered down toward the end.
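The effect of tapering can be explored with a simple delay model. The sketch below is our own illustration, not the configuration of Fig.4: the function worst_case_delay and the block sizes 2, 4, 6, 8, 6, 4, 2 are hypothetical. It assumes that a carry generated in block i ripples through m_i bits, passes one multiplexer per block from block i up to the block before block j, and ripples through m_j bits of block j, with t_mux = 2·t_prop as above. The terms t_setup and t_sum are common to both adders and are left out.

```python
def worst_case_delay(block_sizes, t_prop=1, t_mux=2):
    """Largest carry delay over all (generating block, absorbing block) pairs."""
    worst = 0
    n_blocks = len(block_sizes)
    for i in range(n_blocks):
        # carry generated and absorbed inside the same block
        worst = max(worst, block_sizes[i] * t_prop)
        for j in range(i + 1, n_blocks):
            delay = (block_sizes[i] * t_prop    # ripple out of block i
                     + (j - i) * t_mux          # multiplexers of blocks i .. j-1
                     + block_sizes[j] * t_prop)  # ripple into block j
            worst = max(worst, delay)
    return worst


fixed = [4] * 8                    # eight 4-bit blocks, 32 bits
variable = [2, 4, 6, 8, 6, 4, 2]   # hypothetical tapered sizes, 32 bits
print(worst_case_delay(fixed), worst_case_delay(variable))  # 22 16
```

Under these assumptions the fixed-size adder reproduces the 8·t_prop + 7·t_mux of Eq.6, while the tapered configuration balances its longest paths at 16·t_prop.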

3.3 Multi-level Carry Skip Adder

We may also optimize the carry skip adder by skipping over several skip blocks at once. To simplify the analysis, we again assume that the ripple and skip delays are equal. The delay of the carry skip adder then equals the worst-case sum of the number of ripple stages and the number of skip stages.

Fig.5 depicts a two-level carry skip adder. The signal controlling the second-level skip logic is derived as the logical AND of the first-level skip signals. A carry that would need 3 time units to skip three blocks in a single-level skip adder can now do the same work in a single time unit.

Fig.5 2-level carry skip adder structure
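The second-level skip condition can be written down in a few lines. This sketch is only illustrative: the function names are hypothetical, and each skip stage is assumed to take one time unit.

```python
def second_level_skip(first_level_skips):
    """Second-level skip signal: logical AND of the first-level skip signals."""
    return all(first_level_skips)


def crossing_time(first_level_skips, two_level=False):
    """Time units for an incoming carry to skip the whole group of blocks.

    Returns None when some block does not propagate, because the incoming
    carry is then not skipped across the group at all.
    """
    if not second_level_skip(first_level_skips):
        return None
    return 1 if two_level else len(first_level_skips)


skips = [True, True, True]  # all three blocks propagate
print(crossing_time(skips), crossing_time(skips, two_level=True))  # 3 1
```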

4. Conclusion

In general, the carry skip adder is faster than the ripple-carry adder when adding a large number of bits. Its O(√n) delay provides a good compromise, along with a simple and regular layout. It still has some drawbacks: for a given block size its delay remains linear in the number of bits, although with a much smaller slope, and for small values of N the overhead of the extra bypass multiplexer makes it slower than the ripple-carry adder and therefore less attractive.

Several techniques have been studied to reduce the delay of carry skip adders further. 1) Variable block sizes minimize the delay of adders that use a single level of carry skip logic. 2) Multiple levels of skip logic reduce the delay further, at the cost of increased area and a less regular layout. However, the preceding analyses are based on a number of simplifying assumptions: skip and ripple delays are equal, and the ripple delay is linearly proportional to the block width. These may not hold in practice, and if the assumptions are relaxed, the problem may no longer lend itself to an analytical solution.

The exact optimal configuration is highly technology dependent, so all dominant factors should be considered for better optimization. For example, as mentioned earlier, the optimal number of bits per skip block is determined by several technology parameters, such as the extra delay of the bypass multiplexer, the buffering requirements in the carry chain, and the ratio of the delays through the ripple and bypass paths.
