design of weighted modulo 2n + 1 adder using diminished-1 adder with the correction circuits

Abstract - In this brief, we proposed improved area-

efficient weighted modulo 2n+1 adder. This is achieved by

modifying existing diminished-1 modulo 2n+1 adder to

incorporate simple correction schemes. Our proposed

adders can produce modulo sums within the range {0,2n},

which is more than the range {0,2n−1} produced by exist-

ing diminished-1 modulo 2n+1 adders. We have imple-

mented the proposed adders using 0.13-μm CMOS tech-

nology, and the area required for our adders is lesser

than previously reported weighted modulo 2n+1 adders

with the same delay constraints.

Key-words: modulo 2n+1 adder, residue number system

(RNS), VLSI design.

I. INTRODUCTION

The residue number system (RNS) [1] has been employed

for efficient parallel carry-free arithmetic computations

(addition, subtraction, and multiplication) in DSP applica-

tions as the computations for each residue channel can inde-

pendently be done without carry propagation. Since RNS

based computations can achieve significant speedup over the

binary-system-based computation, they are widely used in

DSP processors, FIR filters, and communication components

[2]–[4].

Arithmetic modulo 2n + 1 computation is one of the most

common RNS operations that are used in pseudorandom

number generation and cryptography. The modulo 2n + 1ad-

dition is the most crucial step among the commonly used

moduli sets, such as {2n − 1, 2n, 2n + 1}, {2n − 1, 2n, 2n + 1,

22n + 1}, and {2n − 1, 2n, 2n + 1, 2n+1 + 1}. There are many

previously reported methods to speed up the modulo 2n + 1

addition. Depending on the input/output data representations,

these methods can be classified into two categories, namely,

diminished-1 [5]–[7] and weighted [10], [11], respectively.

In the diminished-1 representation, each input and output

operand is decreased by 1 compared with its weighted repre-

sentation. Therefore, only n-bit operands are needed in di-

minished-1 modulo 2n + 1 addition, leading to smaller and

faster components. However, this incurs an overhead due to

the translators from/tothe binary weighted system. On the

other hand, the weighted-1 representation uses (n+1)-bit op-

erands for computations, avoiding the overhead of transla-

tors, but requires larger area compared with the diminished-1

representations.

The general operations in modulo 2n + 1 addition

were discussed in , including diminished-1 and weighted

modulo addition. In [6] and [7], the authors proposed effi-

cient parallel-prefix adders for diminished-1 modulo 2n+1

addition. To improve the area–time and time–power prod-

ucts, the circular carry selection scheme was used to effi-

ciently select the correct carry-in signals for final modulo

addition [9]. The aforementioned methods all deal with di-

minished-1 modulo addition. However, the hardware for

decreasing/increasing the inputs/outputs by 1 is omitted in

the literature. In addition, the value zero is not allowed in

diminished-1modulo 2n+1 addition, and hence, the zero-

detection circuit is required to avoid incorrect computation.

This leads to increased hardware cost, which was not consid-

ered in the designs proposed in [6]–[9].

In [10], the authors proposed a unified approach for

weighted and diminished-1 modulo 2n+1 addition. This ap-

proach is based on making the modulo 2n+1addition of two

(n+1)-bit input numbers A and B congruent to Y+U+1, where

Y and U are two n-bit numbers. Thus, any dimished-1 adder

can be used to perform weighted modulo 2n+1 addition of Y

and U. In [11], the authors first used the translators to de-

crease the sum of two n-bit inputs A and B by 1 and then

performed the weighted modulo 2n+1 addition using dimin-

ished-1 adders. It should be noted that, for the architecture in

[11], the ranges of two inputs A and B are less than that pro-

posed in [10] (i.e., {0, 2n − 1} versus {0, 2n }).

In this brief, we propose improved area-efficient

weighted modulo 2n+1 adder design using diminished-1 ad-

ders with simple correction schemes. This is achieved by

subtracting the sum of two (n + 1)-bit input numbers by the

constant 2n+1 and producing carry and sum vectors. The

modulo 2n+1 addition can then be performed using parallel-

prefix structure diminished-1 adders by taking in the sum

and carry vectors plus the inverted end-around carry with

simple correction schemes. Compared with the work in [10],

the area cost for our proposed adders is lower. In addition,

our proposed adders do not require the hardware for zero

detection that is needed in diminished-1modulo 2n+1 addi-

tion. Synthesis results show that our proposed adders are

comparable to the work proposed in [10] and [11].

The rest of this brief is organized as follows. In

section II, we will review the design of two previous

weighted modulo 2n+1adders. Our proposed area-efficient

weighted modulo2n+1 adder is presented in Section III. The

synthesis results and comparisons are given in Section IV,

and Section V concludes this brief.

II. RELATED WORK

Given two (n+1)-bit numbers A and B, where 0 ≤ A,B ≤ 2n ,

the values of diminished-1 of A and B are denote by A*=A−1

Design of Weighted Modulo 2n + 1 Adder Using Diminished-1 adder with the

correction circuits

1V. Chandrasekhar, 2D. Maruthi kumar 1M. Tech student, Department of ECE, SKTRMCE, Kondair, India.

2M. Tech, Department of ECE, SRIT, Anantapur, India.

e-mail: [email protected], [email protected]

International Journal of Systems , Algorithms &

Applications I J S A A

Volume 2, Issue 2, February 2012, ISSN Online: 2277-2677 8

and B*=B−1, respectively. The diminished-1 sum S* can be

computed by

S*=|S −1|2n+1=|A+B−1|2

n+1=|A*+B*|2

n+ cout………(1)

where |X|Z is defined as modulo Z of X, and cout is denoted as

the inverted end-around carry of the diminished-1 modulo 2n

sum of n-bit A* and B*.

A. Vergos and Efstathiou [10]

In [10], the authors first compute the congruent mod-

ulo sum of A + B to produce Y and U, and then, the final mod-

ulo sum is performed by any diminished-1 modulo adder as

follows:

Suppose A and B are two (n+1)-bit input numbers, i.e., A =

anan−1, . . . , a0 = an × 2n + An and B = bnbn−1, . . . , b0 = bn × 2n

+ Bn, where 0≤ A,B≤2n, and An and Bn are two n-bit numbers;

then

|A+B|2n

+1 = ||An+Bn+D+1|2n

+1+ 1|2n

+1 =|Y+U+1|2n+1…… (2)

In (2),D=2n−4+2cn+1+sn, which is equivalent to 1111. . .

cn+1sn, where cn+1 = an bn ( is denoted as the logic AND opera-

tion), and sn = an^ bn (^ is denoted as the logic EXCLUSIVE-

OR operation) is the bit of D with binary weights 21 and 20,

respectively. The first step of (2) computes modulo 2n+1

carry-save addition, giving the carry vector Y and the sum

vector U, where Y =yn−2,yn−3, . . . , y0 (~yn−1 )and U = un−1un−2, . .

. , u0 are produced by adding An, Bn, and D, respectively. It can

be seen that the values of D with binary weights of 22 through

2n−1 are all 1, which can simplify the design of adders to pro-

duce the carries and sums using OR and XNOR gates for eve-

ry bit position directly [denoted FA+ in Fig. 1(a)]. In the bits

of D with binary weights 21 and 20, the adders should be mod-

ified to accept the values sn and cn+1, respectively, as shown in

Fig. 1(b).

B. Vergos and Bakalis [11]

In [11], the authors subtract the sum of the two n-bit

inputs A and B by 1 to produce the diminished-1 value A_ and

B_, and modulo 2n sum of A_ and B_ can be performed by any

diminished-1 architecture, as follows:

||A+B|2n+1|2

n = |A/+B/|2n

+ cout………… (3)

Fig. 1 (a) Architecture of the weighed modulo 2n + 1 adder proposed

in [10], (b) Architecture of FA1, FA0, and FA+, respectively.

The value cout is the inverted end-around carry produced by A/

+ B/, and the architecture of [11] is shown in Fig. 2.

Fig. 2 Architecture proposed in [11].

The architecture proposed in [10] makes use of a

constant time operator, which is composed of a simplified

carry-save adder stage, leading to efficient modulo 2n+1 ad-

der. The architecture proposed in [11] can be applied in the

design of area-efficient residue generators and multi-operand

modulo adders. However, it should be noted that, in [10], the

values that are subtracted by the inputs A and B are not con-

stants. In [11], the way to implement the translator for de-

creasing the sum of two inputs by 1 was not mentioned. Fur-

thermore, in [11], the ranges of two inputs A and B are less

than the one proposed in [10] (i.e., {0,2n−1} versus {0,2n}). To

remedy these problems, we will propose our area-efficient

weighted modulo 2n+1 adder design in the next section.

III. AREA & DELAY-EFFICIENT WEIGHTED MODULO

2n+ 1 ADDER

Instead of subtracting the sum of A and B by D,

which is not a constant as proposed in [10], we use the con-

stant value − (2n+1) to be added by the sum of A and B. In

addition, we make the two inputs A and B to be in the range

{0,2n}, which is 1 more than {0,2n−1} as proposed in [11]. In

the following, we present the designs of our proposed

weighted modulo n+1 adder.

From [8]given two (n+1)-bit inputs A = anan−1, . . . , a0 and B

= bnbn−1, . . . , b0, where 0 ≤ A,B ≤ 2n. The weighted modulo

2n+1 of A+B can be represented as follows:

|A+B|2n+1

= A+B−(2n+1), if (A+B)>2n

=A+B, otherwise. ……………. ……..(4)

Equation (4) can be stated as

||A+B|2n+1|2

n

=|A+B−(2n+1)|2n,if(A+B)>2n

=|A+B−(2n+1)|2n+|(2n+1)|2

n,otherwise……(5)

This can then be expressed as

||A+B|2n+1|2

n

= |A+B− (2n+1) |2n, if (A+B) >2n

=|A+B− (2n+1) |2n+1, otherwise……...(6)

From (6), it can easily be seen that the value of the weighted

modulo 2n+1 addition can be obtained by first subtracting the

value of the sum of A and B by (2n+1) (i.e.,0111.......1) and

then using the diminished-1 adder to get the final modulo sum

by making the inverted end-around carry as the carry-in. Now,

we present the method of weighted modulo 2n+1 addition of

A and B as follows. Denoting Y/ and U/ as the carry and sum

vectors of the summation of A,B and −(2n+1), where Y/ = y/n−2

Design of Weighted Modulo 2n + 1 Adder Using Diminished-1 adder with the correction circuits




y/n−3…… y/

0 (~y/n−1)and U/ = u/

n-1 u/n-2 u

/n-3 ...... u

/0 , the modulo

addition can be expressed as follows:

For i=0 to n−2, the val-

ues of y/i and u/

i can be expressed as y/i =ai∨bi and u/

i = ~

(ai∨bi), respectively (∨ is denoted as logic OR operation).

Since the bit widths of Y/ and U/ are only n bits, the values of

y/n-1 and u/

n−1 are required to be computed taking the values of

an, bn, an−1, and bn−1 into consideration (i.e., y/n-1 and u/

n−1 are

the values of the carry and the sum produced by

2an+2bn+an−1+bn−1+ 1, respectively). It should be noted that 0

≤ A,B ≤ 2n, which means an=an−1=1 or bn=bn−1=1 will cause the

value of A or B to exceed the range of {0, 2n}. Thus, these in-

put combinations (i.e., an=an−1=1 or bn=bn−1= 1) are not al-

lowed and can be viewed as don’t care conditions, which can

help us simplify the circuits for generating y/n-1 and u/

n−1. That

is, the maximum value of 2an + 2bn + an−1 + bn−1 + 1 is 5,

which occurs at an = bn = 1 (i.e., the maximum value of y/n-1 is

2).

The truth table for generating y/n-1, u/

n−1 and FIX is

given in Table I, where × is denoted as don’t care. The reason

for FIX is that, under some conditions, y/n−1 = 2 (e.g., an=bn=1

and an−1=b n−1=0), which cannot be represented by 1-bit line

marked as “*” in Table I); therefore, the value of y/n-1 is set to

1, and the remaining value of carry (i.e., 1) is set to FIX. No-

tice that FIX is wired-OR with the carry-out of Y/+ U/ (i.e.,

cout) to be the inverted end around carry (denoted by cout ∨

FIX) as the array-in for the diminished-1 addition stage later

on. When y/n-1=2, FIX=1; otherwise, FIX=0. According to

Table I, we can have y/n-1= (an∨ bn ∨ an−1∨ bn−1), u

/n−1 = ~(an−1∨

bn−1), and FIX = anbn∨ bnan−1∨ anbn−1, respectively. Based on

the aforementioned, our proposed weighted modulo 2n + 1

addition of A and B is equivalent to

||A + B|2n+1| 2

n = |A + B − (2n + 1)| 2n

=|Y/ + U/|2n + ~ (cout ∨ FIX)……….. (7)

Two examples for our proposed addition methods are given as

follows.

Example 1: Suppose n = 8, A = 10010 = 0011001002, and B

=15010 = 0100101102, respectively.

Step 1: (A + B) − (2n + 1) => Y/ = 011101102, U/ = 000011012,

FIX = 0.

Step 2: Y/+ U/ = 011100112, cout = 0, => Y/+ U/+ =

111110 102 = |100 + 150|257

= 25010.

TABLE I: TRUTH TABLE FOR GENERATING y/n-1, u

/n-1 AND

FIX (X: CONDITIONS WHEN y/n-1 = 2)

Example 2: Suppose n = 8, A = 20010 = 0110010002, and B =

15010 = 0100101102, respectively.

Step 1: (A + B) − (2n + 1) => Y/= 010111102, U/ = 101000012,

FIX = 1.

Step 2: Y/ +U/ = 111111112, cout = 0, => Y/ + U/=

001011 1012 = |200

+150|17 = 9310.

The architecture for our proposed adder is given in

Fig. 3.From Fig. 3, the signal of FIX can be computed in par-

allel with the translation to Y/+ U/, leading to efficient correc-

tion. In addition, the hardware cost for our correction scheme

and FAF are less than the one proposed in [10], due to the fact

that there are two inconstant numbers that should be processed

[i.e., FA1 and FA0, as shown in Fig. 1(b)] in the translation

stage.

Fig. 3 Architecture of our proposed weighted modulo 2n + 1 adder

with the correction scheme.

IV. SYNTHESIS RESULTS AND COMPARISONS

We use Verilog structuring hardware description

language to design our proposed adders and the existing work

in [10] to compare delay/area performance. We adopt Sklan-

sky-style [5], [12] and Brent–Kungstyle [13] parallel-prefix

structures for the diminished-1 adder implementations.

n

nBA2

)12(

2

0

122n

i

n

ii

i ba

nnnnn baba211 11.....0111)22(

2

0

1212n

i

n

ii

i ba

nnnnn baba211 )122(

2

0

1// 222n

i

n

ii

i uy

nnnnn baba211 )122(

an bn an-1 bn-1 u/n−1 y/

n-1 FIX

0 0 0 0 1 0 0

0 0 0 1 0 1 0

0 0 1 0 0 1 0

0 0 1 1 1 1 0

0 1 0 0 1 1 0

0 1 0 1 X X X

0 1 1 0 0 1* 1

0 1 1 1 X X X

1 0 0 0 1 1 0

1 0 0 1 0 1* 1

1 0 1 0 X X X

1 0 1 1 X X X

1 1 0 0 1 1* 1

1 1 0 1 X X X

1 1 1 0 X X X

1 1 1 1 X X X





Sklansky-style parallel-prefix structure with correc-

tion circuits for our proposed weighted modulo 28 +1 adder is

shown in Fig. 4 The square and diamond (♦) nodes denote the

pre- and post processing stages of the operands, respectively.

The black nodes () evaluate the prefix operator, and the white

nodes (◦) pass the unchanged signals to the next prefix level.

Fig. 4. Diminished-1 adder based on the Sklansky-style parallel-

prefix structure with the correction circuits for our proposed weighted

modulo 28 +1 adder.

It should be noted that the value of sn (i.e., cout) can

be computed by wiring-AND all propagate signals (i.e., sn

=Σi=0n-1 pi. All the implementations for [10] and in this brief

are synthesized in 0.13-μm TSMC CMOS technology, and the

results are given in Table II. Since in [11], the authors did not

mention the way to implement the translator for producing A/

and B/, we use the similar carry-save architecture used in [10]

and in this brief for adding -bit A, B and the value −1 (i.e., n-

bit 1’s) to produce the carry and sum vectors for the dimin-

ished-1 adder.

The area cost in is less than the one in [10] and in

this brief as the range for inputs represented in is within {0, 2n

− 1}, and not within {0, 2n} for the inputs in [10] and in this

brief. This leads to hardware savings in the translation stage.

We also compare the area cost using the Brent–Kung-style

parallel-prefix-based diminished-1 adder for implementation,

and the results are given in Table III & V. It can be observed

that the area cost for our proposed adders is less than [10]

under the same delay. This is due to the constant value − (2n +

1) that we have used, instead of the inconstant value used for

translating the inputs A and B in [10].

TABLE II: Area Synthesis results for various modulo (2n+1) adders;

SKLANSKY-STYLE PARALLEL-PREFIX STRUCTURE-BASED

DIMINISHED-1 ADDER

TABLE III: Area Synthesis results for various modulo (2n+1) adders;

BRENT–KUNG-STYLE PARALLEL-PREFIX STRUCTURE-

BASED DIMINISHED-1 ADDER

TABLE IV: Delay synthesis results for various modulo (2n + 1) ad-

ders; (UNITS IN NANO SECONDS ) SKLANSKY-STYLE PAR-

ALLEL-PREFIX STRUCTURE-BASED DIMINISHED-1 ADDER

TABLE V: Delay synthesis results for various modulo (2n + 1) ad-

ders; BRENT–KUNG-STYLE PARALLEL-PREFIX STRUCTURE-

BASED DIMINISHED-1 ADDER

V. CONCLUSION

In this brief, an improved area-efficient weighted

modulo 2n + 1 adder has been proposed. This has been

achieved by modifying the existing diminished-1 modulo ad-

ders to incorporate simple correction schemes. The proposed

adders can perform weighted modulo 2n + 1 addition and pro-

duce sums that are within the range {0, 2n}. Synthesis results

show that our proposed adders can outperform previously

reported weighted modulo adder in terms of area and delay

constraints.

ACKNOWLEDGMENT

The authors would like to express their cordial thanks to

their esteemed patrons Mr.Sharad Kulkarni, Head of Electron-

ics and Communication Department, SKTRMCE, M. Nagar,

and also to our beloved Asst professors, P. Naresh Kumar

and D. Maruthi Kumar.

REFERENCES [1] M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor,

Residue Number System Arithmetic: Modern Applications in

Digital Signal Processing. New York: IEEE Press, 1986.

[2] J. Ramírez, A. García, S. López-Buedo, and A. Lloris, “RNS-

enabled digital signal processor design,” Electron. Lett. vol. 38,

no. 6, pp. 266– 268, Mar. 2002.

[3] G. C. Cardarilli, A. Nannarelli, and M. Re, “Reducing power

dissipation in FIR filters using the residue number system,” in

Proc. IEEE 43rd IEEE MWSCAS, Aug. 2000, pp. 320–323.

[4] A. Nannarelli, M. Re, and G. C. Cardarilli, “Tradeoffs between

residue number system and traditional FIR filters,” in Proc. IEEE

ISCAS, May 2001, pp. 305–308.

[5] R. Zimmermann, “Efficient VLSI implementation of modulo 2n ±

1 addition and multiplication,” in Proc. 14th IEEE Symp. Com-

put. Arithmetic, Apr. 1999, pp. 158–167.

[6] H. T. Vergos, C. Efstathiou, and D. Nikolos, Diminished-one

modulo 2n + 1 adder design,” IEEE Trans. Comput., vol. 51, no.

12, pp. 1389– 1399, Dec. 2002.

[7] C. Efstathiou, H. T. Vergos, and D. Nikolos, “Modulo 2n ± 1

adder design using select-prefix blocks,” IEEE Trans. Comput.,

vol. 52, no. 11, pp. 1399–1406, Nov. 2003.

[8] B. Cao, C. H. Chang, and T. Srikanthan, “An efficient reverse

converter for the 4-moduli set {2n − 1, 2n, 2n + 1, 22n + 1} based

on the new Chinese remainder theorem,” IEEE Trans. Circuits

Syst. I, Fundam. Theory Appl., vol. 50, no. 10, pp. 1296–1303,

Oct. 2003.

n=8 Existing SK Implemented SK

Area (no. of slices) 29 24

Gate count 336 273

Levels of logic 16 12

No. of input LUT’s 54 43

n=8 Existing BK Implemented BK

Area (no. of slices) 30 25

Gate count 336 270

Levels of logic 13 12

No. of input LUT’s 54 44

n=8 Existing SK Implemented SK

Logical delay 12.00(49.9%) 10.475(53.6%)

Routing delay 12.025(50.1%) 9.065(46.4%)

Total Delay 24.025 19.540

n=8 Existing BK Implemented BK

Logical delay 10.728(51.4%) 10.475(53%)

Routing delay 10.124(48.6%) 9.303(47%)

Total Delay 20.852 19.778





[9] T.-B. Juang, M.-Y. Tsai and C.-C. Chiu, “Corrections on ‘VLSI

design of diminished-one modulo 2n + 1 adder using circular

carry selection’,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol.

56, no. 3, pp. 260–261, Mar. 2009.

[10] H. T. Vergos and C. Efstathiou, “A unifying approach for

weighted and diminished-1 modulo 2n+ 1 addition,” IEEE

Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 10, pp. 1041–

1045, Oct. 2008.

[11] H. T. Vergos and D. Bakalis, “On the use of diminished-1 ad-

ders for weighted modulo 2n + 1 arithmetic components,” in

Proc. 11th EUROMICRO Conf. Digit. Syst. Des. Archit., Meth-

ods Tools, Sep. 008, pp. 752–759.





design of weighted modulo 2n + 1 adder using diminished-1 adder with the correction circuits

Documents

proposed adders

addition of y

weighted representation

adder design

correct carry

residue number system

area cost

used moduli