
Final thesis

Low Power Design Using RNS

by

Viktor Classon

LITH-ISY-EX--14/4792--SE

2014-08-25


Supervisor, ISY: Oscar Gustafsson

Supervisor, Ericsson: Shafqat Ullah

Examiner: Mark Vesterbacka

Abstract

Power dissipation has become one of the major limiting factors in the design of digital ASICs. Low power dissipation will increase the mobility of the ASIC by reducing the system cost, size and weight. DSP blocks are a major source of power dissipation in modern ASICs. The residue number system (RNS) has, for a long time, been proposed as an alternative to the regular two’s complement number system (TCS) in DSP applications to reduce the power dissipation. The basic concept of RNS is to first encode the input data into several smaller independent residues. The computational operations are then performed in parallel and the results are eventually decoded back to the original number system. Due to the inherent parallelism of the residue arithmetic, a hardware implementation results in multiple smaller design units. Therefore an RNS design can use low leakage power cells and can result in a lower switching activity.

The residue number system has been analyzed by first investigating different implementations of RNS adders and multipliers (which are the basic arithmetic functions in a DSP system) and then deriving an optimal combination of these. The optimum combinations have been used to implement an FIR filter in RNS that has been compared with a TCS FIR filter.

By providing different input data and coefficients to both the RNS and TCS FIR filters, an evaluation of their respective performance in terms of area, power and operating frequency has been performed. The result is promising for uniformly distributed random input data, with an approximately 15 % reduction of average power with RNS compared to TCS. For a realistic DSP application with normally distributed input data, the power reduction is negligible for practical purposes.


Acknowledgements

First of all I would like to thank the employees at the section Digital ASIC at Ericsson in Kista for all the help and support, and most of all for giving me the opportunity of doing my master’s thesis. A big thanks especially to my supervisor Shafqat Ullah at Ericsson for all the support and for sharing his knowledge with me. I also want to thank my examiner, Mark Vesterbacka, and my supervisor at ISY, Oscar Gustafsson, for all the help and support during the thesis.

Most of all I want to thank my parents, Maria and Svante, and my partner, Linda, for supporting me during my five years of studies. Without you I would not have been able to make it!

Finally I would like to thank all fellow students at Linköping University and master thesis students at Digital ASIC for a fantastic time and supporting company! Especially a big thanks to Emil Lundqvist for his review of this work as an opponent and to Ejaz Sadiq for being a great sounding board during my master’s thesis.

Stockholm, August 2014
Viktor Classon

Contents

1 Introduction
1.1 Problem statement
1.2 Methodology
1.3 Prior work
1.4 Outline
1.5 Limitations

2 Background
2.1 RNS arithmetic
2.1.1 Basic arithmetic operations
2.1.2 Conversion
2.1.3 Choosing a moduli-set
2.2 FIR filters
2.3 Design flow
2.3.1 Synthesis
2.3.2 Profile developing
2.3.3 Power

3 Proposed design
3.1 Arithmetic functions
3.1.1 Addition
3.1.2 Multiplication
3.2 Conversion
3.2.1 Forward conversion
3.2.2 Reverse conversion
3.3 Choosing a moduli-set
3.3.1 Modulus for comparison

4 Implementation
4.1 RNS addition
4.1.1 LUT and binary adders
4.1.3 End-around carry parallel-prefix adder
4.1.4 Parallel-prefix adder using the diminished-one number representation for modulo $2^n - 1$


4.1.5 Addition using Verilog’s built-in modulo operator
4.1.6 Ordinary addition for modulo $2^n$
4.2 Multiplication
4.2.0 LUT based multiplication
4.2.1 Modulo-m product-partitioning multiplier with ROM
4.2.2 Parallel-prefix multiplier for modulo $2^n - 1$
4.2.3 Parallel-prefix multiplier for modulo $2^n + 1$
4.2.4 Modular multiplication using the isomorphic technique
4.2.5 High radix modulo $2^n - 1$ multiplier
4.2.6 Using Verilog’s built-in operators
4.2.7 Ordinary multiplication for modulo $2^n$
4.3 Forward conversion
4.3.1 RNS adder tree
4.3.2 Periodicity
4.3.3 Forward conversion for modulo $2^n - 1$
4.3.4 Using Verilog’s built-in modulo operator
4.3.5 Forward conversion for modulo $2^n$
4.4 Reverse conversion
4.4.1 CRT
4.5 Choosing a moduli set
4.6 FIR filter

5 Results
5.1 Input data and coefficients
5.1.1 Uniformly distributed data and coefficients
5.1.2 Sawtooth data and coefficients ramp
5.1.3 Realistic input data and FIR coefficients
5.1.4 Different properties of the data and coefficients
5.2 Adders and multipliers
5.2.1 Adders
5.2.2 Multipliers
5.3 Moduli-set
5.4 FIR filters
5.4.1 Varying input word length
5.4.2 Varying number of taps
5.4.3 Folded FIR filter
5.5 Maximum frequency

6 Discussion and conclusions
6.1 Adders
6.2 Multipliers
6.3 FIR filters

7 Future Work


Appendix

A Modulus

B Optimum moduli-sets

C RNS adders results

D RNS multiplier results


List of Figures

1.1 The basic principle of RNS

2.1 Flowchart for profile development

4.1 RNS addition using two binary adders
4.2 Hybrid version of RNS addition
4.3 Logic operators for the parallel-prefix adder
4.4 End-around carry parallel-prefix adder with Sklansky parallel-prefix structure
4.5 Adder based on the diminished-one number representation
4.6 Adder using Verilog’s built-in operator
4.7 Modulo $2^n$ addition using a binary adder
4.8 Modulo-m product-partitioning multiplier with ROM
4.9 Multiplier based on parallel-prefix RNS adders
4.10 Multiplier based on the isomorphic technique
4.11 Modular high-radix RNS multiplier
4.12 Multiplier based on the isomorphic technique
4.13 Multiplier based on binary multiplier for modulo $2^n$
4.15 Forward conversion using an RNS adder tree
4.14 Forward conversion with registers at input and output
4.16 Reverse conversion
4.17 Reverse conversion using CRT
4.18 Modulo-m product-partitioning multiplier with combinatorial logic instead of LUT. Changes from ordinary RNS multiplier are shown in white.
4.19 Direct-form FIR filter
4.20 Transposed direct-form FIR filter
4.21 Folded FIR

5.1 Discrete uniform distributions for different number of bits
5.2 Sawtooth data and ramp coefficients
5.3 Histogram for realistic input data for a 20-bit FIR filter
5.4 Frequency response for some different FIR filter coefficients
5.5 Description of the RNS multiplier and adder graphs


5.6 Test setup for RNS adders and multipliers
5.7 Total power dissipation for all RNS adders using uniformly distributed input data as described in section 5.1.1 on page 36.
5.8 The best RNS adder for each modulo compared with RNS adders for modulo $2^n$. Power dissipation was calculated using uniformly distributed input data as described in section 5.1.1 on page 36.
5.9 RNS adders type 0 and 1
5.10 RNS adders type 2 and 3
5.11 RNS adders type 4 and 5
5.12 RNS adders type 6
5.13 All RNS multipliers
5.14 RNS multipliers type 0 and 1
5.15 RNS multipliers type 2 and 4
5.16 RNS multipliers type 5 and 6
5.17 RNS multipliers type 7 and TCS multiplier
5.18 Combinations of RNS multipliers with a maximum of 3, 5, 7, 9 and 11 RNS multipliers in the moduli-set compared with TCS multiplier
5.19 Combinations of RNS adders with a maximum of 3, 5, 7, 9 and 11 RNS adders in the moduli-set compared with RNS adder for modulo $2^n$ (which is almost identical to a TCS adder)
5.20 64-tap FIR filter with varying input bit width for RNS and TCS. Uniform data as described in section 5.1.1 on page 36. The red line represents the power reduction.
5.21 16-tap FIR filter with varying input word length for RNS and TCS. Sawtooth data with ramp coefficients as described in section 5.1.2 on page 37. The red line represents the power reduction.
5.22 20-bit FIR filter with varying number of taps for RNS and TCS. Uniformly distributed data and coefficients are used as described in section 5.1.1 on page 36. The red line represents the power reduction.
5.23 20-bit FIR filter with varying number of taps for RNS and TCS. Realistic data with constant FIR coefficients are used as described in section 5.1.3 on page 37. The red line represents the power reduction.


List of Tables

2.1 Example of signed and unsigned representations using the moduli-set $\{m_1, m_2\} = \{2, 3\}$, $M = 6$.

3.1 Definition of different adder types
3.2 Definition of different RNS multiplier types
3.3 Definition of different RNS forward conversion types
3.4 Definition of different RNS reverse conversion types

4.1 Periodicity of some residues

5.1 Sign switching rate of input data
5.2 Theoretical toggle rate at output of a 20-bit input multiplication. The optimum moduli-sets as presented in table 5.5 on page 55 are used.
5.3 The best adder type for chosen modulo with respect to power, refer to table 3.1 on page 12 for details about the adder types.
5.4 The best multiplier type for chosen modulo with respect to power, refer to table 3.2 on page 13 for details about the multiplier types.
5.5 Some of the optimum moduli-sets and their resulting number of bits. For the complete list refer to Appendix B.
5.6 Results for an FIR filter folded 22 times with 20-bit input and 22 taps
5.7 Synthesis results for 4-tap FIR filter with 20 or 30 input bit-width. The synthesis maximum frequency goal was set to 1.5 GHz.

B.1 Resulting moduli-sets

C.1 Results for RNS adders

D.1 Results for RNS multipliers


Nomenclature

ASIC Application-specific integrated circuit

CRT Chinese remainder theorem

DSP Digital signal processing

FIR Finite impulse response

LUT Look-up table

RNS Residue number system

ROM Read only memory

RTL Register-transfer level

TCS Two’s complement number system

VHDL Very high speed integrated circuit hardware description language

VLSI Very large scale integration


Chapter 1

Introduction

Power dissipation has become one of the major limiting factors in the design of digital ASICs. Low power dissipation will increase the mobility of the ASIC by reducing the system cost, size and weight. DSP blocks are a major source of power dissipation in modern ASICs. The residue number system (RNS) has, for a long time, been proposed as an alternative to the regular two’s complement number system (TCS) in DSP applications to reduce the power dissipation. Some research has shown that implementing FIR filters in RNS instead of TCS can reduce the power dissipation. FIR filters are among the less complex DSP blocks. A general sketch of how RNS computations can be performed is shown in figure 1.1 on the next page. The earliest usage of the residue number system can be found in The Mathematical Classic of Sun Tzu by the Chinese mathematician Sun Tzu, who lived in the 3rd century AD. A famous riddle from his book [1] is quoted below.

Now there are an unknown number of things.
If we count by threes, there is a remainder of 2.
If we count by fives there is a remainder 3.
If we count by sevens, there is a remainder 2.
Find the number of things.

(Sun Tzu)


Figure 1.1: The basic principle of RNS. The operands pass through forward conversion into independent modulo channels (modulo m1, modulo m2, ..., modulo mn), and the channel outputs pass through reverse conversion to give the results.

1.1 Problem statement

The problem to be investigated in this thesis is to compare RNS with TCS. This will be done by implementing FIR filters in RNS and TCS and comparing these two implementations. The requirement on the RNS implementation is to minimize the power while still being able to run the circuit at 500 MHz without a massive increase in area. Both the RNS and TCS implementations shall be able to receive and process one sample per clock cycle. Another design goal is to be able to process an input of around 20 bits. An important idea in the thesis is that RNS could in the future be implemented in large parts of the ASIC, so that the forward and reverse conversion would contribute much less than the computational operations to power dissipation and area. The implementation and results will therefore focus on an implementation without the conversion. The ASIC is intended for, and implemented in, 32 nm technology. The aim of the thesis can be summarized by these four questions:

• Is RNS better than TCS with respect to power, area and timing?

• How can RNS be implemented and what different design choices can be made?

• What extensions of RNS exist that can further improve its properties?

• How can RNS be introduced into Ericsson’s systems?


1.2 Methodology

The thesis work has been performed at Ericsson in Kista, Stockholm. The work has been executed in the following way:

1. Literature study

2. Implementation of adders and multipliers in RNS

3. Comparison of individual RNS adders and multipliers

4. Study of which RNS adders and multipliers to use, and which combinations of them result in the lowest power dissipation

5. Implementation of RNS and TCS FIR filter

6. Comparison of RNS and TCS FIR filters

7. Implementation of forward and reverse conversion

8. Comparison of different forward and reverse conversion techniques

9. Analysis of the different results

1.3 Prior work

The arithmetic of a residue number system and its application to digital signal processing and computer technology has earlier been described in [2], [3] and [4]. The use of RNS for reduction of power in FIR filters has earlier been discussed in, for example, [5], [6] and [7] with good results. Two promising results can be seen in figure 5 from [7] and figure 6 from [5], where the power is significantly lower with RNS compared to TCS.

In figure 5 from [7] we can see an RNS FIR filter with forward and reverseconversion with 16-bit coefficients and a 32-bit dynamic range comparedwith a TCS FIR filter designed with the same restrictions.

Figure 6 from [5] compares the dynamic and static power dissipation of an RNS FIR filter with that of a TCS FIR filter. Both the RNS and TCS filters have 10-bit inputs and coefficients and a dynamic range of 20 bits. Note that neither in figure 5 from [7] nor in figure 6 from [5] do the authors take into account the increasing bit width in the accumulator due to the number of taps.

1.4 Outline

A brief introduction to the thesis is given in chapter 1. In chapter 2 the basic mathematical principles of RNS are presented. From these basic mathematical principles a set of different implementations of RNS is derived, and a subset of these forms the proposed design presented in chapter 3. The detailed implementation is presented in chapter 4 and the simulation results of the implementation are presented in chapter 5. The results are discussed, and conclusions are drawn, in chapter 6. Suggestions for future work in the subject are presented in chapter 7.

1.5 Limitations

The aim of the thesis is to investigate RNS; hence individual TCS adders and multipliers will not be implemented (here the synthesis tool will decide which adders and multipliers to use). The focus of the thesis has been on RNS-specific algorithms and not on low power algorithms that are suitable for TCS or FIR filters in general. When implementing the individual RNS adders and multipliers, the focus has been on the structure and not on the exact implementation of the ordinary binary adders and binary multipliers used in them; again, this has in most cases been left to the synthesis tool. The major limitation of the thesis work is its time budget of 20 weeks.

Chapter 2

Background

The basic concept of a residue number system (RNS) is to represent a large number with a set of smaller integers. In RNS some computations can be performed more efficiently. RNS originates from the Chinese remainder theorem (CRT) of modular arithmetic, which was first described by the Chinese third-century mathematician Sun Tzu [4]. The CRT can be used to solve his famous riddle on page 1.
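The CRT solves the riddle by reconstructing a number from its residues. As a hedged illustration, the Python sketch below applies the textbook reconstruction $X = \left|\sum_i x_i M_i \left|M_i^{-1}\right|_{m_i}\right|_M$, where $M$ is the product of the moduli and $M_i = M/m_i$. The function name `crt` is hypothetical; this is not the reverse converter implemented in the thesis (which is written in Verilog). It assumes pairwise relatively prime moduli and Python 3.8 or later for `math.prod` and `pow` with exponent −1.

```python
from math import prod

def crt(residues, moduli):
    """Reconstruct X in [0, M) from its residues (hypothetical helper).

    Assumes the moduli are pairwise relatively prime.
    """
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m              # product of all the other moduli
        inv = pow(Mi, -1, m)     # multiplicative inverse of Mi modulo m
        x += r * Mi * inv
    return x % M

# Sun Tzu's riddle: remainder 2 modulo 3, 3 modulo 5 and 2 modulo 7.
print(crt([2, 3, 2], [3, 5, 7]))  # -> 23
```

Each term $M_i \left|M_i^{-1}\right|_{m_i}$ is congruent to 1 modulo $m_i$ and to 0 modulo every other modulus, which is why the weighted sum reproduces all residues at once.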

2.1 RNS arithmetic

RNS arithmetic is based on the mathematical congruence relation. Let $a$ and $b$ be integers. These integers are said to be congruent modulo $m$ if $a - b$ is exactly divisible by $m$. In mathematical contexts this is often written as $a \equiv b \pmod{m}$. The number $m$ is called a modulus or base.

Now let $q$ be the quotient and $r$ be the remainder from the division of the integer $a$ by the modulus $m$, so that $a = q \cdot m + r$. From the congruence definition above we then have $a \equiv r \pmod{m}$. The integer $r$ is the residue of $a$ with respect to $m$, which will be denoted as $r = |a|_m$. We shall assume that $r \in \{0, 1, 2, \ldots, m-1\}$, that is, $r$ lies in the set of least positive residues modulo $m$.

Now define a moduli-set as $\{m_1, m_2, \ldots, m_N\}$ that contains $N$ positive and pairwise relatively prime moduli. That is, for every $i$ and $j$ where $i \neq j$, the moduli $m_i$ and $m_j$ in the moduli-set have no common divisor larger than unity. The dynamic range $M$ of the RNS moduli-set is then defined as the product of the moduli according to equation (2.1).

$$M = \prod_{n=1}^{N} m_n. \qquad (2.1)$$

For every moduli-set, a number $0 \le X < M$ has a unique representation consisting of the $N$ residues. This representation can be calculated as $\{x_i = |X|_{m_i} : 1 \le i \le N\}$. We shall denote such a representation as $\langle x_1, x_2, \ldots, x_N \rangle$.

Example 1. Take the moduli-set $\{3, 5, 7\}$; then $m_1 = 3$, $m_2 = 5$ and $m_3 = 7$. The dynamic range of the moduli-set will be

$$M = \prod_{n=1}^{3} m_n = m_1 \cdot m_2 \cdot m_3 = 3 \cdot 5 \cdot 7 = 105.$$

Now let $X = 10$. Then $\langle x_1, x_2, x_3 \rangle$ can be calculated as follows:

$$x_1 = |X|_{m_1} = |10|_3 = |3 \cdot 3 + 1|_3 = |3 \cdot 3|_3 + |1|_3 = 1$$

$$x_2 = |X|_{m_2} = |10|_5 = |2 \cdot 5|_5 = 0$$

$$x_3 = |X|_{m_3} = |10|_7 = |1 \cdot 7 + 3|_7 = |1 \cdot 7|_7 + |3|_7 = 3.$$

So $X = 10$ can be represented as $\langle 1, 0, 3 \rangle$ in the RNS moduli-set $\{3, 5, 7\}$.
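Example 1 can be checked mechanically. The Python sketch below computes the dynamic range from equation (2.1) and the residue representation by taking one remainder per modulus; the function names are hypothetical and only mirror the definitions above. It also confirms that every $X$ in $[0, M)$ maps to a distinct residue tuple, which is the uniqueness property stated for pairwise relatively prime moduli.

```python
from math import prod

def dynamic_range(moduli):
    """Dynamic range M as the product of the moduli, equation (2.1)."""
    return prod(moduli)

def to_rns(x, moduli):
    """Residue representation <x_1, ..., x_N> of an unsigned 0 <= x < M."""
    if not 0 <= x < dynamic_range(moduli):
        raise ValueError("x is outside the unsigned dynamic range")
    return tuple(x % m for m in moduli)

print(dynamic_range([3, 5, 7]))  # -> 105
print(to_rns(10, [3, 5, 7]))     # -> (1, 0, 3)

# Every X in [0, M) gets its own residue tuple.
assert len({to_rns(x, [3, 5, 7]) for x in range(105)}) == 105
```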

A residue number system can be used to represent both signed and unsigned numbers. For unsigned numbers, RNS can represent numbers in the range $0 \le X \le M - 1$. For signed numbers, RNS can represent numbers that satisfy one of the following relations:

$$-\frac{M-1}{2} \le X \le \frac{M-1}{2} \quad \text{if } M \text{ is odd},$$

$$-\frac{M}{2} \le X \le \frac{M}{2} - 1 \quad \text{if } M \text{ is even}.$$

See table 2.1 for an example of RNS representation for signed and unsigned numbers.

$\langle x_1, x_2 \rangle$   Unsigned   Signed
〈0, 0〉                        0          0
〈1, 1〉                        1          1
〈0, 2〉                        2          2
〈1, 0〉                        3         −3
〈0, 1〉                        4         −2
〈1, 2〉                        5         −1

Table 2.1: Example of signed and unsigned representations using the moduli-set $\{m_1, m_2\} = \{2, 3\}$, $M = 6$.
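The signed interpretation amounts to treating the upper part of $[0, M)$ as negative. A minimal Python sketch, with hypothetical function names, that applies the ranges above and reproduces the rows of table 2.1:

```python
def from_signed(x, M):
    """Map a signed integer in the representable range to [0, M).
    Python's % already returns a non-negative result for negative x."""
    return x % M

def to_signed(u, M):
    """Interpret u in [0, M) as signed. (M - 1) // 2 equals (M - 1)/2 for
    odd M and M/2 - 1 for even M, matching both ranges above."""
    return u - M if u > (M - 1) // 2 else u

# Reproduce table 2.1 for the moduli-set {2, 3}, M = 6:
# each row prints the residue tuple, the unsigned value and the signed value.
moduli, M = (2, 3), 6
for u in range(M):
    print(tuple(u % m for m in moduli), u, to_signed(u, M))
```

The residues of a negative number can equally be taken directly per channel, since $|X|_{m_i} = \left||X|_M\right|_{m_i}$ when $m_i$ divides $M$.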

2.1.1 Basic arithmetic operations

Addition, subtraction and multiplication are straightforward to calculate in RNS. Division, sign determination, overflow detection and magnitude comparison are significantly harder to implement. For addition, subtraction and multiplication, the only difference from ordinary TCS operations is that the result in channel $i$ has to be in the range $[0, m_i - 1]$. Addition $X + Y = Z$ can be calculated as

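$$X + Y = \langle x_1, x_2, \ldots, x_N \rangle + \langle y_1, y_2, \ldots, y_N \rangle = \langle z_1, z_2, \ldots, z_N \rangle = Z$$

where $z_i = |x_i + y_i|_{m_i}$.

Multiplication $X \cdot Y = Z$ can be calculated in a similar fashion:

$$X \cdot Y = \langle x_1, x_2, \ldots, x_N \rangle \cdot \langle y_1, y_2, \ldots, y_N \rangle = \langle z_1, z_2, \ldots, z_N \rangle = Z$$

where $z_i = |x_i \cdot y_i|_{m_i}$.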
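The channel-wise rules can be sketched in a few lines of Python. The example reuses the moduli-set $\{3, 5, 7\}$ from Example 1 as an illustrative choice, and the function names are hypothetical. The point it shows is that each channel is computed independently, with no carries passing between channels.

```python
# Moduli-set from Example 1 (illustrative choice); M = 105.
MODULI = (3, 5, 7)

def to_rns(x, moduli=MODULI):
    """Forward conversion by taking the remainder in each channel."""
    return tuple(x % m for m in moduli)

def rns_add(xs, ys, moduli=MODULI):
    """z_i = |x_i + y_i|_{m_i}; channels never exchange carries."""
    return tuple((x + y) % m for x, y, m in zip(xs, ys, moduli))

def rns_mul(xs, ys, moduli=MODULI):
    """z_i = |x_i * y_i|_{m_i}."""
    return tuple((x * y) % m for x, y, m in zip(xs, ys, moduli))

X, Y = 10, 9
# The residues always equal those of |X op Y|_M; they represent the true
# integer result only while it stays inside the dynamic range (19, 90 < 105).
assert rns_add(to_rns(X), to_rns(Y)) == to_rns(X + Y)
assert rns_mul(to_rns(X), to_rns(Y)) == to_rns(X * Y)
print(rns_mul(to_rns(X), to_rns(Y)))  # -> (0, 0, 6)
```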

Note the difference between addition and multiplication: for addition $x_i + y_i \le 2(m_i - 1)$, whereas for multiplication $x_i \cdot y_i \le (m_i - 1)^2$. The reduction required to bring the result into the range $[0, m_i - 1]$ can therefore be much greater for multiplication, which leads to a more complex implementation of RNS multipliers compared to RNS adders.

2.1.2 Conversion

The goal of forward and reverse conversion is to convert a number represented in TCS into RNS, and a number represented in RNS back into TCS, respectively.

Forward conversion

Conversion from TCS to RNS can be computed in a straightforward way using division, where the remainder of the division is the residue.

Reverse conversion

Reverse conversion is described from an implementation perspective in chapter 4 on page 16.

2.1.3 Choosing a moduli-set

In general there exist two types of moduli: arbitrary and special. Special moduli usually refer to those used in a special moduli-set, $\{2^n - 1, 2^n, 2^n + 1\}$, or extensions of this. Arbitrary moduli are the remaining integers, including the primes.

In this thesis it will be assumed that the arbitrary sets consist only of primes, since completely arbitrary moduli are not guaranteed to be pairwise relatively prime. The special sets are designed to be more hardware efficient; their moduli are not all prime but are guaranteed to be pairwise relatively prime. Using only prime moduli is probably the best choice of moduli-set from a purely mathematical point of view [8], but the special sets might have other advantages. The moduli considered for comparison are therefore the primes and those fulfilling the requirements of a special set.


Special moduli-sets

The most common special moduli-set is $\{2^n - 1, 2^n, 2^n + 1\}$ and extensions of this [4]. The use of this moduli-set is often motivated by the less complicated implementation of RNS to TCS converters and by the fact that dedicated hardware multipliers can be used on FPGA platforms [9]. A common extension is to add a modulus of the form $2^{n \pm q} \pm 1$, where $q \ge 1$, to the moduli-set.
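Whether a candidate set, including an extended special set, is usable can be checked directly against the pairwise-relatively-prime requirement. The Python sketch below is illustrative: $n = 7$ and the two extension moduli are arbitrary example choices, not the moduli-sets selected in the thesis (those are listed in Appendix B), and the function names are hypothetical. For the basic special set the dynamic range is $(2^n - 1)\,2^n\,(2^n + 1) = 2^{3n} - 2^n$, i.e. just under $3n$ bits.

```python
from itertools import combinations
from math import gcd, prod

def is_pairwise_coprime(moduli):
    """True if no two moduli share a common divisor larger than one."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))

def special_set(n):
    """The special moduli-set {2^n - 1, 2^n, 2^n + 1}."""
    return (2 ** n - 1, 2 ** n, 2 ** n + 1)

s = special_set(7)
print(s, is_pairwise_coprime(s), prod(s))      # -> (127, 128, 129) True 2097024
# Extensions must be checked: 2^6 - 1 = 63 shares the factor 3 with 129,
# while 2^8 + 1 = 257 keeps the set pairwise relatively prime.
print(is_pairwise_coprime(s + (2 ** 6 - 1,)))  # -> False
print(is_pairwise_coprime(s + (2 ** 8 + 1,)))  # -> True
```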

2.2 FIR filters

Finite-duration impulse response (FIR) filters are probably the most commonly used digital filters. An FIR filter is based on the mathematical concept of discrete convolution, where the filtered output of a signal can be calculated using equation (2.2) [10].

$$y[n] = \sum_{i=0}^{N} h[i]\, x[n-i]. \qquad (2.2)$$

In equation (2.2), $y[n]$ is the output, $x[n]$ is the input and $h[i]$ are the coefficients. $N$ is defined as the order of the filter, and the filter will have $N + 1$ taps.
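To make equation (2.2) concrete and connect it with the modulo channels of figure 1.1, the Python sketch below evaluates a small FIR filter directly and then once per modulo channel. The function names, the moduli-set $\{3, 5, 7\}$ and the sample values are illustrative assumptions, not the 20-bit design studied in the thesis. The check shows that each channel output equals the TCS output reduced modulo that channel's modulus, including for negative samples and coefficients.

```python
def fir(x, h):
    """Direct evaluation of equation (2.2), assuming x[k] = 0 for k < 0.
    len(h) = N + 1 taps."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]

def fir_rns(x, h, moduli):
    """The same filter evaluated independently in every modulo channel.
    Products are reduced per channel and the accumulated sum is reduced at
    the end; reducing after every addition gives the same residue."""
    out = []
    for m in moduli:
        xm = [v % m for v in x]   # signed samples mapped into [0, m)
        hm = [c % m for c in h]   # signed coefficients mapped into [0, m)
        out.append([sum(hm[i] * xm[n - i] % m for i in range(len(hm)) if n - i >= 0) % m
                    for n in range(len(xm))])
    return out

x = [3, -2, 5, 1]
h = [1, 2, -1]
moduli = (3, 5, 7)
y = fir(x, h)
print(y)  # -> [3, 4, -2, 13]
for m, channel in zip(moduli, fir_rns(x, h, moduli)):
    assert channel == [v % m for v in y]
```

As in the thesis, the outputs are only meaningful after reverse conversion if every $y[n]$ stays inside the signed dynamic range of the moduli-set (here $-52$ to $52$).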

2.3 Design flow

Each implementation was carried out as described below. If an error occurred at any step, the process was restarted from step 1.

1. Implement

2. Simulate and verify with TCS result

3. Synthesize

4. Analyze the synthesis results and develop a profile in terms of area, power and delay

Figure 2.1: Flowchart for profile development (blocks: RTL design, synthesis, netlist, synthesis reports, input data, Verilog simulation, switching activity file, cell library, power calculation and power reports).

2.3.1 Synthesis

The RTL code has been synthesized using Synopsys Design Compiler®. During synthesis (for all designs, both RNS and TCS), some optimizations are performed by the synthesis tool, which tries to minimize the power while still meeting the required critical path [11].

2.3.2 Profile developing

The design flow for developing a profile in terms of area, delay and power is shown in figure 2.1. The sources of the area, delay, power and other parameters of interest are listed below:

Synthesis Reports Area (cell library specific), gate count, UVT cell ratio(see section 2.3.3 on the following page), etc.

Power Reports Power dissipation (leakage power, switching power andinternal power), delay, critical path, etc.


2.3.3 Power

The power calculations are made in the Power Calculation block in figure 2.1 on the previous page. The power dissipation can be divided into dynamic and static power dissipation. Dynamic power dissipation consists of switching and internal power dissipation. The power dissipation reports that are generated by PrimeTime¹ are described in [13]. Note that both the static and the dynamic power in the equations below also scale with the size of the design.

• Static power

– Leakage power $P_l = V \cdot I_{leak}$

• Dynamic power

– Switching power $P_s = \frac{1}{2} \cdot C_{load} \cdot V^2 \cdot f$

– Internal power $P_{int} = \left(\frac{1}{2} \cdot C_{int} \cdot V^2 \cdot f\right) + (V \cdot I_{shortcut})$
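The three expressions above can be written as small helper functions to see how the terms scale. This is a sketch only: the function names are hypothetical, the switching term is written exactly as above (without a separate activity factor), and the load capacitance and supply voltage in the usage line are made-up values for illustration, not numbers from the thesis or from PrimeTime. Only the 500 MHz clock is taken from the problem statement.

```python
def leakage_power(v, i_leak):
    """P_l = V * I_leak"""
    return v * i_leak

def switching_power(c_load, v, f):
    """P_s = 1/2 * C_load * V^2 * f (no separate activity factor, as above)"""
    return 0.5 * c_load * v ** 2 * f

def internal_power(c_int, v, f, i_shortcut):
    """P_int = 1/2 * C_int * V^2 * f + V * I_shortcut"""
    return 0.5 * c_int * v ** 2 * f + v * i_shortcut

# Hypothetical values for illustration only: 10 fF load and 0.9 V supply,
# clocked at the 500 MHz target frequency from the problem statement.
p_s = switching_power(10e-15, 0.9, 500e6)
print(f"{p_s * 1e6:.3f} uW")  # about 2.025 uW
```

Because $V$ enters the dynamic terms squared while $f$ and the capacitances enter linearly, the same helpers make it easy to compare how a design change affects each term.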

Different standard cells

Depending on which standard cells the synthesis tool chooses, the leakage power consumption will differ. A higher threshold voltage ($V_T$) results in lower leakage. The synthesis tool can choose between the following standard cell types (sorted in decreasing $V_T$):

UVT Ultra-high $V_T$

SVT Super-high $V_T$

MVT Mezzanine $V_T$

HVT High $V_T$

¹ The Synopsys PrimeTime suite provides a single, golden, trusted signoff solution for timing, signal integrity, power, timing constraint and variation-aware analysis. [12]

Chapter 3

Proposed design

3.1 Arithmetic functions

The basic arithmetic functions of an FIR filter are addition and multiplication. These operations can be implemented in many different ways in the residue number system. The basic complication with RNS is dealing with the modulo overflow that occurs when the result is larger than the modulus. For a modulus $m_i$, the result of the operations always has to be within the range $\{0, \ldots, m_i - 1\}$. For addition the result will be in the range $\{0, \ldots, 2(m_i - 1)\}$, and therefore at most one subtraction of $m_i$ has to be performed to reach the correct range. For multiplication, on the other hand, the product will be in the range $\{0, \ldots, (m_i - 1)^2\}$, which complicates the reduction.

To determine which algorithms for addition and multiplication are best in terms of power dissipation, simulations will be made on individual adders and multipliers for all chosen moduli.

3.1.1 Addition

Three basic approaches for designing addition with an arbitrary modulus are presented in [14]: using a LUT, using two ordinary binary adders, and a hybrid between these two. Each of these three is optimal in terms of area and timing for certain moduli [14].
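The first two approaches can be modelled behaviourally as follows. This Python sketch only captures the arithmetic, not the gate structure from [14]; the function names are hypothetical and the hybrid variant is not shown. The two-adder version relies on the fact, noted above, that a sum below $2m$ needs at most one subtraction of $m$.

```python
def mod_add_two_adders(x, y, m):
    """Two-binary-adder style modular addition. In hardware s and t are
    formed by two adders in parallel and the sign (carry) of t drives a
    multiplexer; here they are simply computed in sequence."""
    assert 0 <= x < m and 0 <= y < m
    s = x + y        # first adder: at most 2(m - 1)
    t = s - m        # second adder: subtract the modulus once
    return t if t >= 0 else s

def build_add_lut(m):
    """LUT style modular addition: precompute |x + y|_m for every operand
    pair. The table has m * m entries, so it grows quickly with m."""
    return [[(x + y) % m for y in range(m)] for x in range(m)]

lut = build_add_lut(7)
print(lut[5][4], mod_add_two_adders(5, 4, 7))  # -> 2 2
```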

An interesting approach for implementing addition in the special moduli-set $\{2^n - 1, 2^n, 2^n + 1\}$ using a parallel-prefix adder is presented in [4]. A more detailed description is available in [15]. Because of the low abstraction level of this approach, [16] can be used as a starting point for the implementation.
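The key idea behind modulo $2^n - 1$ addition is the end-around carry: since $2^n \equiv 1 \pmod{2^n - 1}$, the carry out of the most significant bit is added back at the least significant bit. The Python sketch below models only this arithmetic; it does not reproduce the parallel-prefix (e.g. Sklansky) carry network, where the end-around carry is fed back into the prefix tree rather than applied as a second addition. The function name is hypothetical.

```python
def add_mod_2n_minus_1(x, y, n):
    """Modulo 2^n - 1 addition with end-around carry. The carry out of
    bit n-1 has weight 2^n = 1 (mod 2^n - 1), so it is added at bit 0."""
    mask = (1 << n) - 1
    s = x + y
    s = (s & mask) + (s >> n)   # end-around carry; cannot overflow again
    # All ones (2^n - 1) is a second encoding of zero; normalise it.
    return 0 if s == mask else s

print(add_mod_2n_minus_1(9, 10, 4))  # 19 mod 15 -> 4
```

The double representation of zero is a known property of modulo $2^n - 1$ (one's complement style) arithmetic; whether a hardware adder normalises it depends on the design.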

The Verilog language and the synthesis tool support the built-in Verilog operators “+” (addition) and “%” (modulus). An implementation using only these operators will be a good naive reference when comparing with the other implementations. Also, for modulo $2^n$ the trivial implementation using a standard adder will be used. The additions to implement are summarized in table 3.1.

Type   Description
0      Look-up table (LUT) based RNS adder
1      Two binary adders
2      A hybrid between 0 and 1
3      Modulo $2^n - 1$ using modified parallel-prefix adder
4      Modulo $2^n + 1$ using diminished-one number representation
5      Using Verilog’s built-in operators “+” and “%”
6      Ordinary adder for modulo $2^n$

Table 3.1: Definition of different adder types

3.1.2 Multiplication

RNS multiplication can be implemented in a huge variety of ways. A promising implementation, a modulo-m product-partitioning multiplier with ROM, is presented in [17]. It seems more promising than multiplication by the reciprocal of the modulus as described in [18], since the latter uses three multipliers instead of two.

For the special set $\{2^n - 1, 2^n, 2^n + 1\}$ some improvements in terms of area, power and delay can be made. A parallel modulo-m multiplier for $2^n \pm 1$ is presented in [4] without any special speed-up techniques. This implementation might be especially interesting for relatively small $n$.
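Modulo $2^n - 1$ multipliers exploit $2^n \equiv 1 \pmod{2^n - 1}$: the partial product $x \cdot 2^i$ is simply $x$ rotated left by $i$ bits, so the partial products need no reduction and only the accumulation uses end-around-carry additions. The Python sketch below is a behavioural model of this idea under that assumption; it is not claimed to match the structure in [4] or [19], and it accumulates sequentially where hardware would use a parallel adder tree. All function names are hypothetical.

```python
def add_mod_2n_minus_1(x, y, n):
    """End-around-carry addition modulo 2^n - 1 (result in [0, 2^n - 2])."""
    mask = (1 << n) - 1
    s = x + y
    s = (s & mask) + (s >> n)
    return 0 if s == mask else s

def rotl(x, k, n):
    """Rotate the n-bit word x left by k positions."""
    mask = (1 << n) - 1
    k %= n
    return ((x << k) | (x >> (n - k))) & mask

def mul_mod_2n_minus_1(x, y, n):
    """x * y mod (2^n - 1). Each partial product x * 2^i equals x rotated
    left by i bits, gated by bit y_i of the multiplier."""
    acc = 0
    for i in range(n):
        if (y >> i) & 1:
            acc = add_mod_2n_minus_1(acc, rotl(x, i, n), n)
    return acc

print(mul_mod_2n_minus_1(13, 11, 4))  # 143 mod 15 -> 8
```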

An implementation for $2^n \pm 1$ using Booth-8 encoding is presented in [19]. This approach compares favorably with other implementations for $n \ge 32$, and the result can probably be extrapolated to lower $n$ as well. If this is not the case, a Booth-4 encoding could be used. The Booth encoding technique is well known in contexts other than RNS and will therefore not be investigated further in this thesis.

Another interesting approach is the high-radix multiplier for modulo $2^n - 1$ in [20]. In [5], an isomorphic technique is used to replace multiplication with addition and look-up tables, which makes it a very interesting candidate for implementation.

The RNS multipliers that have been selected for implementation are presented in table 3.2.

3.2 Conversion

As with the RNS adders and multipliers, several forward and reverse conversion algorithms should be investigated.


Type  Description
0     Look-up table (LUT) based RNS multiplier
1     Modulo-m product-partitioning multiplier with ROM
2     Parallel modulo-m multiplier for $2^n - 1$
3     Parallel modulo-m multiplier for $2^n + 1$
4     Isomorphism technique as described in [21]
5     High radix multiplier for modulo $2^n - 1$ [20]
6     Using Verilog’s built-in operators “*” and “%”
7     Ordinary multiplication for modulo $2^n$

Table 3.2: Definition of different RNS multiplier types

3.2.1 Forward conversion

Forward conversion is generally far less complicated to implement than reverse conversion. Even though the residue number system needs to be able to represent a certain bit width, the input is mostly represented with a much smaller bit width, which of course reduces the complexity. The general way of solving the forward conversion problem uses the fact that a TCS number can be calculated in the following well-known manner: $-a_{n-1}2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$. The most straightforward solution is to calculate the sum of the $a_i 2^i$ terms using RNS adders instead of TCS adders. By slightly modifying the solution on page 64 in [4], it can support negative numbers as well.

A modification of this algorithm is to use the periodic properties of the modulus. The periodic properties can be derived by calculating the residue $|2^i|_m$ for each $i$.

A look-up table based solution is also possible. However, the table would have to contain an entry for every possible input combination of $n_{input}$ bits, and each entry would store an RNS value of $n_{rns} \ge n_{input}$ bits. For this reason, this solution is excluded from further investigation.

In [22], a modular exponentiation algorithm is proposed that seems promising. Unfortunately, it is very complex and therefore very difficult to implement in a parametrized way for an arbitrary modulus and input bit width.

Several other sequential algorithms have been proposed in [4]. These do not produce one result per clock cycle and are therefore not investigated further.


Type  Description
0     RNS adder tree
1     RNS adder tree with periodicity
2     Forward conversion for the special moduli-set
3     Using SystemVerilog’s built-in operators

Table 3.3: Definition of different RNS forward conversion types

3.2.2 Reverse conversion

Reverse conversion is the conversion process from RNS to TCS. The main methods for implementing the reverse conversion are the Chinese Remainder Theorem (CRT) and the Mixed-Radix Conversion (MRC) technique. All other techniques are variants of these two [4]. Of these, CRT is the most straightforward solution. MRC relies on mixed-radix techniques, which would require far more investigation. Other implementations involve pseudo-SRT division (a modification of a division algorithm so that it only produces the remainder) or the core function (as described in [4]). Another interesting implementation would be to use a LUT. Unfortunately, the resulting LUT would be larger than what a synthesis tool supports. The reverse converter to be implemented is presented in table 3.4.

Type  Description
1     Using CRT

Table 3.4: Definition of different RNS reverse conversion types

3.3 Choosing a moduli-set

Previous research [8], [5], [7] has shown that when the number of taps in an FIR filter is large, a significant amount of the power dissipation still takes place in the regular computations and not in the forward or reverse conversion. Therefore, the initial choice of moduli-sets was made by comparing the power dissipation of a simple one-tap FIR filter element without conversion. These simple components were designed in various ways, and then an optimal (or near-optimal) combination was calculated. There are basically two groups of moduli-sets, arbitrary and special sets, as described in section 2.1.3.

3.3.1 Modulus for comparison

Since the basic idea of RNS is to choose several small numbers to represent a big number, it is advantageous to choose these numbers quite small (but not necessarily as small as possible). A requirement is that the RNS FIR filter must be able to process inputs that are 20 bits wide. Due to the multiplication, the incoming word length has to be extended to 40 bits. Therefore, the moduli for the comparison described above are chosen as follows.

• All primes between 2 and 251

• All numbers of the form $2^n$ or $2^n \pm 1$ where $n \le 14$ (to get a dynamic range of at least $2^{40}$ with the moduli-set $\{2^n - 1, 2^n, 2^n + 1\}$)

• For each $n \le 14$, the closest prime that is smaller than $2^n$. If these turn out to be optimal, possibly more similar primes will be added.

Note that these sets intersect, and no modulus is tested twice. These rules result in the set of integers presented in Appendix A.
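The three selection rules can be reproduced with a short script. The sketch below is illustrative only. `is_prime` and `candidate_moduli` are hypothetical names. Starting the exponent at $n = 2$ is an assumption, because the rules above only give the upper bound $n \le 14$. The output is therefore not guaranteed to match Appendix A exactly.

```python
def is_prime(k):
    """Trial-division primality test (adequate for k < 2**15)."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def candidate_moduli(max_prime=251, max_n=14):
    cands = set()
    # Rule 1: all primes between 2 and max_prime
    cands.update(k for k in range(2, max_prime + 1) if is_prime(k))
    for n in range(2, max_n + 1):  # assumption: n starts at 2
        # Rule 2: 2^n and 2^n +/- 1
        cands.update({2**n - 1, 2**n, 2**n + 1})
        # Rule 3: the closest prime below 2^n
        p = 2**n - 1
        while not is_prime(p):
            p -= 1
        cands.add(p)
    # The rules overlap; the set ensures no modulus is tested twice
    return sorted(cands)
```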


Chapter 4

Implementation

The main implementation philosophy has been to use parametrized modules and functions. The implementation has been done at RTL level in SystemVerilog [23]. Therefore, it cannot be guaranteed (and it is most unlikely) that the synthesis tool maps the RTL code directly to the hardware structure described by the RTL code (mapping is described in section 2.3.1). It has, however, been verified that the functionality is consistent.

The parametrization of the RTL code makes the RNS implementation easily adaptable to new DSP algorithms and scalable in terms of number of taps and bit-widths. It also makes the implementation easy to modify with new algorithms for, for example, adders and multipliers.

4.1 RNS addition

The main issue with RNS addition is that the sum has to be within the range $[0, m_i - 1]$. The corresponding binary adder produces a sum in the range $[0, 2(m_i - 1)]$, and therefore at most one modulo reduction (one subtraction of $m_i$) is required.

4.1.1 LUT and binary adders

The most direct approach to implement RNS addition is to use a look-up table (LUT), two binary adders, or a combination of these.

LUT

The LUT RNS adder implementation is a straightforward ROM storing the modular sum for each pair of inputs.

Two binary adders

One binary adder performs the addition. The other adder subtracts $m$ for the modulo reduction and detects the modulo overflow. Together they form the quite neat RNS adder shown in figure 4.1.

Figure 4.1: RNS addition using two binary adders
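As a behavioral reference, the two-binary-adder scheme can be modeled in a few lines of Python. This sketch assumes that both operands are already reduced ($0 \le a, b < m$). The function name is hypothetical. The sign test on $s - m$ stands in for the sign bit that drives the multiplexer in figure 4.1.

```python
def rns_add_two_adders(a, b, m):
    """Behavioral model of RNS adder type 1 (two binary adders).

    Assumes 0 <= a, b < m, so the plain sum lies in [0, 2(m - 1)].
    """
    s = a + b      # first binary adder: ordinary sum
    d = s - m      # second binary adder: adds the constant -m
    # A negative d (sign bit set in hardware) means s is already reduced;
    # otherwise the multiplexer selects the corrected value d.
    return s if d < 0 else d
```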

Hybrid

The hybrid RNS adder consists of one adder connected to a LUT. The LUT stores the resulting residue for each possible sum from the adder.

Figure 4.2: Hybrid version of RNS addition
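The hybrid scheme can be sketched in the same way. This minimal model assumes reduced operands. The binary sum is used directly as the LUT address. The function names are hypothetical.

```python
def build_hybrid_lut(m):
    """LUT for the hybrid RNS adder: one entry per binary sum 0 .. 2(m - 1)."""
    return [s % m for s in range(2 * m - 1)]


def rns_add_hybrid(a, b, lut):
    # The binary adder output addresses the LUT directly
    return lut[a + b]
```

With this split, the table holds $2m - 1$ entries instead of the $m^2$ entries of a full two-operand LUT (adder type 0). The cost is one binary adder in front of the table.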

4.1.3 End-around carry parallel-prefix adder

The end-around carry parallel-prefix adder is designed to work only for modulo $2^n - 1$. Its advantage is that, by using the end-around carry, it uses approximately the same hardware as an ordinary parallel-prefix adder.

The parallel-prefix adder was implemented by translating the RNS adder in [16] from VHDL to SystemVerilog. It uses a Sklansky parallel-prefix structure with an end-around carry. The adder uses the different logic operators shown in figure 4.3. The exact behavior of the logic operators is described in equation (4.1).

Figure 4.3: Logic operators for the parallel-prefix adder

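$$\begin{aligned}
&\text{Buffer:} && G^{l}_{i:k} = G^{l-1}_{i:k}, && P^{l}_{i:k} = P^{l-1}_{i:k} \\
&\text{Prefix operator:} && G^{l}_{i:k} = G^{l-1}_{i:j+1} \lor \left(G^{l-1}_{j:k} \land P^{l-1}_{i:j+1}\right), && P^{l}_{i:k} = P^{l-1}_{i:j+1} \land P^{l-1}_{j:k} \\
&\text{Pre-processing:} && g_i = \begin{cases} (a_0 \land b_0) \lor (a_0 \land c_0) \lor (b_0 \land c_0) & \text{if } i = 0 \\ a_i \land b_i & \text{otherwise} \end{cases}, && p_i = a_i \oplus b_i \\
&\text{Post-processing:} && c_{i+1} = G^{m}_{i:0}, && s_i = p_i \oplus c_i
\end{aligned} \qquad (4.1)$$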

In equation (4.1), $i$ is the bit position with $i = 0, \ldots, n_{bits} - 1$. $l$ is the level in the prefix structure with $l = 1, \ldots, m$, where $m = \lceil \log_2(n_{bits}) \rceil$ is the total required depth of the prefix structure. Furthermore, $0 \le k \le j < i$ (for more details see [15]). An 8-bit example of the parallel-prefix adder can be seen in figure 4.4.

Figure 4.4: End-around carry parallel-prefix adder with Sklansky parallel-prefix structure
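To make equation (4.1) concrete, the following bit-level model builds a Sklansky prefix tree and applies an end-around carry. It is a sketch, not a translation of [16]. Equation (4.1) folds $c_0$ into $g_0$. This sketch instead computes the carry-out $G_{n-1:0}$ first and then applies it as carry-in through one extra operator level, $c_{i+1} = G_{i:0} \lor (P_{i:0} \land c_{out})$, which is one common way to realize the end-around carry. The function names are hypothetical. The result can be $2^n - 1$, which is the second representation of zero modulo $2^n - 1$.

```python
def sklansky_prefix(g, p):
    """Group (G, P) for every prefix [i:0] using a Sklansky structure.

    g, p are bit lists with index 0 = LSB; returns G[i] = G_{i:0}, P[i] = P_{i:0}.
    """
    n = len(g)
    G, P = list(g), list(p)
    span = 1
    while span < n:                          # one iteration per prefix level
        newG, newP = list(G), list(P)
        for i in range(n):
            if (i // span) % 2 == 1:         # bit i is in the upper half of its block
                j = (i // span) * span - 1   # top bit of the lower half
                # prefix operator: combine (G_{i:j+1}, P_{i:j+1}) with (G_{j:k}, P_{j:k})
                newG[i] = G[i] | (P[i] & G[j])
                newP[i] = P[i] & P[j]
        G, P = newG, newP
        span *= 2
    return G, P


def eac_add(a, b, n):
    """Modulo 2^n - 1 addition with end-around carry, 0 <= a, b <= 2^n - 2."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(abits, bbits)]   # pre-processing
    p = [x ^ y for x, y in zip(abits, bbits)]
    G, P = sklansky_prefix(g, p)
    cout = G[n - 1]                             # carry out of the MSB
    # End-around carry fed back as carry-in through one extra level
    c = [cout] + [G[i] | (P[i] & cout) for i in range(n - 1)]
    s = [p[i] ^ c[i] for i in range(n)]         # post-processing
    return sum(bit << i for i, bit in enumerate(s))
```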

4.1.4 Parallel-prefix adder using the diminished-one number representation for modulo $2^n + 1$

Since the residues modulo $2^n + 1$ can almost be represented with $n$ bits, a diminished-one number representation can be used. In a diminished-one representation, $n$ bits represent the number and bit $n$ (the $(n+1)$th bit) is used to identify a zero. Hence an ordinary number $X$ is represented as $X^*$ in the diminished-one representation, as presented in equation (4.2).

$$X = 0:\ X^*[n] = 1; \qquad X \ne 0:\ X^*[n] = 0,\ X^*[n-1:0] = X - 1 \qquad (4.2)$$

The advantage of this adder is that the parallel-prefix structure of the modulo $2^n - 1$ adder in section 4.1.3 can be reused, with the small change that the end-around carry is inverted. Some forward and reverse conversion is also needed, as described in figure 4.5. The blocks used in figure 4.5 are the same as in the adder for modulo $2^n - 1$ and are described in equation (4.1).

Figure 4.5: Adder based on the diminished-one number representation
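The diminished-one addition with an inverted end-around carry can be modeled at word level as below. This sketch assumes $n$-bit operands with a separate zero flag in bit $n$, following equation (4.2). The prefix network of figure 4.5 is abstracted into an ordinary integer addition. All function names are hypothetical.

```python
def to_dim1(x, n):
    """Diminished-one encoding: zero sets flag bit n; otherwise store x - 1."""
    return 1 << n if x == 0 else x - 1


def from_dim1(d, n):
    return 0 if (d >> n) & 1 else (d & ((1 << n) - 1)) + 1


def dim1_add(a, b, n):
    """Add two diminished-one operands modulo 2^n + 1."""
    mask = (1 << n) - 1
    if (a >> n) & 1:          # a represents zero
        return b
    if (b >> n) & 1:          # b represents zero
        return a
    s = (a & mask) + (b & mask)
    cout = s >> n
    # Inverted end-around carry: add 1 only when there was no carry-out.
    # If the result reaches 2^n, bit n is set and the low bits are 0,
    # which is exactly the zero encoding of equation (4.2).
    return (s & mask) + (1 - cout)
```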

4.1.5 Addition using Verilog’s built-in modulo operator

Addition using Verilog’s built-in modulo operator can be performed by using the %-sign and then letting the synthesis tool decide what to do with it. The implementation will look as in figure 4.6 and can be expressed as

assign output_sum = (input_a + input_b) % modulo_parameter;

Figure 4.6: Adder using Verilog’s built-in operator

4.1.6 Ordinary addition for modulo $2^n$

The easiest and most efficient implementation of an RNS addition is the one for modulo $2^n$. It only requires an ordinary binary adder where the resulting carry-out is neglected. The implementation will look as in figure 4.7, or:

assign output_sum = input_a + input_b;

Figure 4.7: Modulo $2^n$ addition using a binary adder

4.2 Multiplication

RNS multiplication has the same requirement as RNS addition in that the product has to be within the range $[0, m_i - 1]$. Unfortunately, the product of an ordinary binary multiplier is within the range $[0, (m_i - 1)^2]$. Up to $m_i - 2$ subtractions of $m_i$ would therefore be required, instead of one in the RNS adder. This increases the complexity dramatically with increasing modulus.

4.2.0 LUT based multiplication

The look-up table based RNS multiplication uses the two operands as addresses to a two-dimensional look-up table where the product is stored.


4.2.1 Modulo-m product-partitioning multiplier with ROM

A modulo-m product-partitioning multiplier with ROM is presented in [17] and [4] for arbitrary moduli. This multiplier is based on the fact that the product $P \triangleq AB$ can be expressed as in equation (4.3). The $2n$-bit product $AB$ is partitioned into four parts: $W$ ($k + 1$ bits), $Z$ ($n - (k + 1)$ bits), $Y$ (1 bit) and $X$ ($n - 1$ bits).

$$P = AB = 2^{2n-(k+1)}W + 2^{n}Z + 2^{n-1}Y + X \qquad (4.3)$$

$$\begin{aligned}
|AB|_m &= \left|2^{2n-(k+1)}W + 2^{n}Z + 2^{n-1}Y + X\right|_m \\
&= \left|\,\left|2^{2n-(k+1)}W + 2^{n-1}Y\right|_m + \left|2^{n}Z\right|_m + X\right|_m \\
&= \left|\,\left|2^{2n-(k+1)}W + 2^{n-1}Y\right|_m + cZ + X\right|_m \qquad \left(2^n = m + c \Rightarrow |2^n|_m = c\right)
\end{aligned} \qquad (4.4)$$

Here $m$ is the modulus, $n$ is the number of bits, $c = 2^n - m$ and $k = 1 + \lfloor \log_2 c \rfloor$. By ensuring that the operands are within the range of the modulus, equation (4.3) can be rewritten as equation (4.4).

Since $k$ is relatively small and $Y$ consists of only one bit, $e \triangleq \left|2^{2n-(k+1)}W + 2^{n-1}Y\right|_m$ can be pre-calculated for each value of $W$ and $Y$ and stored in a ROM. Due to the number of bits used to store $e$, $cZ$ and $X$, the result will be in the range $0 \le e + cZ + X < 2m$, so at most one subtraction of $m$ is needed. It is slightly better to instead store $e - m$ in the ROM [4], detect whether the result is negative and, in that case, add $m$. The resulting RNS multiplier can be seen in figure 4.8.
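The partitioning in equations (4.3) and (4.4) can be checked with a behavioral model. This sketch makes three assumptions: $m$ is not a power of two (so that $c \ge 1$), $n$ is the bit length of $m$, and both operands are reduced. The ROM is modeled as a Python dictionary storing $e - m$ and is rebuilt on every call for brevity. The function name is hypothetical.

```python
def pp_multiplier(a, b, m):
    """Model of the modulo-m product-partitioning multiplier with ROM.

    Assumptions: m > 2 is not a power of two and 0 <= a, b < m.
    """
    n = m.bit_length()          # 2^(n-1) < m < 2^n
    c = (1 << n) - m            # 2^n = m + c, hence |2^n|_m = c
    k = c.bit_length()          # k = 1 + floor(log2 c)
    # ROM addressed by (W, Y), storing e - m
    rom = {
        (w, y): ((w << (2 * n - k - 1)) + (y << (n - 1))) % m - m
        for w in range(1 << (k + 1))
        for y in (0, 1)
    }
    p = a * b
    x = p & ((1 << (n - 1)) - 1)                 # X: n - 1 bits
    y = (p >> (n - 1)) & 1                       # Y: 1 bit
    z = (p >> n) & ((1 << (n - k - 1)) - 1)      # Z: n - (k + 1) bits
    w = p >> (2 * n - k - 1)                     # W: k + 1 bits
    s = rom[(w, y)] + c * z + x                  # lies in [-m, m)
    return s + m if s < 0 else s                 # add m back if negative
```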

4.2.2 Parallel-prefix multiplier for modulo $2^n - 1$

By reusing the parallel-prefix adder from section 4.1.3 and connecting partial products to it, a parallel-prefix multiplier for modulo $2^n - 1$ can be implemented. The entire multiplication can be rewritten as equation (4.5). Since $2^n \equiv 1 \pmod{2^n - 1}$, each partial product $PP_i$ is $Y$ circularly rotated left by $i$ bits, and $PP_i$ is therefore always $n$ bits wide.

$$|X \cdot Y|_{2^n-1} = \left|\sum_{i=0}^{n-1} PP_i\right|_{2^n-1}, \quad \text{where } PP_i = x_i \land \left(y_{n-i-1} \ldots y_0\, y_{n-1} \ldots y_{n-i}\right) \qquad (4.5)$$

Figure 4.9 shows the schematic of the parallel-prefix multiplier for modulo $2^n - 1$. Note that more efficient adder tree structures can probably be used.
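A word-level sketch of equation (4.5) follows. It assumes reduced $n$-bit operands. The parallel-prefix adder of section 4.1.3 is replaced with an equivalent word-level end-around-carry addition. The function names are hypothetical.

```python
def rotl(v, i, n):
    """Circular left shift of an n-bit value v by i positions (0 <= i < n)."""
    mask = (1 << n) - 1
    return ((v << i) | (v >> (n - i))) & mask


def eac_add_word(a, b, n):
    """Word-level end-around-carry adder modulo 2^n - 1."""
    s = a + b
    return (s & ((1 << n) - 1)) + (s >> n)


def mult_mod_2n_minus_1(x, y, n):
    # PP_i = x_i AND (y rotated left by i), cf. equation (4.5)
    pps = [rotl(y, i, n) if (x >> i) & 1 else 0 for i in range(n)]
    acc = 0
    for pp in pps:  # chain of RNS adders as in figure 4.9
        acc = eac_add_word(acc, pp, n)
    return acc  # may equal 2^n - 1, the second representation of zero
```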

4.2.3 Parallel-prefix multiplier for modulo $2^n + 1$

The parallel-prefix multiplier for modulo $2^n + 1$ may be implemented using the diminished-one representation, which removes the extra bit required compared to modulo $2^n$. This implementation would require a diminished-one adder. Due to the poor results of this adder (see figure 5.7), this multiplier has not been implemented.

Figure 4.8: Modulo-m product-partitioning multiplier with ROM

Figure 4.9: Multiplier based on parallel-prefix RNS adders

4.2.4 Modular multiplication using the isomorphic technique

This technique has earlier been used by [5] and [8]. The basic principle of the isomorphic technique is described in [21] and can be summarized as in equation (4.6). When $m$ is a prime, there exists a $q$ (a primitive root modulo $m$) that fulfills the equation. This means that a multiplicand $n_i$ can instead be represented by its exponent $w_i$.

$$n_i = \left|q^{w_i}\right|_m, \qquad n_i \in [1, m-1],\ w_i \in [0, m-2] \qquad (4.6)$$

For the specific case of a two-input modular multiplier, $i \in \{1, 2\}$, we get equation (4.7).

$$|a_1 \cdot a_2|_m = \left|q^{w}\right|_m, \quad \text{where } w = |w_1 + w_2|_{m-1},\ a_1 = \left|q^{w_1}\right|_m,\ a_2 = \left|q^{w_2}\right|_m \qquad (4.7)$$

A direct implementation of equations (4.6) and (4.7) uses two different look-up tables, each storing $m - 1$ entries, and an RNS modulo-$(m - 1)$ adder for the exponents. Since zero cannot be represented as $n_i = |q^{w_i}|_m$, it has to be handled separately. This is done with simple zero-detecting logic. A schematic of the multiplier can be found in figure 4.10.

Figure 4.10: Multiplier based on the isomorphic technique
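The table-based isomorphic multiplier can be sketched as follows. It assumes that $m$ is an odd prime and finds $q$ by brute force. The dictionary and the list stand in for the two look-up tables of figure 4.10. The function names are hypothetical.

```python
def primitive_root(m):
    """Smallest q whose powers q^0 .. q^(m-2) cover 1 .. m-1 (m an odd prime)."""
    for q in range(2, m):
        if len({pow(q, w, m) for w in range(m - 1)}) == m - 1:
            return q
    raise ValueError("no primitive root found; is m an odd prime?")


def build_iso_tables(m):
    q = primitive_root(m)
    exp_lut = [pow(q, w, m) for w in range(m - 1)]    # w -> n = |q^w|_m
    log_lut = {n: w for w, n in enumerate(exp_lut)}   # n -> w
    return log_lut, exp_lut


def iso_multiply(a1, a2, m, log_lut, exp_lut):
    if a1 == 0 or a2 == 0:        # zero detection: 0 has no exponent
        return 0
    w = (log_lut[a1] + log_lut[a2]) % (m - 1)   # modulo-(m - 1) adder
    return exp_lut[w]
```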

4.2.5 High radix modulo $2^n - 1$ multiplier

The high radix modular RNS multiplier for modulo $2^n - 1$ is based on a multiplier suggested in [4]. In [20], another multiplier is suggested that only works for moduli where $(n - 1)/k = 4$ for an integer $k$. Here $n - 1$ is the number of bits required to represent a number in modulo $2^n - 1$.

This multiplier is based on the fact that a multiplication $A \cdot B$ can be rewritten as a sum of partial products. First, divide $A$ and $B$ into two $k$-bit numbers, where $k = \left\lfloor \frac{\lceil \log_2(2^n - 1) \rceil + 1}{2} \right\rfloor$, so that $A = A_1 2^k + A_0$ and $B = B_1 2^k + B_0$. The product $A \cdot B$ can then be rewritten as equation (4.8) by using cyclic convolution. In this step the $2^{2k}$ term wraps around, since $2^{2k} \equiv 1 \pmod{2^{2k} - 1}$.

$$A \cdot B = (A_1 2^k + A_0)(B_1 2^k + B_0) \equiv 2^k (A_1 B_0 + A_0 B_1) + (A_1 B_1 + A_0 B_0) = 2^k P_1 + P_0 \pmod{2^{2k} - 1} \qquad (4.8)$$

For $n = 2k$, this can be extended to $|A \cdot B|_{2^n - 1} = |2^k P_1 + P_0|_{2^n - 1}$ for modulo $2^n - 1$. $P_0$ and $P_1$ can also be expressed as

$$P_0 = \frac{a^2 - b^2 + c^2 - d^2}{8}, \qquad P_1 = \frac{a^2 - b^2 - c^2 + d^2}{8},$$

where

$$a = A_0 + A_1 + B_0 + B_1, \quad b = A_0 + A_1 - B_0 - B_1, \quad c = A_0 - A_1 + B_0 - B_1, \quad d = A_0 - A_1 - B_0 + B_1.$$

By combining these equations, a schematic can be derived as shown in figure 4.11.

Figure 4.11: Modular high-radix RNS multiplier
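The decomposition into squares can be verified numerically. The sketch below assumes even $n$, so that $n = 2k$ and $2^{2k} \equiv 1 \pmod{2^n - 1}$. It computes the squares arithmetically instead of with squaring LUTs. It uses exact integer division by 8 where figure 4.11 shows the >> 3 blocks. The function name is hypothetical.

```python
def high_radix_mult(A, B, n):
    """Modulo 2^n - 1 product via squares (assumption: n even, n = 2k)."""
    assert n % 2 == 0, "this sketch assumes n = 2k"
    k = n // 2
    m = (1 << n) - 1
    lo = (1 << k) - 1
    A0, A1 = A & lo, A >> k
    B0, B1 = B & lo, B >> k
    a = A0 + A1 + B0 + B1
    b = A0 + A1 - B0 - B1
    c = A0 - A1 + B0 - B1
    d = A0 - A1 - B0 + B1
    # Squares would come from squaring LUTs in hardware; division by 8 is exact
    P0 = (a * a - b * b + c * c - d * d) // 8   # = A1*B1 + A0*B0
    P1 = (a * a - b * b - c * c + d * d) // 8   # = A1*B0 + A0*B1
    # 2^(2k) = 2^n = 1 (mod 2^n - 1), so A*B = 2^k * P1 + P0 (mod 2^n - 1)
    return ((P1 << k) + P0) % m
```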


4.2.6 Using Verilog’s built-in operators

Multiplication using Verilog’s built-in modulo operator can be performed by using the %-sign and then letting the synthesis tool decide what to do with it. The implementation will look as in figure 4.12 and can be expressed as

assign output_product = (input_a * input_b) % modulo_parameter;

Figure 4.12: Multiplier using Verilog’s built-in operators

4.2.7 Ordinary multiplication for modulo $2^n$

The easiest and most efficient implementation of an RNS multiplication is the one for modulo $2^n$. It only requires an ordinary binary multiplier where the most-significant half of the product is neglected. The implementation will look as in figure 4.13 and can be expressed as

assign output_product = input_a * input_b;

Figure 4.13: Multiplier based on binary multiplier for modulo $2^n$

4.3 Forward conversion

Forward conversion is the translation process from TCS to RNS. Since the TCS bit-width is usually smallest at the input, the forward conversion is less complex than the reverse conversion. Due to this smaller bit-width, no pipelining¹ is required to fulfill the timing goal of 500 MHz. Therefore, only registers at the input and output of the forward conversion are considered in the design, as shown in figure 4.14.

¹Pipelining is a process where registers are inserted in the critical path to increase the maximum operating frequency.

Figure 4.14: Forward conversion with registers at input and output

Figure 4.15: Forward conversion using an RNS adder tree

4.3.1 RNS adder tree

The most straightforward solution for the forward conversion is to use the fact that an $n$-bit TCS number can be represented as $-a_{n-1}2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$. The RNS representation of a TCS number can be derived by first converting each operand in the summation to RNS (using a LUT) and then calculating each individual addition with RNS adders. This results in an RNS adder tree.

The RNS adder tree has parametrized input bit-width and modulus, and the entire tree scales with these parameters as seen in figure 4.15. The number of levels in the RNS adder tree is $n_{levels} = \lceil \log_2(n) \rceil$, where $n$ is the number of TCS input bits. Level $l$ has $w^{in}_l = \lceil n / 2^{l-1} \rceil$ input wires and $w^{out}_l = \lceil w^{in}_l / 2 \rceil$ output wires, which requires $n_{adders} = \lfloor w^{in}_l / 2 \rfloor$ adders. When $w^{in}_l$ is odd, one wire passes through to the next level unchanged.
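The adder-tree conversion can be modeled in software to check the level count $n_{levels} = \lceil \log_2(n) \rceil$. This sketch assumes a signed $n$-bit TCS input. The list of residues stands in for the LUT of $|2^i|_m$ values, with the MSB stored as $|-2^{n-1}|_m$. The function names are hypothetical. The function returns the residue together with the number of adder levels used.

```python
def rns_add(a, b, m):
    s = a + b
    return s - m if s >= m else s


def forward_convert_adder_tree(x, n_bits, m):
    """Convert an n_bits-wide TCS integer x to |x|_m; also return the level count."""
    # LUT of |2^i|_m; the MSB has weight -2^(n-1), stored as |-2^(n-1)|_m
    weights = [pow(2, i, m) for i in range(n_bits - 1)]
    weights.append((-(1 << (n_bits - 1))) % m)
    bits = [(x >> i) & 1 for i in range(n_bits)]     # two's complement bits
    wires = [w if bit else 0 for w, bit in zip(weights, bits)]
    levels = 0
    while len(wires) > 1:                            # one RNS adder level
        nxt = [rns_add(wires[j], wires[j + 1], m)
               for j in range(0, len(wires) - 1, 2)]
        if len(wires) % 2:                           # odd wire passes through
            nxt.append(wires[-1])
        wires = nxt
        levels += 1
    return wires[0], levels
```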



4.3.2 Periodicity

The periodicity of a modulus follows from the fact that the sequence $|2^i|_m$ repeats itself for every modulus $m$ as $i$ increases. Note that this repetition is not necessarily valid for residues where $i < \lceil \log_2(m) \rceil$. The periodicity of each relevant modulus can be found by a brute-force search and stored in a ROM. Examples are given in table 4.1.

Modulus m   Residue $|2^i|_m$ for i = 0, 1, 2, ...                        Periodicity p
3           1, 2, 1, 2, 1, 2, ...                                         2
4           1, 2, 0, 0, 0, ...                                            1
5           1, 2, 4, 3, 1, 2, 4, 3, 1, ...                                4
6           1, 2, 4, 2, 4, 2, 4, ...                                      2
7           1, 2, 4, 1, 2, 4, ...                                         3
11          1, 2, 4, 8, 5, 10, 9, 7, 3, 6, 1, 2, 4, 8, 5, 10, 9, 7, ...   10
17          1, 2, 4, 8, 16, 15, 13, 9, 1, 2, 4, 8, 16, 15, 13, 9, 1, ...  8
31          1, 2, 4, 8, 16, 1, 2, 4, 8, 16, 1, ...                        5
51          1, 2, 4, 8, 16, 32, 13, 26, 1, 2, ...                         8

Table 4.1: Periodicity of some residues
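The brute-force search for the periodicity can be written as below. This sketch returns the length of the eventual cycle of $|2^i|_m$, which reproduces the values in table 4.1. The function name is hypothetical.

```python
def periodicity(m):
    """Length of the repeating cycle of |2^i|_m, found by brute force."""
    first_seen = {}
    r, i = 1 % m, 0
    while r not in first_seen:
        first_seen[r] = i
        r = (r * 2) % m
        i += 1
    return i - first_seen[r]
```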

The periodic property of a modulus can be used to reduce the TCS bit-width used for forward conversion. A TCS number is first sign-extended to a multiple of $p$ bits that covers all $n_{TCSbits}$ input bits, i.e. $p \cdot \lceil n_{TCSbits} / p \rceil$ bits. It is then partitioned into chunks that are $p$ bits wide. For an odd modulus, $|2^p|_m = 1$, so every chunk has the same weight modulo $m$ and the chunks can be added with regular TCS adders. The sum of this addition is then used in the forward conversion, which reduces the number of bits the RNS forward conversion must handle. A conversion process identical to the one presented in section 4.3.1 can follow the periodicity simplification.

Example 2. Consider the forward conversion of the 13-bit TCS representation of the number −3821 for modulus 5. −3821 is expressed as 1000100010011 in TCS. The periodicity of modulus 5 is 4, which can be fetched from table 4.1. Sign-extend the TCS number to 16 bits (the smallest multiple of $p = 4$ that covers all 13 bits): 1111000100010011. Separate this number into 4-bit chunks and add them (remember the negative weight of the MSB):

1111 + 0001 + 0001 + 0011 = −1 + 1 + 1 + 3 = 4 = 000100

Then use an RNS adder tree to compute the RNS representation of the number:

$|-3821|_5 = \left|0 + 0 + 0 + |2^2|_5 + 0 + 0\right|_5 = 4$
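The chunk folding of Example 2 can be reproduced with the following sketch. It assumes an odd modulus $m$ with periodicity $p$, so that $|2^p|_m = 1$ and every chunk has weight 1 modulo $m$. The input is sign-extended to the smallest multiple of $p$ that covers it. The top chunk is read as a signed $p$-bit number to keep the negative weight of the MSB. The function name is hypothetical.

```python
def fold_by_periodicity(x, n_bits, p):
    """Reduce an n_bits-wide TCS number to a short sum of p-bit chunks.

    Assumes |2^p|_m = 1 for the target odd modulus m, so the returned sum
    is congruent to x modulo m.
    """
    ext = p * -(-n_bits // p)              # smallest multiple of p >= n_bits
    u = x & ((1 << ext) - 1)               # sign-extended bit pattern
    chunks = [(u >> (p * j)) & ((1 << p) - 1) for j in range(ext // p)]
    if chunks[-1] >> (p - 1):              # MSB keeps its negative weight
        chunks[-1] -= 1 << p
    return sum(chunks)
```

For the values of Example 2, fold_by_periodicity(-3821, 13, 4) returns 4. This is the sum that is then passed to the RNS adder tree.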


4.3.3 Forward conversion for modulo $2^n - 1$

By extending the forward conversion solution for the special moduli-set in [4], a more general forward conversion solution for a larger moduli-set containing modulo $2^n - 1$ can be achieved.

The first step is to sign-extend the TCS input to a number of bits, $n_{S.e.-bits}$, that is evenly divisible by $n$. This new number is then divided into $n_{S.e.-bits}/n$ chunks, which are summed using an RNS adder tree. An $n$-bit chunk can take the value $2^n - 1 = m_i$. A small modification is therefore necessary to allow the modulo $2^n - 1$ RNS adder to support inputs in the range $[0, m_i]$ instead of $[0, m_i - 1]$.

4.3.4 Using Verilog’s built-in modulo operator

For comparison, a TCS to RNS forward converter has also been implemented using the built-in Verilog modulo operator, %, as seen in the Verilog code below.

assign output_rns = input_tcs % modulo_parameter;

4.3.5 Forward conversion for modulo $2^n$

Forward conversion modulo $2^n$ is easily performed by selecting the $n$ least significant bits. An example of how this could be realized in Verilog is shown below.

assign output_rns = input_tcs[n_bits-1:0];

4.4 Reverse conversion

Reverse conversion is the translation process from RNS to TCS. Observe that, in contrast to forward conversion, the computations cannot be performed individually in each modulus; the entire moduli-set has to be taken into account. This complication makes reverse conversion a major, if not THE major, drawback of RNS. Because it lacks the parallelism found in the other parts of RNS, the reverse conversion process is also more complex. The two main reverse conversion algorithms are based on the Mixed-Radix Conversion (MRC) or the Chinese Remainder Theorem (CRT). Only the latter has been investigated in this thesis. The reverse conversion process can be seen in figure 4.16. Note that, in contrast to forward conversion, the reverse conversion has to be pipelined to fulfill the timing goals.

Figure 4.16: Reverse conversion

4.4.1 CRT

The Chinese Remainder Theorem (recall the quote in chapter 1) is a mathematical way of finding the TCS representation of an RNS number. Recall from chapter 2 that a moduli-set $\{m_1, m_2, \ldots, m_N\}$ of $N$ pairwise relatively prime moduli can represent a number $X$ within the range $k \le X < k + M$, where $M = \prod_{i=1}^{N} m_i$ and $k$ is an integer. A number in RNS can be represented as $\langle x_1, x_2, \ldots, x_N \rangle$, where each $x_i = |X|_{m_i}$.

Now define $M_i = M / m_i$, and define $M_i^{-1}$ as the multiplicative inverse, where $\left|\,|M_i^{-1}|_{m_i} M_i\right|_{m_i} = 1$. The CRT then states that a TCS number $X$ can be computed by equation (4.9).

$$X = \left|\sum_{i=1}^{N} x_i \left|M_i^{-1}\right|_{m_i} M_i\right|_M \qquad (4.9)$$

The $M_i$ can easily be computed as described above, whereas the multiplicative inverse $M_i^{-1}$ is far harder to calculate. There is in fact no general expression for the multiplicative inverse in this context [4]. For prime moduli, Fermat’s theorem may sometimes be useful for finding the multiplicative inverse. A far less complicated solution is to instead calculate $|M_i^{-1}|_{m_i}$ with a brute-force search over all numbers between 0 and $m_i - 1$. This can be computed on a PC, and at elaboration time the multiplicative inverses for the chosen moduli in the moduli-set can be stored in a memory element on the ASIC. Note that $M_i$ and $M_i^{-1}$ are unique for each moduli-set. Pseudo code for finding $M_i$ and $|M_i^{-1}|_{m_i}$ is presented below:

from math import prod

moduli_set = [7, 8, 9]  # example moduli-set (illustrative only)
M = prod(moduli_set)
for modulus in moduli_set:
    M_i = M // modulus
    for inv_iter in range(1, modulus):
        if (M_i * inv_iter) % modulus == 1:
            M_i_inverse = inv_iter
            break
    print(modulus, M_i, M_i_inverse)

The product of $M_i$ and $|M_i^{-1}|_{m_i}$ is stored in a look-up table in the ASIC. Besides the LUT, the reverse conversion is just a matter of multiplying each $x_i$ with the content of the LUT and then adding the products. Both the multiplication and the addition are performed modulo $M$, using RNS multipliers and RNS adders respectively. The resulting schematic is shown in figure 4.17.

Figure 4.17: Reverse conversion using CRT
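A behavioral model of the CRT converter in figure 4.17 is sketched below. It assumes pairwise relatively prime moduli and an unsigned result in $[0, M)$. Instead of the brute-force search, it uses Python’s built-in pow(M_i, -1, m_i), which requires Python 3.8 or later. Instead of the pipelined adder tree, it uses a linear chain of modulo-$M$ additions. The function names are hypothetical.

```python
from math import prod


def crt_tables(moduli):
    """Return M and the LUT contents M_i * |M_i^{-1}|_{m_i} for each modulus."""
    M = prod(moduli)
    lut = []
    for m_i in moduli:
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)     # modular inverse (Python >= 3.8)
        lut.append(M_i * inv)
    return M, lut


def reverse_convert_crt(residues, M, lut):
    acc = 0
    for x_i, w_i in zip(residues, lut):
        term = (x_i * w_i) % M      # modulo-M multiplier
        acc += term                 # modulo-M adder: one conditional subtraction
        if acc >= M:
            acc -= M
    return acc
```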

In figure 4.17, the RNS adder tree has been pipelined due to the large number of bits needed to represent the dynamic range $M$. This is enough to fulfill the timing requirements in the RNS adder tree when using the simplest and most straightforward adder type, with two binary adders (adder type 1, figure 4.1). A review of the implemented multipliers shows that none of them is purely combinatorial for an arbitrary modulus. The CRT reverse conversion requires multiplication of large bit widths. Multiplier type 1 (modulo-m product-partitioning multiplier with ROM) was therefore reimplemented with combinatorial logic instead of a LUT, and some registers were inserted to pipeline the multiplier, as can be seen in figure 4.18.


Figure 4.18: Modulo-m product-partitioning multiplier with combinatorial logic instead of LUT. Changes from the ordinary RNS multiplier are shown in white.

The required maximum operating frequency of 500 MHz could be achieved by three measures: adding registers in the RNS adder tree, designing the RNS multiplier purely combinatorial, and adding registers inside the multiplier.

4.5 Choosing a moduli set

The optimum moduli set in terms of power dissipation for representing $n$ bits can be found by solving equation (4.10). Here $p_i$ is the power dissipation, $m_i$ is the modulus, $N$ is the number of candidate moduli and $s_i$ is a binary decision variable. When optimizing the moduli-sets, only the power dissipation of the computational operations (one adder and one multiplier) has been taken into account. In a larger system, the power dissipation of the conversion is considered negligible.

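$$\min_{s_i \in \{0,1\}} \left(\sum_{i=0}^{N-1} s_i p_i\right) \quad \text{subject to} \quad \prod_{s_i \neq 0} s_i m_i \ge 2^n \quad \text{and} \quad \gcd(m_i, m_j) = 1 \;\; \forall\, i \neq j \text{ with } s_i = s_j = 1 \qquad (4.10)$$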

This can be solved with the following pseudo code.

from itertools import combinations
from math import gcd, prod

def choose_moduli_set(all_modulus, power_cost, dynamic_range, max_n_comb):
    # power_cost maps each modulus to its cost (one RNS adder plus one RNS multiplier)
    best_cost = float("inf")
    best_moduli_set = None
    for n_comb in range(1, max_n_comb + 1):
        for moduli_set in combinations(all_modulus, n_comb):
            current_cost = sum(power_cost[m] for m in moduli_set)
            if current_cost >= best_cost or prod(moduli_set) < dynamic_range:
                continue
            # gcd = greatest common divisor; all moduli must be pairwise relatively prime
            if all(gcd(a, b) == 1 for a, b in combinations(moduli_set, 2)):
                best_cost = current_cost
                best_moduli_set = moduli_set
    return best_moduli_set, best_cost

The number of combinations grows exponentially with the number of candidate moduli. The list of moduli passed to the program was therefore pruned: moduli with a very high $p_i / \log_2(m_i)$ were excluded, without any effect on the outcome.

4.6 FIR filter

Several different implementations can achieve functionality identical to the direct-form implementation of equation (2.2), which is shown in figure 4.19. A major improvement of this design can be achieved by moving the registers from before the multiplications to inside the summation chain, as shown in figure 4.20. This design is usually referred to as the transposed direct-form FIR filter. It has a larger area than the direct-form FIR filter, because its registers hold the partial sums and are therefore more than twice as wide. However, its critical path only goes through one multiplication and one addition, whereas the direct form's critical path goes through one multiplication and the entire summation chain. The simulations performed in this thesis use the transposed direct-form FIR filter of figure 4.20 unless otherwise stated. To prevent overflow, the word length of the accumulator registers is given by equation (4.11).

$$w_{acc} = w_{data} + w_{coef} + \lceil \log_2(n_{taps}) \rceil \qquad (4.11)$$

Figure 4.19: Direct-form FIR filter (coefficients $c_3, c_2, c_1, c_0$)

Figure 4.20: Transposed direct-form FIR filter (coefficients $c_0, c_1, c_2, c_3$)
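The transposed structure can be checked against the direct form with a cycle-by-cycle model. In this sketch, each loop iteration models one clock cycle and the list regs models the registers inside the summation chain. The function names are hypothetical. (n_taps - 1).bit_length() is used as an integer-exact form of $\lceil \log_2(n_{taps}) \rceil$ in equation (4.11).

```python
def acc_width(w_data, w_coef, n_taps):
    # Equation (4.11); (n_taps - 1).bit_length() equals ceil(log2(n_taps))
    return w_data + w_coef + (n_taps - 1).bit_length()


def fir_direct(x, c):
    """Reference: y[n] = sum_k c[k] * x[n - k]."""
    return [sum(c[k] * x[n - k] for k in range(len(c)) if n - k >= 0)
            for n in range(len(x))]


def fir_transposed(x, c):
    """Cycle-by-cycle model of the transposed direct-form FIR filter."""
    regs = [0] * (len(c) - 1)            # registers inside the summation chain
    y = []
    for sample in x:                     # one iteration = one clock cycle
        prods = [ck * sample for ck in c]    # every tap sees the same sample
        y.append(prods[0] + (regs[0] if regs else 0))
        # Each register takes its product plus the next register's old value,
        # so every register input passes one multiplier and one adder
        for k in range(len(regs)):
            nxt = regs[k + 1] if k + 1 < len(regs) else 0
            regs[k] = prods[k + 1] + nxt
    return y
```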

In a larger DSP system, the samples are very unlikely to arrive every clock cycle. The hardware can therefore be reused by using a folded FIR filter, as presented in figure 4.21. Several other techniques for designing the structure of FIR filters are available but are not further discussed in this thesis.

Figure 4.21: Folded FIR

There are several different ways of deriving the FIR coefficients to fulfill certain goals for the filter. The method used in this thesis is based on the program described in [24] from [25], which is implemented in the signal processing library of SciPy, an open-source library of scientific tools for Python. The choice of coefficients is not very important for the results of this thesis as long as they are realistic.

Chapter 5

Results

The results have been obtained under conditions that are very close to a real DSP system:

• A 500 MHz clock has been used

• New data has been assumed to arrive every clock cycle

• The libraries used are for a 32 nm technology

• The power dissipation was calculated as average power dissipation and not peak power dissipation

5.1 Input data and coefficients

The dynamic power dissipation depends strongly on the input data and coefficients provided to the system. The input data and coefficients also affect RNS and TCS systems in different ways. For example, signed and unsigned values result in similar behavior in RNS. For TCS, in contrast, signed values will most likely give a considerably higher dynamic power dissipation than unsigned values.

5.1.1 Uniformly distributed data and coefficients

In some cases, uniformly distributed random data has been used. The data and coefficients have been generated by randomly generating each bit. The distributions for different numbers of bits can be seen in figure 5.1. In this case, both the data and the coefficients are updated every clock cycle. The coefficients are also updated so that the choice of coefficients does not affect the result. Updating the coefficients does affect the result, but it is assumed to affect TCS and RNS in the same way.
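Bit-wise random generation of this kind can be sketched as below. The sketch assumes $n$-bit two's complement samples. Since every bit pattern is equally likely, the values are uniformly distributed over $[-2^{n-1}, 2^{n-1} - 1]$, as in figure 5.1. The function name and the use of Python’s random module are illustrative assumptions, not the generator used for the thesis.

```python
import random


def uniform_tcs_samples(n_bits, count, seed=0):
    """Draw n_bits-wide two's complement values by generating each bit at random."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        raw = rng.getrandbits(n_bits)     # each bit independently 0 or 1
        if raw >> (n_bits - 1):           # MSB set: interpret as negative
            raw -= 1 << n_bits
        samples.append(raw)
    return samples
```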

Figure 5.1: Discrete uniform distributions for different number of bits (panels: 64 bits, 32 bits, 8 bits)

5.1.2 Sawtooth data and coefficients ramp

Another interesting type of input data to investigate is sawtooth data, whose purpose is to generate the highest possible switching activity. It can be combined with, for example, a ramp as coefficients. The input data and coefficients used in this case are presented in figure 5.2 on the following page. They are generated using equation (5.1), where i is the current clock cycle.

$$\mathrm{data} = i \cdot (-1)^{i}, \qquad \mathrm{coef} = i \tag{5.1}$$
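The sketch below evaluates equation (5.1) in Python and counts bit toggles between consecutive samples in an assumed 20-bit two's complement representation. It illustrates why this stimulus gives a high switching activity: since -(i+1) equals the bitwise complement of i in two's complement, every step from an even i to the next sample flips all bits. The helper names are hypothetical, and the sketch does not model how large values of i are kept within the input word length.

```python
def sawtooth_and_ramp(n_cycles: int):
    """Equation (5.1): data = i * (-1)**i and coef = i for clock cycle i."""
    data = [i * (-1) ** i for i in range(n_cycles)]   # 0, -1, 2, -3, 4, ...
    coef = list(range(n_cycles))                        # 0, 1, 2, 3, 4, ...
    return data, coef


def bit_toggles(a: int, b: int, width: int) -> int:
    """Number of differing bits between a and b as width-bit two's complement words."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


if __name__ == "__main__":
    data, coef = sawtooth_and_ramp(8)
    for prev, cur in zip(data, data[1:]):
        print(f"{prev:3d} -> {cur:3d}: {bit_toggles(prev, cur, 20)} of 20 bits toggle")
```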

5.1.3 Realistic input data and FIR coefficients

The most realistic case is normally distributed data with constant FIR coefficients. The FIR coefficients have been generated using the Remez exchange algorithm as presented in [26]. The FIR filter has a passband between 0 and 2π · 0.297 rad/sample and a stopband between 2π · 0.328 and π rad/sample. The resulting frequency response for some different numbers of taps can be seen in figure 5.4 on page 39. The input data is normally distributed and consists of a signal that has already been processed by a low-pass filter. These are typical properties of an input signal to an FIR filter in a DSP application. The histogram of the input data is plotted in figure 5.3 on the next page.
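A possible way to reproduce such coefficients with `scipy.signal.remez` is sketched below. With the sampling frequency normalized to `fs=1.0`, the band edge 2π · 0.297 rad/sample corresponds to 0.297 and π rad/sample to 0.5. The quantization to signed 20-bit integers, scaled so that the largest coefficient uses the full range, is an assumption; the scaling used in the thesis is not stated here, and `fir_coefficients` is a hypothetical name.

```python
import numpy as np
from scipy.signal import remez


def fir_coefficients(numtaps: int, coef_bits: int = 20) -> np.ndarray:
    """Low-pass Remez design quantized to signed coef_bits-bit integers."""
    # Band edges in cycles/sample with fs = 1.0 (Nyquist = 0.5):
    # passband 0 .. 0.297, stopband 0.328 .. 0.5
    bands = [0.0, 0.297, 0.328, 0.5]
    desired = [1.0, 0.0]                  # gain 1 in the passband, 0 in the stopband
    h = remez(numtaps, bands, desired, fs=1.0)
    # Assumed scaling: the largest magnitude maps to the largest positive integer.
    scale = (2 ** (coef_bits - 1) - 1) / np.max(np.abs(h))
    return np.round(h * scale).astype(np.int64)


if __name__ == "__main__":
    coeffs = fir_coefficients(47)
    print(coeffs[:5], "...")
```

Since this is a linear-phase design, the resulting coefficients are symmetric around the center tap.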

5.1.4 Different properties of the data and coefficients

Because of their different properties, the data and coefficients behave differently in RNS and TCS.

Sign switching rate The sign switching rate is the rate at which the data switches from positive to negative or vice versa. The switching rates for the input data used and for an ordinary normal distribution containing white noise are presented in table 5.1 on page 40.

Figure 5.2: Sawtooth data and ramp coefficients (panels: Data and Coefficients)

Figure 5.3: Histogram for realistic input data for a 20-bit FIR filter (x-axis: value; y-axis: probability of occurrence)

Figure 5.4: Frequency response for some different FIR filter coefficients (3, 47, 95 and 151 taps; amplitude in dB and angle in radians versus frequency in rad/sample)

Theoretical multiplier toggle rate The theoretical multiplier toggle rate is the rate at which the product of a multiplication in TCS and RNS toggles for different input data and coefficients. The results are shown in table 5.2. A 60-bit RNS multiplier is also included in table 5.2 because, for example, an FIR filter with more than one tap requires a word length larger than twice the input word length, as seen in equation (4.11) on page 35.

| Input data | Sign switching rate |
|---|---|
| Uniform distribution | 0.5 |
| Normal distribution | 0.5 |
| Sawtooth data | 1.0 |
| Realistic data | 0.33 |

Table 5.1: Sign switching rate of input data
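The sign switching rate in table 5.1 can be computed as sketched below, counting zero as non-negative in line with the two's complement sign bit. This is an illustrative definition under that assumption, not the script used in the thesis; `sign_switching_rate` is a hypothetical name.

```python
def sign_switching_rate(samples) -> float:
    """Fraction of consecutive sample pairs whose signs differ (0 counts as non-negative)."""
    negative = [x < 0 for x in samples]
    switches = sum(a != b for a, b in zip(negative, negative[1:]))
    return switches / (len(samples) - 1)


if __name__ == "__main__":
    sawtooth = [i * (-1) ** i for i in range(40)]   # equation (5.1)
    print(sign_switching_rate(sawtooth))
```

For the sawtooth data of equation (5.1) every step changes sign, so the rate is 1.0, matching table 5.1, while independent samples that are symmetric around zero give a rate close to 0.5.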

| Input data | 40-bit TCS | 40-bit RNS | 60-bit RNS |
|---|---|---|---|
| Uniform distribution | 0.486 | 0.425 | 0.456 |
| Normal distribution | 0.484 | 0.427 | 0.458 |
| Sawtooth data | 0.810 | 0.414 | 0.451 |
| Realistic data | 0.414 | 0.429 | 0.458 |

Table 5.2: Theoretical toggle rate at the output of a multiplication with 20-bit inputs. The optimum moduli-sets presented in table 5.5 on page 55 are used.
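One way to estimate such a theoretical toggle rate is to count how many output bits flip between consecutive products: the 40-bit two's complement product in TCS, or the concatenated residues in RNS, where each residue modulo m is assumed to occupy ⌈log2 m⌉ bits. The sketch below follows that interpretation with the 40-bit moduli-set from table 5.5. The exact procedure behind table 5.2 is not described here, so the function names and the bit packing are assumptions.

```python
import random

MODULI_40 = (3, 5, 7, 11, 13, 17, 19, 29, 31, 256)   # 40-bit set from table 5.5


def tcs_word(x: int, width: int) -> int:
    """Two's complement bit pattern of x in width bits."""
    return x & ((1 << width) - 1)


def rns_word(x: int, moduli) -> tuple:
    """Concatenate the residues of x; residue mod m uses (m - 1).bit_length() bits."""
    word, width = 0, 0
    for m in moduli:
        word |= (x % m) << width      # Python's % maps negative x to 0..m-1
        width += (m - 1).bit_length()
    return word, width


def toggle_rate(words, width: int) -> float:
    """Average fraction of the width bits that flip between consecutive words."""
    flips = sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))
    return flips / (width * (len(words) - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    products = [rng.randint(-2**19, 2**19 - 1) * rng.randint(-2**19, 2**19 - 1)
                for _ in range(10000)]
    width_rns = rns_word(0, MODULI_40)[1]
    print("TCS:", toggle_rate([tcs_word(p, 40) for p in products], 40))
    print("RNS:", toggle_rate([rns_word(p, MODULI_40)[0] for p in products], width_rns))
```

A sign change of a small value, for example from 0 to -1, flips all 40 TCS bits but only 26 of the 44 RNS bits in this packing, which illustrates why sign changes such as those in the sawtooth data raise the TCS toggle rate much more than the RNS toggle rate.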

5.2 Adders and multipliers

The results for the different moduli of each adder and multiplier are presented in this section. The test setup used to generate the results is shown in figure 5.6 on the next page, where Op. A and Op. B are provided with two different streams of uniformly distributed random data. In the resulting graphs the total power, toggle rate, UVT ratio and gate count are shown on the two y-axes, and the modulo is plotted on the x-axis using a base-two logarithmic scale. Figure 5.5 on the facing page points out the quantities shown in the graphs for the RNS adders and multipliers, and they are described below.

Total power The total power is the sum of the static and dynamic power dissipation.

Toggle rate The toggle rate is plotted relative to the full height of the y-axis: the maximum “total power” value corresponds to a toggle rate of one and a “total power” of zero corresponds to a toggle rate of zero. The toggle rate itself is the average rate at which the nets in the design toggle, so together with the gate count it is approximately proportional to the dynamic power dissipation.

UVT ratio The UVT ratio is represented on the y-axis in the same way as the toggle rate. The UVT ratio is the ratio of low-leakage cells used in the design; a UVT ratio close to 100 % is desirable.

Gate count The gate count is a technology-independent measure of the total area and, together with the UVT ratio, correlates with the static power dissipation.

Figure 5.5: Description of the RNS multiplier and adder graphs (example shown: RNS adder type 0; x-axis: modulo m; one y-axis shows total power [µW], toggle rate and UVT ratio, the other gate count)
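The link between these quantities can be summarized with the standard first-order CMOS power model, which is not derived in this thesis and is included only as a reminder: the dynamic term scales with the switching activity α (the toggle rate), the switched capacitance C (which grows with the gate count), the supply voltage V_DD and the clock frequency f, while the static term is dominated by leakage, which a high UVT ratio reduces.

$$P_{\mathrm{tot}} = P_{\mathrm{stat}} + P_{\mathrm{dyn}}, \qquad P_{\mathrm{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f$$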

Figure 5.6: Test setup for RNS adders and multipliers ((a) RNS adders: inputs Op. A and Op. B, output Sum; (b) RNS multipliers: inputs Op. A and Op. B, output Product)

5.2.1 Adders

The best RNS adder for each modulo is compared with the RNS adders for modulo 2^n in figure 5.8 on page 43, and table 5.3 on page 44 lists the best RNS adders with their corresponding adder types. Inspecting all the results presented in appendix C leads, after some simplifications, to the conclusion that modulo 2^n − 1 should use adder type 3, modulo 2^n should use adder type 6, and all other moduli m > 67 should use adder type 1. The complete list can be found in table 5.3. Because of the approximation made for m > 67, 25 % of the moduli above 67 use a non-optimal adder type, at an average power dissipation increase of approximately 1 % per modulus. This approximation was necessary since storing the optimum adder type for every modulo in a large array is not convenient when implementing RNS.

Figure 5.7: Total power dissipation for all RNS adders (types 0 to 6) using uniformly distributed input data as described in section 5.1.1 on page 36.

Figure 5.8: The best RNS adder for each modulo compared with RNS adders for modulo 2^n (solid lines: total power dissipation [µW]; dashed lines: total area [µm²]; m ≠ 2^n and m = 2^n shown separately). Power dissipation was calculated using uniformly distributed input data as described in section 5.1.1 on page 36.

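| Modulo | Adder type | Modulo | Adder type |
|---|---|---|---|
| 2 | 6 | 37 | 1 |
| 3 | 0 | 41 | 2 |
| 4 | 6 | 43 | 2 |
| 5 | 2 | 47 | 2 |
| 7 | 3 | 53 | 1 |
| 8 | 6 | 59 | 5 |
| 9 | 2 | 61 | 2 |
| 11 | 2 | 63 | 3 |
| 13 | 2 | 64 | 6 |
| 15 | 3 | 65 | 2 |
| 16 | 6 | 67 | 1 |
| 17 | 2 | 71 | 1 |
| 19 | 2 | 73 | 5 |
| 23 | 5 | 79 | 1 |
| 29 | 2 | 2^n − 1 | 3 |
| 31 | 3 | 2^n | 6 |
| 32 | 6 | 2^n + 1 | 1 |
| 33 | 2 | other m > 67 | 1 |

Table 5.3: The best adder type for chosen modulo with respect to power; refer to table 3.1 on page 12 for details about the adder types.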
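The compact selection rule above can be expressed as a small lookup: the explicit entries of table 5.3 first, then the 2^n − 1 and 2^n rows, and adder type 1 for 2^n + 1 and any other m > 67. The sketch below is a hypothetical Python helper illustrating this rule; in the thesis the selection is part of the generic RTL, which is not shown here.

```python
# Explicit entries from table 5.3 (modulo -> best adder type with respect to power)
TABLE_5_3 = {
    2: 6, 3: 0, 4: 6, 5: 2, 7: 3, 8: 6, 9: 2, 11: 2, 13: 2, 15: 3, 16: 6,
    17: 2, 19: 2, 23: 5, 29: 2, 31: 3, 32: 6, 33: 2, 37: 1, 41: 2, 43: 2,
    47: 2, 53: 1, 59: 5, 61: 2, 63: 3, 64: 6, 65: 2, 67: 1, 71: 1, 73: 5,
    79: 1,
}


def is_power_of_two(x: int) -> bool:
    return x > 0 and x & (x - 1) == 0


def best_adder_type(m: int) -> int:
    """Adder type for modulo m following table 5.3."""
    if m in TABLE_5_3:
        return TABLE_5_3[m]
    if m <= 67:
        raise ValueError(f"modulo {m} is not covered by table 5.3")
    if is_power_of_two(m + 1):      # m = 2^n - 1
        return 3
    if is_power_of_two(m):          # m = 2^n
        return 6
    return 1                        # m = 2^n + 1 and all other m > 67


if __name__ == "__main__":
    print({m: best_adder_type(m) for m in (3, 7, 73, 127, 128, 129, 101)})
```

Only the 32 explicit entries need to be stored; every larger modulo is resolved by two power-of-two tests.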

5.2.2 Multipliers

Since the RTL code was written in a generic fashion, moduli m_i > 2^n + 1 have been excluded, because the look-up-table-based multiplier in table 3.2 on page 13 was implemented using an m_i × m_i look-up table. The synthesis tool elaborates the entire code, and for these moduli the look-up table becomes too large in the elaboration phase, so the design is not synthesizable.
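To see how quickly such a table grows, the Python sketch below builds the full m × m table of (a · b) mod m and counts its storage, assuming each entry is stored in ⌈log2 m⌉ bits. The names are hypothetical, and the count is a rough estimate of the table contents, not of the synthesized area.

```python
def mod_mul_lut(m: int):
    """Full m x m look-up table: lut[a][b] = (a * b) mod m."""
    return [[(a * b) % m for b in range(m)] for a in range(m)]


def lut_storage_bits(m: int) -> int:
    """m * m entries, each assumed to use (m - 1).bit_length() bits."""
    return m * m * (m - 1).bit_length()


if __name__ == "__main__":
    lut = mod_mul_lut(7)
    print(lut[3][5])                      # 15 mod 7
    for m in (7, 31, 257, 4096):
        print(m, lut_storage_bits(m), "bits")
```

The storage grows roughly as m² log2 m, so a table that is trivial for m = 7 (147 bits) already needs about 2 · 10^8 bits for m = 4096.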

Figure 5.9: RNS adders type 0 and 1 ((a) RNS adder type 0; (b) RNS adder type 1; total power [µW], toggle rate, UVT ratio and gate count versus modulo m)

Figure panels: RNS adder type 2 and RNS adder type 3 (total power [µW], toggle rate, UVT ratio and gate count versus modulo m)



Figure panels: RNS adder type 4 and RNS adder type 5 (total power [µW], toggle rate, UVT ratio and gate count versus modulo m)



Figure 5.12: RNS adders type 6 (total power [µW], toggle rate, UVT ratio and gate count versus modulo m)

Figure 5.13: All RNS multipliers (total power dissipation [µW] versus modulo for multiplier types 0 to 7)

Figure 5.14: RNS multipliers type 0 and 1 ((a) RNS multiplier type 0; (b) RNS multiplier type 1; total power [µW], toggle rate, UVT ratio and gate count versus modulo m)

Figure panels: RNS multipliers (total power [µW], toggle rate, UVT ratio and gate count versus modulo m)


Figure panels: RNS multipliers (total power [µW], toggle rate, UVT ratio and gate count versus modulo m)


Figure 5.17: RNS multipliers type 7 and TCS multiplier ((a) RNS multiplier type 7 versus modulo m; (b) TCS multiplier versus dynamic range M; total power [µW], toggle rate, UVT ratio and gate count)


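| Modulo | Multiplier type | Modulo | Multiplier type |
|---|---|---|---|
| 2 | 7 | 47 | 4 |
| 3 | 0 | 53 | 4 |
| 4 | 7 | 59 | 4 |
| 5 | 0 | 61 | 4 |
| 7 | 0 | 63 | 2 |
| 8 | 7 | 64 | 7 |
| 9 | 0 | 65 | 1 |
| 11 | 4 | 67 | 4 |
| 13 | 4 | 71 | 4 |
| 15 | 2 | 73 | 4 |
| 16 | 7 | 79 | 4 |
| 17 | 1 | 83 | 4 |
| 19 | 4 | 89 | 1 |
| 23 | 4 | 97 | 1 |
| 29 | 4 | 127 | 2 |
| 31 | 2 | 255 | 2 |
| 32 | 7 | 2^n − 1 | 1 |
| 33 | 1 | 2^n | 7 |
| 37 | 4 | other m > 97 | 1 |
| 41 | 4 | | |
| 43 | 4 | | |

Table 5.4: The best multiplier type for chosen modulo with respect to power; refer to table 3.2 on page 13 for details about the multiplier types.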

5.3 Moduli-set

The optimum moduli-sets were derived using the technique described in section 4.5 on page 33. They were optimized for the lowest total power for one tap, that is, one RNS adder and one RNS multiplier. The complete list of optimum moduli-sets is presented in Appendix B, and some examples are given in table 5.5 on the facing page. Combining the RNS multipliers into the optimum moduli-sets with different numbers of moduli gives the results for RNS multipliers shown in figure 5.18 on the next page.


| Req. no. bits | Resulting no. bits, log2(∏ m_i) | Optimum moduli-set |
|---|---|---|
| 6 | 6.0 | {64} |
| 7 | 7.39 | {3, 7, 8} |
| 10 | 10.13 | {5, 7, 32} |
| 20 | 20.13 | {3, 5, 7, 11, 31, 32} |
| 30 | 30.08 | {3, 5, 7, 11, 13, 19, 31, 128} |
| 40 | 40.02 | {3, 5, 7, 11, 13, 17, 19, 29, 31, 256} |
| 50 | 50.03 | {5, 7, 9, 11, 13, 19, 23, 29, 31, 127, 512} |
| 60 | 60.01 | {11, 13, 17, 19, 23, 29, 31, 37, 63, 127, 4096} |

Table 5.5: Some of the optimum moduli-sets and their resulting number of bits. For the complete list refer to Appendix B.
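The second column of table 5.5 is the dynamic range in bits, the base-two logarithm of the product of the moduli, and a valid moduli-set must also be pairwise coprime for the Chinese remainder theorem to give a unique representation. The Python sketch below checks both properties for some of the sets in table 5.5; the function names are hypothetical, and the check is only a sanity test, not the optimization procedure of section 4.5.

```python
from math import gcd, log2, prod


def dynamic_range_bits(moduli) -> float:
    """Dynamic range in bits: log2 of the product M of all moduli."""
    return log2(prod(moduli))


def is_pairwise_coprime(moduli) -> bool:
    """True if gcd(m_i, m_j) == 1 for every pair i != j."""
    ms = list(moduli)
    return all(gcd(a, b) == 1 for i, a in enumerate(ms) for b in ms[i + 1:])


TABLE_5_5 = {
    7: (3, 7, 8),
    10: (5, 7, 32),
    20: (3, 5, 7, 11, 31, 32),
    40: (3, 5, 7, 11, 13, 17, 19, 29, 31, 256),
    60: (11, 13, 17, 19, 23, 29, 31, 37, 63, 127, 4096),
}

if __name__ == "__main__":
    for req, moduli in TABLE_5_5.items():
        bits = dynamic_range_bits(moduli)
        print(req, round(bits, 2), bits >= req, is_pairwise_coprime(moduli))
```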

Figure 5.18: Combinations of RNS multipliers with a maximum of 3, 5, 7, 9 and 11 RNS multipliers in the moduli-set compared with a TCS multiplier (total power dissipation [mW] versus dynamic range [bits])

Figure 5.19: Combinations of RNS adders with a maximum of 3, 5, 7, 9 and 11 RNS adders in the moduli-set compared with an RNS adder for modulo 2^n, which is almost identical to a TCS adder (total power dissipation [mW] versus dynamic range [bits])

5.4 FIR filters

RNS and TCS FIR filters have been synthesized and simulated to obtain results on RNS performance in a real DSP application.
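As a functional reference for what an RNS FIR filter computes, the Python sketch below performs forward conversion, an independent multiply-accumulate in each residue channel, and reverse conversion with the Chinese remainder theorem, and compares the result with an ordinary integer FIR filter. It is a behavioral model under stated assumptions (a direct-form filter, signed values mapped to a range around zero, the 20-bit moduli-set from table 5.5); it says nothing about the hardware structure, conversion circuits or power of the synthesized designs, and all names are hypothetical.

```python
import random
from math import prod

MODULI = (3, 5, 7, 11, 31, 32)          # 20-bit optimum moduli-set from table 5.5
M = prod(MODULI)


def to_rns(x: int):
    """Forward conversion: residues of x (negative x maps to M - |x|)."""
    return [x % m for m in MODULI]


def from_rns(residues) -> int:
    """Reverse conversion with the CRT, returning a signed value."""
    x = 0
    for r, m in zip(residues, MODULI):
        mi = M // m
        x += r * mi * pow(mi, -1, m)    # pow(..., -1, m) is the modular inverse
    x %= M
    return x - M if x >= (M + 1) // 2 else x


def fir_rns(x, h):
    """Direct-form FIR: each channel accumulates modulo its own modulus."""
    h_rns = [to_rns(c) for c in h]
    y = []
    for n in range(len(x)):
        acc = [0] * len(MODULI)
        for k in range(min(len(h), n + 1)):
            xr = to_rns(x[n - k])
            acc = [(a + xi * hi) % m
                   for a, xi, hi, m in zip(acc, xr, h_rns[k], MODULI)]
        y.append(from_rns(acc))
    return y


def fir_tcs(x, h):
    """Reference integer FIR filter."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(-128, 127) for _ in range(50)]   # 8-bit signed input
    h = [rng.randint(-128, 127) for _ in range(4)]    # 4 taps
    print(fir_rns(x, h) == fir_tcs(x, h))
```

With 8-bit inputs and 4 taps the outputs need at most 2 · 8 + ⌈log2 4⌉ = 18 bits, which fits in the roughly 20.13-bit dynamic range of this moduli-set, so the two filters agree exactly.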

5.4.1 Varying input word length

In figure 5.20 on the facing page and figure 5.21 on page 58, RNS FIR filters with varying input word length have been tested, using uniformly distributed data in the first figure and sawtooth data in the second.

Figure 5.20: 64-tap FIR filter with varying input bit width for RNS and TCS (title: RNS vs. TCS 64-tap FIR filter random data and coefficients; total power [mW], toggle rate, UVT ratio and gate count versus input bit width). Uniform data as described in section 5.1.1 on page 36. The red line represents the power reduction.

Figure 5.21: 16-tap FIR filter with varying input word length for RNS and TCS (title: RNS vs. TCS 16-tap FIR filter sawtooth input data; total power [mW], toggle rate, UVT ratio and gate count versus input bit width). Sawtooth data with ramp coefficients as described in section 5.1.2 on page 37. The red line represents the power reduction.

5.4.2 Varying number of taps

A realistic FIR filter for a telecommunication system can have an input word length of 20 bits. By varying the number of taps, the results should be comparable with, for example, [5], in which the authors performed similar tests. Figure 5.22 on the facing page presents the results for an FIR filter with 20-bit uniformly distributed input data, and in figure 5.23 on page 60 exactly the same RNS FIR filter is instead provided with realistic input data and coefficients.

Figure 5.22: 20-bit FIR filter with varying number of taps for RNS and TCS (title: RNS vs. TCS 20 input bits FIR filter random input data and coefficients; total power [mW] and percentage, toggle rate, UVT ratio and gate count). Uniformly distributed data and coefficients are used as described in section 5.1.1 on page 36. The red line represents the power reduction.

Figure 5.23: 20-bit FIR filter with varying number of taps for RNS and TCS (title: RNS vs. TCS 20 input bits FIR filter real input data and coefficients; total power [mW] and percentage, toggle rate, UVT ratio and gate count). Realistic data with constant FIR coefficients is used as described in section 5.1.3 on page 37. The red line represents the power reduction.

5.4.3 Folded FIR filter

In a more realistic FIR filter the input data will not arrive every clock cycle, and therefore a folded FIR filter could be used instead. An FIR filter folded N times, where N is the number of taps, has been designed; its schematic was discussed earlier in figure 4.21 on page 35. The FIR filter was designed with 20-bit input and 22 taps and was provided with uniformly distributed input data and coefficients as described in section 5.1.1 on page 36, giving an RNS dynamic range of 20 · 2 + ⌈log2 22⌉ = 45 bits.

| | TCS | RNS | Difference |
|---|---|---|---|
| Total power | 2.69 mW | 3.34 mW | + 24 % |
| Total area | 5745 µm² | 10323 µm² | + 80 % |
| No. registers | 986 | 2319 | + 135 % |

Table 5.6: Results for an FIR filter folded 22 times with 20-bit input and 22 taps


5.5 Maximum frequency

As an example, the maximum frequency of the RNS computational elements has been determined for a 4-tap FIR filter with 20-bit and 30-bit input, for both RNS and TCS. For these designs the synthesis tool was set to target a maximum frequency of 1.5 GHz. The results are presented in table 5.7. They show that the RNS implementation is slightly worse than TCS in terms of maximum frequency. This result is reasonable, since the individual RNS adders and multipliers in this thesis have been optimized for power and not for maximum frequency.

| | 20-bit TCS | 20-bit RNS | 30-bit TCS | 30-bit RNS |
|---|---|---|---|---|
| Max frequency | 1066 MHz | 1052 MHz | 981 MHz | 895 MHz |
| Total area | 9262 µm² | 6414 µm² | 18693 µm² | 11645 µm² |
| UVT ratio | 10.43 % | 14.74 % | 8.46 % | 18.19 % |

Table 5.7: Synthesis results for a 4-tap FIR filter with 20-bit or 30-bit input. The synthesis maximum frequency goal was set to 1.5 GHz.


Chapter 6

Discussion and conclusions

In this chapter the results are discussed, analyzed and compared with some previous research. In general the results are in line with previous research. The main conclusion of this thesis follows from figure 5.22 on page 59 and figure 5.23 on page 60: RNS achieves a decrease in power dissipation when provided with random data, but with more realistic data it does not reduce the power dissipation.

6.1 Adders

Analyzing the different adders in figure 5.9, figure 5.10, figure 5.11 and figure 5.12 on page 48 shows that some moduli are clearly more efficient than others. Naturally the modulo 2^n adders are the best, since they have the same profile as ordinary TCS adders, as can be seen in figure 5.8 on page 43. This is visible in almost all graphs as “dips” in both total power and gate count. These dips are most likely caused by the synthesis tool detecting that it has been given what is almost an ordinary TCS adder and optimizing the hardware accordingly.

Another interesting observation from the RNS adder results is the different behavior of a look-up-table-based implementation and a combinatorial implementation, which is clearly visible when comparing figure 5.9a on page 45 and figure 5.9b on page 45. In figure 5.9a the power and gate count clearly increase with increasing modulo, whereas in figure 5.9b they instead seem to be related to the number of bits used to represent the modulo.

Looking at the results for all the different RNS adders, it is possible to generalize that the power and gate count of the RNS adders implemented in this thesis grow approximately linearly. Consequently, a combination of only RNS adders in a moduli-set will not be better than a TCS adder of the same bit width, as can clearly be seen in figure 5.19 on page 56. Fortunately RNS is almost as good as TCS, so a generalized view is that a combination of RNS adders performs almost as well as one TCS adder representing the same dynamic range.

6.2 Multipliers

The results for the multipliers generally have the same properties as the results for the RNS adders. The main difference is that power and gate count seem to grow more than linearly, which is logical since multiplication is more complex than addition. Because of this more than linear growth, a combination of RNS multipliers will at some point outperform TCS multipliers, as can be seen in figure 5.18 on page 55. The conclusion is that RNS multipliers outperform TCS multipliers as long as enough moduli are used in the moduli-set and the dynamic range (or output bit width) is greater than approximately 22 bits.

Another interesting observation is that almost all the multipliers for modulo m ≠ 2^n or 2^n − 1 use some kind of LUT approach. These LUTs could probably share hardware resources more than they currently do, which could be a future improvement of both the multipliers and the adders.

6.3 FIR filters

Regarding the FIR filters, [5] performed tests in an environment very similar to the one in figure 5.22 on page 59. The results presented in [5] use the moduli-set {7, 11, 13, 17, 64}, which corresponds to a dynamic range of 20 bits, while the results in figure 5.22 use an input bit width of 20 bits, which results in a dynamic range of at least 40 bits. In [5] the authors achieve a power reduction between 30 and 35 %, while the power reduction in figure 5.22 is approximately 15 %. This difference can be explained by the fact that the cell libraries used for synthesis in industry are far richer in, for example, different multiplier types, which benefits TCS FIR filters.

Even more interesting results are obtained when the same FIR filter is provided with more realistic input data and coefficients, as can be seen in figure 5.23 on page 60. In this case the RNS FIR filter actually increases the power dissipation. This can be explained by inspecting table 5.1 on page 40 and table 5.2 on page 40. The key result in these tables is that RNS seems to treat almost any type of input data as random data because of the forward conversion, so almost any input data toggles like random data in RNS. Hence, to obtain a power reduction with RNS, the original input data has to toggle a lot.

From the results in figure 5.23 on page 60 one might reason that a completely folded FIR filter would decrease the power even more, since a 12-tap FIR filter using realistic data actually decreases the power dissipation by 16 %. Unfortunately, a folded FIR filter has some sort of shift registers at the input, which use a significantly larger area in RNS than in TCS, because the entire RNS word length has to be stored instead of only the input word length as in TCS. This difference affects the result far more in the folded FIR filter case due to the heavily increased number of registers at the inputs.


Chapter 7

Future Work

RNS can be investigated much further, since a lot of research has been done in the field and many other implementations have been proposed. Further investigation of RNS would probably decrease the power dissipation further. Some examples that I have discovered during the thesis are presented below:

• In [27] a possibly better RNS multiplier is presented, which exploits the fact that not all input bit combinations are used for moduli ≠ 2^n.

• Some research has been made on multi-level RNS, which means that the largest modulus is itself represented by an additional level of RNS. This might be interesting since the RNS implementations investigated in this thesis usually use a slightly larger bit width than the ones implemented in academic papers. [28] has used a multi-level RNS.

• Due to the inherently small word lengths in RNS, many interesting arithmetic algorithms can be investigated, for example one-hot encoding and shift-and-add multiplication [29].

• The parallel-prefix multiplier implemented in this thesis can probably be re-implemented using a more sophisticated adder tree structure.

• It would be interesting to optimize the moduli-sets for realistic data as well, instead of only for the uniform distribution used in this thesis.

If RNS is to be implemented in a real system, further research would also be required on forward and reverse conversion, which have to be properly adapted to the surrounding system. Further research would probably also be necessary on scaling, overflow and sign detection, since the RNS computations would have to be large to compensate for the conversion overhead and would therefore probably require these operations.


References

[1] Lay Yong Lam and Tian Se Ang. Fleeting footsteps tracing the conception of arithmetic and algebra in ancient China. River Edge, N.J.: World Scientific, 2004 (cit. on p. 1).

[2] Michael A. Soderstrand, W. Kenneth Jenkins, Graham A. Jullien, and Fred J. Taylor, eds. Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. Piscataway, NJ, USA: IEEE Press, 1986 (cit. on p. 3).

[3] N.S. Szabo and R.I. Tanaka. Residue arithmetic and its applications to computer technology. McGraw-Hill series in information processing and computers. McGraw-Hill, 1967 (cit. on p. 3).

[4] Amos R. Omondi and Benjamin Premkumar. Residue number systems: theory and implementation. Advances in computer science and engineering: Texts: v. 2. London: Imperial College Press; Singapore; Hackensack, NJ: Distributed by World Scientific Publishing, 2007 (cit. on pp. 3, 5, 8, 11–14, 22, 25, 30, 31).

[5] G.C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re. “Low Power and Low Leakage Implementation of RNS FIR Filters.” In: Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems & Computers, 2005 (2005), p. 1620 (cit. on pp. 3, 12, 14, 24, 58, 63).

[6] G.C. Cardarilli, M. Re, A. Del Re, and A. Nannarelli. “Impact of RNS coding overhead on FIR filters performance.” In: Conference Record - Asilomar Conference on Signals, Systems and Computers. Conference Record of the 41st Asilomar Conference on Signals, Systems and Computers, ACSSC. Department of Electronics, University of Rome Tor Vergata, 2007, pp. 1426–1429 (cit. on p. 3).

[7] W.L. Freking, K.K. Parhi, M.P. Fargues, and R.D. Hippenstiel. “Low-power FIR digital filters using residue arithmetic.” In: vol. 1. 1998 (cit. on pp. 3, 14).

[8] G.C. Cardarilli, A. Nannarelli, and M. Re. “Reducing power dissipation in FIR filters using the residue number system.” In: Midwest Symposium on Circuits and Systems. Vol. 1. Department of Electrical Engineering, Univ. of Rome Tor Vergata, 2000, pp. 320–323 (cit. on pp. 7, 14, 24).

[9] D. Zivaljevic, N. Stamenkovic, and V. Stojanovic. “Digital filter implementation based on the RNS with diminished-1 encoded channel.” In: University of Nis, Faculty of Electronic Engineering, A. Medvedeva 14, Nis, 18000, Serbia, 2012 (cit. on p. 8).

[10] Sune Soderkvist. Fran insignal till utsignal. 2007 (cit. on p. 8).

[11] Himanshu Bhatnagar. Advanced ASIC chip synthesis using Synopsys Design Compiler, Physical Compiler, and PrimeTime / Himanshu Bhatnagar. Boston: Kluwer Academic Publishers, 2002 (cit. on p. 9).

[12] PrimeTime Datasheet. Accessed: 2014-05-08. url: http://www.synopsys.com/Tools/Implementation/SignOff/Documents/primetime_ds.pdf (cit. on p. 10).

[13] Gordon Yip. Expanding the Synopsys PrimeTime® Solution with Power Analysis. 2006. url: https://www.synopsys.com/Tools/Implementation/SignOff/CapsuleModule/ptpx_wp.pdf (cit. on p. 10).

[14] M. Bayoumi, G. Jullien, and W. Miller. “A VLSI implementation of residue adders.” In: IEEE Transactions on Circuits & Systems 34.3 (1987), p. 284 (cit. on p. 11).

[15] R. Zimmermann, I. Koren, and P. Kornerup. “Efficient VLSI implementation of modulo (2^n ± 1) addition and multiplication.” In: Integrated Syst. Lab., Swiss Federal Inst. of Technol., Zurich, Switzerland, 1999 (cit. on pp. 11, 18).

[16] R. Zimmermann. “VHDL Library of Arithmetic Units”. In: Proc. First Int. Forum on Design Languages (FDL’98) (1998) (cit. on pp. 11, 18).

[17] A.A. Hiasat. “New efficient structure for a modular multiplier for RNS.” In: IEEE Transactions on Computers 49.2 (2000), pp. 170–174 (cit. on pp. 12, 22).

[18] G. Alia and E. Martinelli. “A VLSI modulo m multiplier.” In: IEEE Transactions on Computers 40.7 (1991), p. 873 (cit. on p. 12).

[19] Ramya Muralidharan and Chip-Hong Chang. “Area-Power Efficient Modulo 2^n − 1 and Modulo 2^n + 1 Multipliers for {2^n − 1, 2^n, 2^n + 1} Based RNS.” In: IEEE Transactions on Circuits & Systems. Part I: Regular Papers 59.10 (2012), p. 2263 (cit. on p. 12).

[20] Alexander Skavantzos and Poornachandra B. Rao. “New multipliers modulo 2^N − 1.” In: IEEE Transactions on Computers 41.8 (1992), pp. 957–961 (cit. on pp. 12, 13, 25).

[21] Ivan Matveyevich Vinogradov. Elements of number theory. New York: Dover, 1954 (cit. on pp. 13, 24).


[22] A.B. Premkumar. “A formal framework for conversion from binaryto residue numbers”. In: Circuits and Systems II: Analog and DigitalSignal Processing, IEEE Transactions on 49.2 (2002), pp. 135–144(cit. on p. 13).

[23] “IEEE Standard for SystemVerilog–Unified Hardware Design, Specifi-cation, and Verification Language”. In: IEEE STD 1800-2009 (2009),pp. 1–1285 (cit. on p. 16).

[24] J.H. McClellan, T.W. Parks, and L. Rabiner. “A computer programfor designing optimum FIR linear phase digital filters”. In: Audio andElectroacoustics, IEEE Transactions on 21.6 (1973), pp. 506–526 (cit.on p. 35).

[25] J. McClellan and T. Parks. “A unified approach to the design of opti-mum FIR linear-phase digital filters”. In: Circuit Theory, IEEE Trans-actions on 20.6 (1973), pp. 697–701 (cit. on p. 35).

[26] SciPy v0.13 Reference Guide for the Remez exchange algorithm. Ac-cessed: 2014-05-07. url: http : / / docs . scipy . org / doc / scipy -

0.13.0/reference/generated/scipy.signal.remez.html (cit. onp. 37).

[27] V. Paliouras, K. Karagianni, and T. Stouraitis. “A low-complexitycombinatorial RNS multiplier.” In: IEEE Transactions on Circuits andSystems II: Analog and Digital Signal Processing 48.7 (2001), pp. 675–683 (cit. on p. 65).

[28] Jassbi S. Jafarali, Navi K., and Khademzadeh A. “An optimum moduli set in residue number system.” In: International Mathematical Forum 59 (2010), p. 2911 (cit. on p. 65).

[29] K. Johansson, O. Gustafsson, and L. Wanhammar. “Bit-Level Optimization of Shift-and-Add Based FIR Filters”. In: Electronics, Circuits and Systems, 2007. ICECS 2007. 14th IEEE International Conference on. 2007, pp. 713–716 (cit. on p. 65).


Appendix A

Modulus

2     101   233     511
3     103   239     512
5     107   241     513
7     109   251     1023
11    113   509     1024
13    127   1021    1025
17    131   2039    2047
19    137   4093    2048
23    139   8179    2049
29    149   16381   4095
31    151   4       4096
37    157   8       4097
41    163   9       8191
43    167   15      8192
47    173   16      8193
53    179   32      16383
59    181   33      16384
61    191   63      16385
67    193   64
71    197   65
73    199   128
79    211   129
83    223   255
89    227   256
97    229   257
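The listed values appear to follow a simple pattern: every prime below 256, the numbers 2^k − 1, 2^k and 2^k + 1 for k = 2 to 14, and, for k = 9 to 14, the largest prime below 2^k − 1 (509, 1021, 2039, 4093, 8179 and 16381). The Python sketch below regenerates the list under that assumption. The rule is inferred from the table itself and is not stated in the text, so the sketch is best read as a consistency check on the list rather than as the procedure that was actually used; the function names are illustrative.

```python
# Sketch: regenerate the candidate moduli listed in Appendix A.
# ASSUMPTION: the selection rule below is inferred from the listed values;
# it is not stated in the thesis.


def is_prime(n):
    """Trial-division primality test (fast enough for n below ~20000)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def largest_prime_below(limit):
    """Largest prime strictly smaller than `limit`."""
    n = limit - 1
    while not is_prime(n):
        n -= 1
    return n


def candidate_moduli(k_max=14):
    moduli = {p for p in range(2, 256) if is_prime(p)}   # all primes below 256
    for k in range(2, k_max + 1):
        moduli.update({2**k - 1, 2**k, 2**k + 1})       # 2^k - 1, 2^k, 2^k + 1
    for k in range(9, k_max + 1):
        moduli.add(largest_prime_below(2**k - 1))       # e.g. 509 for k = 9
    return sorted(moduli)


if __name__ == "__main__":
    moduli = candidate_moduli()
    print(len(moduli), moduli[:12], moduli[-6:])
```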


Appendix B

Optimum moduli-sets

Table B.1: Resulting moduli-sets

Req. no. bits   Resulting no. bits   Resulting moduli-set
2               2.0                  {4}
3               3.0                  {8}
4               4.0                  {16}
5               5.0                  {32}
6               6.0                  {64}
7               7.39231742278        {3, 7, 8}
8               8.39231742278        {3, 7, 16}
9               9.12928301694        {5, 7, 16}
10              10.1292830169        {5, 7, 32}
11              11.1292830169        {5, 7, 64}
12              12.0927571409        {3, 7, 13, 16}
13              13.0927571409        {3, 7, 13, 32}
14              14.0927571409        {3, 7, 13, 64}
15              15.1736771363        {3, 5, 7, 11, 32}
16              16.1736771363        {3, 5, 7, 11, 64}
17              17.1736771363        {3, 5, 7, 11, 128}
18              18.0469534513        {3, 7, 13, 31, 32}
19              19.0469534513        {3, 7, 13, 31, 64}
20              20.1278734467        {3, 5, 7, 11, 31, 32}
21              21.1278734467        {3, 5, 7, 11, 31, 64}
22              22.1278734467        {3, 5, 7, 11, 31, 128}
23              23.1278734467        {3, 5, 7, 11, 31, 256}
24              24.1278734467        {3, 5, 7, 11, 31, 512}
25              25.1220443679        {3, 5, 7, 11, 13, 19, 128}
26              26.1220443679        {3, 5, 7, 11, 13, 19, 256}
27              27.0762406783        {3, 5, 7, 11, 13, 16, 19, 31}
28              28.0762406783        {3, 5, 7, 11, 13, 19, 31, 32}
29              29.0762406783        {3, 5, 7, 11, 13, 19, 31, 64}
30              30.0762406783        {3, 5, 7, 11, 13, 19, 31, 128}
31              31.0762406783        {3, 5, 7, 11, 13, 19, 31, 256}
32              32.0762406783        {3, 5, 7, 11, 13, 19, 31, 512}
33              33.0762406783        {3, 5, 7, 11, 13, 19, 31, 1024}
34              34.209856116         {3, 5, 7, 11, 13, 23, 29, 31, 64}

35              35.209856116         {3, 5, 7, 11, 13, 23, 29, 31, 128}
36              36.209856116         {3, 5, 7, 11, 13, 23, 29, 31, 256}
37              37.209856116         {3, 5, 7, 11, 13, 23, 29, 31, 512}
38              38.209856116         {3, 5, 7, 11, 13, 23, 29, 31, 1024}
39              39.0216845147        {3, 5, 7, 11, 13, 17, 19, 29, 31, 128}
40              40.0216845147        {3, 5, 7, 11, 13, 17, 19, 29, 31, 256}
41              41.0427461302        {5, 7, 9, 11, 13, 19, 23, 29, 31, 128}
42              42.0427461302        {5, 7, 9, 11, 13, 19, 23, 29, 31, 256}
43              43.0427461302        {5, 7, 9, 11, 13, 19, 23, 29, 31, 512}
44              44.0427461302        {5, 7, 9, 11, 13, 19, 23, 29, 31, 1024}
45              45.0427461302        {5, 7, 9, 11, 13, 19, 23, 29, 31, 2048}
46              46.1302089714        {5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 256}
47              47.1302089714        {5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 512}
48              48.1302089714        {5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 1024}
49              49.1302089714        {5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 2048}
50              50.031430817         {5, 7, 9, 11, 13, 19, 23, 29, 31, 127, 512}
51              51.031430817         {5, 7, 9, 11, 13, 19, 23, 29, 31, 127, 1024}
52              52.031430817         {5, 7, 9, 11, 13, 19, 23, 29, 31, 127, 2048}
53              53.031430817         {5, 7, 9, 11, 13, 19, 23, 29, 31, 127, 4096}
54              54.0247889997        {7, 11, 13, 15, 19, 23, 29, 31, 37, 41, 2048}
55              55.0247889997        {7, 11, 13, 15, 19, 23, 29, 31, 37, 41, 4096}
56              56.0010571679        {7, 11, 13, 15, 19, 23, 29, 31, 47, 127, 2048}
57              57.0010571679        {7, 11, 13, 15, 19, 23, 29, 31, 47, 127, 4096}
58              58.0083788194        {7, 11, 13, 15, 19, 29, 31, 41, 53, 127, 4096}
59              59.0725091568        {7, 13, 15, 19, 23, 29, 31, 41, 53, 127, 4096}
60              60.0064189289        {11, 13, 17, 19, 23, 29, 31, 37, 63, 127, 4096}
61              61.2765080923        {11, 13, 19, 23, 29, 31, 37, 41, 63, 127, 4096}
62              62.0184105261        {11, 13, 19, 23, 29, 31, 43, 59, 63, 127, 4096}
63              63.0023331289        {11, 19, 23, 29, 31, 37, 41, 43, 63, 127, 4096}
64              64.0229340157        {11, 13, 23, 29, 31, 37, 41, 63, 127, 255, 2048}
65              65.0282412996        {13, 19, 23, 29, 31, 41, 43, 63, 127, 255, 2048}
66              66.0207068321        {15, 19, 23, 29, 31, 37, 41, 63, 127, 511, 2048}
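The resulting number of bits in Table B.1 matches log2 M, where M is the product of the moduli in the set, i.e. the dynamic range of the RNS (for {3, 7, 8}, M = 168 and log2 168 ≈ 7.392). The sketch below recomputes that column for a few rows and also checks that each set is pairwise coprime, which the Chinese remainder theorem requires for an unambiguous decoding. It assumes Python 3.8 or later for math.prod; the helper names are illustrative.

```python
import math
from itertools import combinations


def resulting_bits(moduli):
    """Dynamic range in bits: log2 of the product M of all moduli."""
    return math.log2(math.prod(moduli))


def pairwise_coprime(moduli):
    """True if every pair of moduli has gcd 1, as needed for a unique CRT decoding."""
    return all(math.gcd(a, b) == 1 for a, b in combinations(moduli, 2))


# A few rows copied from Table B.1: (required number of bits, moduli-set)
rows = [
    (7, [3, 7, 8]),
    (12, [3, 7, 13, 16]),
    (27, [3, 5, 7, 11, 13, 16, 19, 31]),
    (64, [11, 13, 23, 29, 31, 37, 41, 63, 127, 255, 2048]),
    (66, [15, 19, 23, 29, 31, 37, 41, 63, 127, 511, 2048]),
]

if __name__ == "__main__":
    for required, moduli in rows:
        bits = resulting_bits(moduli)
        print(f"{required:3d}  {bits:.10f}  covers={bits >= required}  "
              f"coprime={pairwise_coprime(moduli)}")
```

Applying the coprimality check to every row of the table flags only the last three sets (64 to 66 bits): 63 shares the factor 3 with 255 in the 64- and 65-bit sets, and with 15 (as well as the factor 7 with 511) in the 66-bit set. Those rows may therefore need a closer look before they are used with a CRT-based reverse converter.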


Appendix C

RNS adders results

The results have been generated using uniformly distributed random input operands and a setup as described in figure 5.6a on page 41. Below is a description of the titles in the header row of table C.1.

Modulo The modulus of the adder.

Type Results for the specific adder type, see table 3.1 on page 12 for details.

Area The total area of the adder including the registers, in µm².

Gates The total gate count including registers.

Switch power The switching power in µW. See section 2.3.3 on page 10 for details about the different power values.

Int. power The internal power in µW.

Leak power The leakage power in µW.

Total power The total power in µW.

Toggle rate The average toggle rate per net per clock cycle.

UVT cells The percentage of UVT cells. See section 2.3.3 on page 10 for details.
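The toggle rate column follows the definition above: the number of 0-to-1 and 1-to-0 transitions, averaged over nets and clock cycles. The table reports it for the nets of the synthesized design; the sketch below applies the same definition only to the bits of a single operand register fed with uniformly distributed random residues, which is a deliberate simplification. The modulus 13 and the run length are arbitrary example values, and the function name is illustrative.

```python
import random


def average_toggle_rate(samples, width):
    """Average transitions per net per clock cycle.

    `samples` holds the register value for each clock cycle; each of its
    `width` bits is treated as one net.
    """
    if len(samples) < 2:
        raise ValueError("at least two clock cycles are needed")
    mask = (1 << width) - 1
    toggles = sum(bin((prev ^ cur) & mask).count("1")   # bits that changed
                  for prev, cur in zip(samples, samples[1:]))
    cycles = len(samples) - 1
    return toggles / (width * cycles)


if __name__ == "__main__":
    random.seed(1)
    m = 13                                  # example modulus
    width = (m - 1).bit_length()            # bits needed for residues 0..m-1
    operands = [random.randrange(m) for _ in range(100_000)]  # uniform residues
    print(f"modulus {m}: {average_toggle_rate(operands, width):.3f} toggles/net/cycle")
```

For a modulus that is not a power of two, some register bits are set with a probability p other than 1/2, so with independent uniform samples their toggle probability 2p(1 − p) stays below 0.5.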


Table C.1: Results for RNS adders

Modulo  Type  Area  Gates  Switch power  Int. power  Leak power  Total power  Toggle rate  UVT cells

2

014

30

0.2

38.0

70.3

85

8.6

90.2

185.7

1%

21

14

30

0.2

38.0

70.3

85

8.6

90.2

185.7

1%

22

14

30

0.2

38.0

70.3

85

8.6

90.2

185.7

1%

25

14

30

0.2

38.0

70.3

85

8.6

90.2

185.7

1%

26

14

30

0.2

38.0

70.3

85

8.6

90.2

185.7

1%

BE

ST

30

26

56

1.0

715.5

0.6

14

17.2

0.2

93.3

3%

BE

ST

31

29

62

1.7

416

0.6

77

18.4

0.2

95.2

4%

32

28

59

1.3

115.7

0.6

517.7

0.2

94.4

4%

33

29

61

1.5

16

0.6

81

18.1

0.2

95%

35

29

61

1.4

915.9

0.6

81

18.1

0.2

95.2

4%

40

26

55

0.5

26

16.1

0.6

94

17.3

0.2

291.6

7%

41

26

55

0.5

26

16.1

0.6

94

17.3

0.2

291.6

7%

42

26

55

0.5

26

16.1

0.6

94

17.3

0.2

291.6

7%

45

26

55

0.5

26

16.1

0.6

94

17.3

0.2

291.6

7%

46

26

55

0.5

26

16.1

0.6

94

17.3

0.2

291.6

7%

BE

ST

50

48

103

3.7

624.4

1.0

829.2

0.1

897.5

%5

144

93

2.2

324.5

1.0

727.8

0.1

996%

52

41

88

2.2

823.9

127.2

0.2

95.8

3%

BE

ST

55

46

99

3.2

524.7

1.1

329

0.1

996.8

8%

70

60

128

6.8

827.4

1.3

835.7

0.2

98.2

5%

71

43

91

2.6

25.4

1.0

729.1

0.2

295.8

3%

72

43

92

2.9

525.3

1.0

929.4

0.2

296.3

%7

341

87

2.5

424.9

128.5

0.2

295.4

5%

BE

ST

75

45

96

2.9

925.8

1.1

229.9

0.2

196.4

3%

80

38

80

1.5

724.5

0.9

61

27

0.2

393.7

5%

81

38

80

1.5

724.5

0.9

62

27

0.2

393.7

5%

82

38

80

1.5

724.5

0.9

62

27

0.2

393.7

5%

85

38

80

1.5

724.5

0.9

62

27

0.2

393.7

5%

86

38

80

1.5

724.5

0.9

62

27

0.2

393.7

5%

BE

ST

90

98

209

8.5

633.7

2.0

444.3

0.1

199.0

7%

91

59

125

3.7

333.8

1.5

339.1

0.2

96.6

7%

92

56

118

3.1

532.4

1.3

936.9

0.2

96.8

8%

BE

ST

94

63

134

4.9

833.5

1.4

940

0.2

97.7

3%

95

60

127

3.8

332.7

1.4

537.9

0.1

897.4

4%

11

0125

265

11.2

36

2.6

749.9

0.1

99.3

5%

11

159

125

3.8

734.9

1.5

740.4

0.2

196.7

7%

11

259

126

4.5

333.5

1.4

539.5

0.2

197.3

7%

BE

ST

11

563

133

4.8

533.9

1.5

340.3

0.2

97.7

8%

13

0150

320

13

37

3.2

53.2

0.0

85

99.4

8%

13

159

125

4.0

935.5

1.5

941.2

0.2

296.7

7%

13

259

125

4.7

834.2

1.4

440.4

0.2

297.3

%B

EST

13

561

129

4.4

434.7

1.5

140.6

0.2

197.4

4%

15

0128

272

19.7

44.1

2.8

866.8

0.1

999.3

7%

15

158

124

3.8

135.6

1.6

141

0.2

296.4

3%

15

258

122

4.2

434.5

1.4

440.2

0.2

396.9

7%

15

356

118

4.0

434.3

1.4

139.7

0.2

396.8

8%

BE

ST

15

558

123

4.2

434.3

1.4

139.9

0.2

296.9

7%

16

065

139

5.8

435

1.6

42.4

0.2

198.1

5%

16

150

106

1.9

832.8

1.3

36.1

0.2

395.2

4%

16

250

106

1.9

832.8

1.3

36.1

0.2

395.2

4%

16

550

106

1.9

832.8

1.3

36.1

0.2

395.2

4%

16

650

106

1.9

832.8

1.3

36.1

0.2

395.2

4%

BE

ST

17

0193

410

19.3

48.9

4.4

572.6

0.1

199.6

2%

17

173

155

4.9

342.4

1.8

849.2

0.2

97.3

7%

17

269

147

4.3

440.4

1.7

346.5

0.1

997.5

%B

EST

17

576

162

4.7

441.4

1.8

948.1

0.1

698.0

4%

19

0222

473

21.2

50.9

5.2

577.4

0.1

199.6

8%

19

173

155

4.8

943.3

1.9

150.1

0.2

197.2

2%

19

273

156

5.4

342

1.8

249.2

0.2

197.8

7%

BE

ST

19

576

163

5.3

642.5

1.8

949.7

0.1

897.9

6%

23

0267

568

23.5

53.5

6.4

383.4

0.0

97

99.7

5%

23

173

155

4.8

743.9

1.9

650.7

0.2

197.3

%23

277

163

6.3

242.7

1.8

650.9

0.2

198.1

1%

23

577

163

5.4

343.2

1.9

550.6

0.1

897.9

2%

BE

ST

29

0240

511

32.8

59.4

5.5

97.8

0.1

499.7

%29

173

155

5.1

644.6

1.9

851.7

0.2

297.3

%29

273

154

5.7

642.9

1.8

50.5

0.2

297.7

3%

BE

ST

29

575

160

5.2

44

1.9

351.1

0.2

297.7

8%

31

172

154

4.8

44.5

251.3

0.2

297.0

6%

31

272

153

5.8

143

1.8

150.6

0.2

397.7

8%

31

370

148

5.1

642.6

1.7

549.5

0.2

397.7

3%

BE

ST

31

573

156

4.8

843.3

1.8

750.1

0.2

297.6

2%

32

0119

252

16.7

51.2

2.8

170.7

0.2

299.1

6%

32

161

130

2.6

741.1

1.6

45.4

0.2

396%

32

261

130

2.6

741.1

1.6

45.4

0.2

396%

32

561

130

2.6

741.1

1.6

45.4

0.2

396%

32

661

130

2.6

741.1

1.6

45.4

0.2

396%

BE

ST

33

188

186

6.0

751.4

2.2

759.8

0.2

197.8

3%

33

283

176

4.8

949.3

2.1

56.3

0.1

997.7

3%

BE

ST

33

493

198

6.8

250.7

2.2

959.9

0.2

98.3

9%

33

591

193

5.9

850.5

2.2

858.8

0.1

798.2

8%

37

187

185

6.1

952

2.2

960.5

0.2

197.7

8%

BE

ST

37

291

193

7.3

51

2.2

560.6

0.2

98.2

5%

37

592

197

7.0

751.2

2.3

60.6

0.1

898.3

3%

41

187

185

6.2

952.6

2.3

61.2

0.2

297.7

8%

41

292

195

7.1

251.5

2.2

660.8

0.1

998.2

8%

BE

ST

41

593

197

6.8

251.8

2.3

461

0.1

898.4

4%

43

187

185

6.3

552.9

2.3

361.5

0.2

297.8

3%

43

288

186

7.3

351.3

2.1

860.9

0.2

298.1

5%

BE

ST

43

593

197

7.5

352

2.3

261.8

0.1

998.3

6%

47

187

184

5.8

552.8

2.3

461

0.2

297.6

7%

47

289

190

7.0

251.6

2.2

160.8

0.2

198.2

1%

BE

ST

47

593

198

6.9

852.1

2.3

361.4

0.1

898.4

1%

53

187

185

6.6

853.8

2.3

462.9

0.2

397.8

3%

BE

ST

53

294

200

8.8

53

2.2

964.1

0.2

198.4

4%

53

592

195

8.0

853

2.3

363.4

0.2

98.3

1%

59

187

184

6.1

753.5

2.3

762.1

0.2

297.6

7%

59

287

184

7.4

152.2

2.1

961.8

0.2

398.0

8%

59

589

189

6.5

252.8

2.3

261.6

0.1

998.1

1%

BE

ST

61

187

184

6.3

353.8

2.3

762.5

0.2

397.6

7%

61

286

183

6.9

552.3

2.1

861.4

0.2

397.9

6%

BE

ST

61

588

187

6.4

452.9

2.2

661.6

0.2

98.0

8%

63

186

183

5.8

153.3

2.3

861.4

0.2

297.5

%63

286

183

6.8

52

2.1

860.9

0.2

297.9

6%

63

382

175

650.9

2.0

559

0.2

397.9

6%

BE

ST

63

587

185

6.3

351.7

2.2

160.3

0.2

198%

64

172

154

3.6

449.5

1.8

655

0.2

96.3

%64

272

154

3.6

449.5

1.8

655

0.2

396.3

%64

572

154

3.6

449.5

1.8

655

0.2

396.3

%64

672

154

3.6

449.5

1.8

655

0.2

396.3

%B

EST

65

1102

216

6.9

859.9

2.6

569.5

0.2

98.1

1%

65

298

209

6.2

557.6

2.4

666.3

0.1

998.1

8%

BE

ST

65

4121

257

10.7

62.1

3.0

575.8

0.2

98.9

%65

5107

228

6.6

458.8

2.7

68.1

0.1

698.5

5%

67

1101

215

6.9

660.1

2.6

469.7

0.2

198.0

4%

BE

ST

67

2107

228

8.3

159.1

2.7

70.1

0.1

998.6

1%

67

5107

228

7.4

259.7

2.7

269.8

0.1

798.5

1%

71

1101

214

7.0

760.8

2.6

570.5

0.2

198%

BE

ST

71

2109

232

8.9

760.1

2.7

471.8

0.2

98.6

8%

71

5108

230

7.8

60.1

2.7

270.6

0.1

898.5

7%

73

1101

215

7.4

60.7

2.6

570.7

0.2

198.0

4%

73

2114

243

9.9

260.3

2.8

73.1

0.1

998.8

2%

73

5108

230

7.9

459.6

2.7

70.3

0.1

798.6

1%

BE

ST

79

1100

213

6.8

961.2

2.6

870.8

0.2

297.9

2%

BE

ST

79

2110

234

9.8

360.5

2.6

873

0.2

98.6

8%

79

5109

232

8.1

60.5

2.7

571.3

0.1

898.5

7%

83

1101

215

7.4

161.7

2.6

871.8

0.2

298.0

4%

BE

ST

83

2153

325

16.3

64.1

3.7

384.1

0.1

699.3

4%

83

5107

229

8.7

961

2.7

72.5

0.1

998.6

3%

89

1101

215

7.4

662

2.6

972.1

0.2

298.0

4%

BE

ST

89

2141

300

14.2

63.9

3.4

281.5

0.1

799.2

4%

89

5108

231

8.3

561.5

2.7

472.6

0.1

998.6

3%

97

1101

216

7.3

561.6

2.6

771.7

0.2

298.0

8%

97

2145

309

14.7

64.4

3.4

982.6

0.1

799.2

8%

97

5105

224

7.3

660.6

2.6

870.7

0.1

898.4

6%

BE

ST

101

1101

215

7.6

162.2

2.7

72.5

0.2

398.0

4%

BE

ST

101

2154

327

16.8

65.2

3.7

485.7

0.1

699.3

4%

101

5107

228

8.7

861.1

2.7

372.6

0.1

998.5

7%

103

1100

213

7.2

962.5

2.7

172.5

0.2

397.9

2%

BE

ST

103

2157

335

16.6

66

3.8

86.4

0.1

699.3

6%

103

5107

227

8.3

961.8

2.6

972.9

0.1

998.5

1%

107

1101

215

7.5

862.7

2.7

373

0.2

398.0

8%

BE

ST

107

2161

342

17.1

66.1

3.8

587.1

0.1

699.3

7%

107

5109

232

9.6

462.2

2.7

674.6

0.2

98.6

3%

109

1101

215

7.7

663.1

2.7

373.6

0.2

398.0

8%

BE

ST

109

2146

311

16.2

65.8

3.6

585.6

0.1

799.2

8%

109

5106

225

8.7

262.4

2.7

73.8

0.2

98.4

6%

113

1101

214

7.4

862.1

2.6

972.3

0.2

398%

113

2122

259

12

63.1

2.9

878.2

0.2

98.9

6%

113

5105

223

7.6

761.3

2.6

971.7

0.1

898.5

1%

BE

ST

127

1100

212

6.8

362.1

2.7

771.7

0.2

297.8

3%

127

2102

216

8.4

60.6

2.5

871.6

0.2

298.3

9%

127

395

202

7.0

759.8

2.4

169.3

0.2

398.1

5%

BE

ST

127

5101

215

7.5

160.3

2.5

370.3

0.1

998.2

1%

128

184

178

4.3

257.7

2.1

664.2

0.2

96.7

7%

128

284

178

4.3

257.7

2.1

664.2

0.2

396.7

7%

128

584

178

4.3

257.7

2.1

664.2

0.2

396.7

7%

128

684

178

4.3

257.7

2.1

664.2

0.2

396.7

7%

BE

ST

129

1115

245

868.3

2.9

979.3

0.2

98.3

3%

129

2119

253

8.7

766.7

2.9

178.4

0.1

998.7

%B

EST

129

4141

299

12.8

71.3

3.5

487.6

0.2

99.1

%129

5122

261

8.0

367.8

3.0

678.9

0.1

698.7

3%

131

1115

245

7.9

868.8

3.0

179.8

0.2

198.2

8%

131

2138

293

13.4

70

3.2

586.7

0.1

999.0

7%

131

5122

260

8.1

867.8

3.0

879

0.1

698.7

2%

BE

ST

137

1115

245

8.6

269.7

3.0

181.4

0.2

298.3

1%

137

2143

304

14.6

71.2

3.4

89.2

0.1

999.1

1%

137

5124

263

9.1

68.9

3.1

281.1

0.1

898.7

8%

BE

ST

139

1115

245

8.3

769.6

3.0

381

0.2

298.3

1%

BE

ST

139

2122

260

10.1

68.8

3.0

381.9

0.2

98.7

5%

139

5124

263

9.7

968.9

3.1

381.9

0.1

898.8

%149

1115

245

8.6

970.4

3.0

482.1

0.2

298.3

3%

BE

ST

149

2147

313

16.4

72.7

3.5

692.6

0.2

99.1

7%

149

5123

263

10.3

69.6

3.0

983.1

0.1

998.7

7%

151

1115

244

8.3

570.5

3.0

681.9

0.2

298.2

5%

BE

ST

151

2132

281

13.2

71.1

3.2

587.5

0.2

198.9

7%

151

5124

264

10.1

69.9

3.1

183.1

0.1

998.7

5%

157

1115

244

8.3

670.5

3.0

681.9

0.2

298.2

5%

157

2129

274

12.6

70.6

3.1

586.4

0.2

198.9

1%

157

5123

261

9.3

869.5

3.0

681.9

0.1

998.7

2%

BE

ST

163

1115

245

8.6

70.5

3.0

382.1

0.2

298.3

1%

BE

ST

163

2130

278

12

70.3

3.2

85.5

0.2

98.9

5%

163

5123

262

9.7

569.5

3.1

382.4

0.1

898.8

1%

167

1115

244

8.4

970.9

3.0

682.5

0.2

398.2

5%

BE

ST

167

2135

287

13.9

71.9

3.2

989.1

0.2

199%

167

5124

264

10.1

70.4

3.1

483.6

0.1

998.8

1%

173

1115

245

8.8

271.1

3.0

983

0.2

398.3

6%

BE

ST

173

2129

274

12.4

70.7

3.1

786.3

0.2

198.9

%

173

5124

263

10.4

70.4

3.1

183.9

0.1

998.7

7%

179

1115

244

8.6

871.1

3.0

782.9

0.2

398.2

5%

BE

ST

179

2134

286

14.8

72.7

3.3

290.8

0.2

298.9

8%

179

5124

264

10.4

70.8

3.1

584.4

0.1

998.8

1%

181

1115

245

8.9

271.5

3.0

983.5

0.2

398.3

6%

BE

ST

181

2134

286

14.3

72.4

3.3

190.1

0.2

199.0

1%

181

5123

262

10.2

70.9

3.1

284.2

0.2

98.7

5%

191

1114

243

7.8

270.6

3.1

181.6

0.2

298.1

8%

BE

ST

191

2123

262

10.9

69.9

3.1

183.9

0.2

198.8

1%

191

5123

261

9.1

769.8

3.1

182.1

0.1

898.7

5%

193

1115

246

8.5

770.9

3.0

582.5

0.2

298.3

1%

193

2116

247

9.2

769.2

2.9

181.3

0.2

298.5

9%

BE

ST

193

5122

260

8.8

669.7

3.1

181.7

0.1

898.7

5%

197

1115

245

8.8

571.3

3.0

583.2

0.2

398.3

1%

BE

ST

197

2134

285

14.1

72.4

3.2

889.7

0.2

199%

197

5122

260

10.2

70.4

3.0

883.7

0.1

998.7

7%

199

1114

243

8.5

671.4

3.0

683

0.2

398.2

1%

BE

ST

199

2139

295

14.4

72.3

3.3

690.1

0.2

99.0

7%

199

5123

261

9.8

670.7

3.1

383.7

0.1

998.7

5%

211

1115

244

8.7

471.5

3.0

883.3

0.2

398.2

5%

BE

ST

211

2144

306

16.8

74.1

3.4

994.4

0.2

299.1

3%

211

5122

259

10.6

70.5

3.0

484.2

0.2

98.7

%223

1114

243

8.0

871.1

3.1

282.3

0.2

298.1

8%

BE

ST

223

2125

267

12.3

71

3.1

386.4

0.2

298.8

6%

223

5123

262

9.6

570.3

3.1

383.1

0.1

998.7

8%

227

1114

243

8.7

71.6

3.0

783.3

0.2

398.2

1%

BE

ST

227

2125

266

12.6

71.8

3.1

487.5

0.2

398.8

4%

227

5121

257

9.8

170.6

3.0

883.5

0.1

998.7

%229

1115

244

8.8

171.8

3.0

983.7

0.2

398.2

5%

BE

ST

229

2140

297

16.2

74

3.4

693.6

0.2

299.1

%229

5121

258

10.1

70.6

3.0

983.8

0.2

98.7

2%

233

1115

244

8.7

371.3

3.1

83.1

0.2

398.2

5%

233

2116

246

9.6

769.6

2.9

382.2

0.2

298.5

5%

BE

ST

233

5121

258

9.5

70.5

3.1

283.1

0.1

998.7

2%

239

1114

243

8.2

171.4

3.1

382.8

0.2

398.1

8%

239

2117

248

9.7

170

2.9

682.6

0.2

298.5

5%

BE

ST

239

5121

257

9.3

470.4

3.0

782.8

0.1

998.7

%241

1114

243

8.7

371.5

3.0

783.3

0.2

398.2

1%

241

2115

245

9.5

769.8

2.9

82.2

0.2

298.5

3%

BE

ST

241

5119

253

8.9

70.6

3.0

582.6

0.1

998.6

3%

251

1114

243

8.2

771.4

3.1

482.8

0.2

398.1

8%

251

2115

246

9.5

969.7

2.9

182.2

0.2

398.4

6%

251

5118

252

9.1

570

3.0

182.2

0.1

998.5

7%

BE

ST

255

1114

242

7.8

970.9

3.1

482

0.2

298.0

8%

255

2115

245

9.5

569.5

2.8

981.9

0.2

298.5

1%

255

3109

232

8.2

268.4

2.7

679.3

0.2

398.4

8%

BE

ST

255

5116

246

8.3

69

2.9

180.2

0.1

998.4

8%

256

195

203

5.0

266.1

2.4

573.6

0.2

97.1

4%

256

295

203

5.0

266.1

2.4

573.6

0.2

397.1

4%

256

595

203

5.0

266.1

2.4

573.6

0.2

397.1

4%

256

695

203

5.0

266.1

2.4

573.6

0.2

397.1

4%

BE

ST

257

1130

276

9.2

477.7

3.3

790.4

0.2

198.5

3%

257

2136

289

10.4

76.1

3.2

989.8

0.1

898.8

9%

257

4158

336

14.5

80.9

3.9

899.4

0.2

199.2

%257

5136

290

8.6

976.4

3.4

588.5

0.1

698.8

4%

BE

ST

509

1128

272

9.2

79.9

3.5

292.6

0.2

298.3

6%

509

2130

277

10.3

77.8

3.2

491.3

0.2

198.7

%509

5131

279

9.7

378.1

3.3

291.1

0.1

998.7

3%

BE

ST

511

1127

271

8.8

480

3.5

392.4

0.2

298.2

8%

511

2130

277

10.8

78.2

3.2

992.3

0.2

298.7

2%

511

3124

264

9.2

177.1

3.1

389.4

0.2

398.7

%B

EST

511

5130

276

9.6

377.7

3.2

490.5

0.1

998.6

3%

512

1107

228

5.7

574.5

2.7

483

0.2

197.4

4%

512

2107

228

5.7

574.5

2.7

583

0.2

397.4

4%

512

5107

228

5.7

574.5

2.7

583

0.2

497.4

4%

512

6107

228

5.7

574.5

2.7

583

0.2

497.4

4%

BE

ST

513

1144

305

10.2

86.1

3.7

1100

0.2

198.6

7%

513

2152

323

11.4

84.5

3.7

399.6

0.1

899.0

3%

513

4177

376

16.3

90.3

4.4

7111

0.2

199.2

8%

513

5152

322

9.9

185.3

3.8

199

0.1

798.9

6%

BE

ST

1021

1142

302

10.3

89.2

3.9

1103

0.2

398.5

1%

1021

2162

344

15.7

88.9

3.8

9109

0.2

99.1

3%

1021

5146

311

11.3

87.1

3.8

1102

0.2

96.4

3%

BE

ST

1023

1141

300

9.8

988.9

3.9

1103

0.2

298.4

4%

1023

2152

322

13.2

87.6

3.7

7105

0.2

199.0

1%

1023

3137

291

10.3

86

3.4

699.8

0.2

398.7

7%

BE

ST

1023

5146

310

11

86.4

3.6

1101

0.1

998.7

7%

1024

1119

252

6.4

682.8

3.0

492.3

0.2

197.6

7%

1024

2135

287

9.6

384.9

3.3

897.9

0.2

298.6

3%

1024

5119

252

6.4

682.8

3.0

492.3

0.2

497.6

7%

1024

6119

252

6.4

682.8

3.0

492.3

0.2

497.6

7%

BE

ST

1025

1158

337

11.4

95.6

4.1

5111

0.2

198.8

%1025

2169

359

12.6

93.4

4.1

2110

0.1

899.1

4%

1025

4193

411

17.7

99.7

4.9

1122

0.2

199.3

4%

1025

5167

356

10.9

93.8

4.2

2109

0.1

699.0

7%

BE

ST

2039

1156

332

11.2

98.2

4.3

3114

0.2

398.6

3%

BE

ST

2039

2199

424

21.2

101

4.8

4127

0.2

99.3

9%

2039

5165

350

13.2

96.9

4.4

9115

0.1

996.8

4%

2047

1155

331

10.8

97.7

4.3

3113

0.2

298.5

7%

2047

2165

351

14

95.9

4.1

2114

0.2

199.0

7%

2047

3150

319

11.4

94.4

3.8

110

0.2

398.8

6%

BE

ST

2047

5160

340

12

94.8

3.9

5111

0.1

998.9

1%

2048

1130

277

7.1

691.1

3.3

4102

0.2

197.8

7%

2048

2149

317

9.8

194.3

3.6

9108

0.2

298.6

5%

2048

5130

277

7.1

791.1

3.3

4102

0.2

497.8

7%

2048

6130

277

7.1

791.1

3.3

4102

0.2

497.8

7%

BE

ST

2049

1173

367

12.6

104

4.4

8121

0.2

198.9

1%

BE

ST

2049

2191

406

15.2

103

4.6

5123

0.1

899.2

8%

2049

4211

450

19.7

109

5.3

4134

0.2

199.4

%2049

5214

456

19.3

107

5.3

2132

0.1

799.4

%4093

1170

362

12.3

107

4.7

3124

0.2

398.7

5%

BE

ST

4093

2210

447

21.4

109

5.0

1135

0.1

999.4

1%

4093

5177

377

13.7

105

6.1

3125

0.1

989.6

2%

4095

1170

361

11.9

107

4.7

3123

0.2

298.7

%4095

2195

416

18.2

106

4.7

5129

0.1

999.3

2%

4095

3163

347

12.6

103

4.1

1119

0.2

398.9

9%

BE

ST

4095

5176

374

13.2

104

5.9

4123

0.1

990.1

%4096

1142

301

7.8

899.5

3.6

3111

0.2

198.0

4%

4096

2162

345

11.6

102

4.0

3118

0.2

298.8

8%

4096

5142

301

7.8

899.5

3.6

3111

0.2

498.0

4%

4096

6142

301

7.8

899.5

3.6

3111

0.2

498.0

4%

BE

ST

4097

1187

399

13.7

113

4.8

7132

0.2

299.0

1%

4097

2200

425

15.3

110

4.8

8131

0.1

899.2

8%

BE

ST

4097

4229

488

21.1

118

5.8

3145

0.2

199.4

6%

4097

5234

497

21.1

117

5.7

8143

0.1

799.4

7%

8179

1184

392

13.7

117

5.0

6135

0.2

398.8

5%

BE

ST

8179

2225

480

22.7

118

5.4

7146

0.2

99.4

4%

8179

5247

525

22

128

7.1

1157

0.1

899.4

5%

8191

1184

391

12.9

115

5.1

2133

0.2

298.8

1%

8191

2204

433

18.3

114

4.9

6137

0.2

99.3

%8191

3178

378

13.6

111

4.4

8129

0.2

399.0

9%

BE

ST

8191

5194

414

15.2

113

6.0

6134

0.1

994.6

9%

8192

1153

326

8.5

9108

3.9

3120

0.2

198.1

8%

8192

2176

374

12

111

4.3

6127

0.2

198.9

1%

8192

5153

326

8.5

9108

3.9

3120

0.2

498.1

8%

8192

6153

326

8.5

9108

3.9

3120

0.2

498.1

8%

BE

ST

8193

1201

428

14.7

121

5.2

1141

0.2

199.0

7%

BE

ST

8193

27325

15585

196

273

213

682

0.0

26

99.8

4%

8193

4248

527

22.9

128

6.3

8157

0.2

98.4

9%

8193

5251

533

22.3

125

6.1

6154

0.1

799.4

9%

16381

1198

421

14.4

125

5.5

144

0.2

398.9

2%

BE

ST

16381

2243

517

25.1

126

5.7

157

0.1

999.4

7%

16381

5257

546

22.3

134

7.3

4163

0.1

899.4

2%

16383

1197

420

13.8

124

5.5

143

0.2

298.8

9%

16383

2228

486

20.7

123

5.4

149

0.1

899.4

%16383

3192

409

14.8

120

4.8

6140

0.2

399.1

5%

BE

ST

16383

5256

544

21.8

133

7.3

162

0.1

799.4

3%

16384

1165

351

9.2

8116

4.2

2130

0.2

198.3

1%

16384

2189

402

13.2

120

4.6

8137

0.2

299.0

3%

16384

5165

351

9.2

8116

4.2

2130

0.2

498.3

1%

16384

6165

351

9.2

8116

4.2

2130

0.2

498.3

1%

BE

ST

16385

1216

459

15.8

131

5.6

152

0.2

299.1

4%

BE

ST

16385

215576

33141

315

386

455

1160

0.0

21

99.8

7%

16385

4266

567

24.3

137

7.0

7169

0.2

198.0

9%

16385

5273

582

24.4

136

6.8

8167

0.1

799.5

5%


Appendix D

RNS multiplier results

The results have been generated using uniformly distributed random input operands and a setup as described in figure 5.6b on page 41. Below is a description of the titles in the header row of table D.1.

Modulo The modulus of the multiplier.

Type Results for the specific multiplier type, see table 3.2 on page 13 for details.

Area The total area of the multiplier including the registers, in µm².

Gates The total gate count including registers.

Switch power The switching power in µW. See section 2.3.3 on page 10 for details about the different power values.

Int. power The internal power in µW.

Leak power The leakage power in µW.

Total power The total power in µW.

Toggle rate The average toggle rate per net per clock cycle.

UVT cells The percentage of UVT cells. See section 2.3.3 on page 10 for details.
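The multiplier results cover moduli of several forms, including 2^n − 1 (for example 31 and 63). As a behavioral reference for what such a unit has to compute, the sketch below reduces the double-width product modulo 2^n − 1 by end-around-carry folding, which relies on 2^n ≡ 1 (mod 2^n − 1), and checks it against Python's % operator on uniformly distributed random operands, mirroring the stimulus described above. It is a functional model only and is not meant to represent any of the multiplier types of table 3.2; the function names are illustrative.

```python
import random


def mod_mersenne(x, n):
    """Reduce a non-negative integer modulo 2**n - 1 by end-around-carry folding.

    Because 2**n = 1 (mod 2**n - 1), the bits above position n can be added
    back onto the low n bits; repeat until the value fits in n bits.
    """
    mask = (1 << n) - 1
    while x > mask:
        x = (x & mask) + (x >> n)
    return 0 if x == mask else x   # 2**n - 1 is a second encoding of zero


def mul_mod_mersenne(a, b, n):
    """Behavioral model of a modulo 2**n - 1 multiplier."""
    return mod_mersenne(a * b, n)


if __name__ == "__main__":
    random.seed(0)
    n = 5
    m = (1 << n) - 1                       # modulus 31
    for _ in range(10_000):
        a, b = random.randrange(m), random.randrange(m)   # uniformly distributed residues
        assert mul_mod_mersenne(a, b, n) == (a * b) % m
    print("modulo", m, "model agrees with a * b % m")
```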


Table D.1: Results for RNS multipliers

Modulo  Type  Area  Gates  Switch power  Int. power  Leak power  Total power  Toggle rate  UVT cells

2

714

29

0.1

13

7.5

90.3

48.0

40.2

85.7

1B

EST

20

14

29

0.1

13

7.5

90.3

48.0

40.2

85.7

12

114

29

0.1

13

7.5

90.3

48.0

40.2

85.7

13

025

52

0.4

06

14.9

0.5

75

15.9

0.1

990.9

1B

EST

31

27

58

1.0

215.2

0.6

18

16.8

0.1

894.1

23

228

59

1.2

15.2

0.6

31

17

0.1

894.4

43

628

60

1.0

515.2

0.6

31

16.9

0.1

794.1

24

725

53

0.6

02

15.4

0.6

11

16.6

0.2

92.3

1B

EST

40

25

53

0.6

15.4

0.6

11

16.6

0.2

92.3

14

125

53

0.6

02

15.4

0.6

11

16.6

0.2

92.3

15

043

92

2.8

423.9

127.7

0.2

96.7

7B

EST

51

66

140

4.8

626.9

1.6

933.4

0.1

398.1

55

669

148

4.8

625.8

1.5

932.2

0.1

298.5

57

046

99

3.4

824.9

1.0

729.5

0.2

97.0

6B

EST

71

57

122

4.6

626.7

1.4

32.8

0.1

797.8

77

251

109

3.8

926

1.2

731.1

0.2

97.3

74

51

108

4.0

726.5

1.2

231.8

0.2

197.4

47

667

143

5.6

627.7

1.5

934.9

0.1

598.4

18

040

86

2.0

624.1

0.9

94

27.2

0.2

196.3

BE

ST

81

38

81

1.4

323.6

0.9

17

25.9

0.2

95.2

48

738

81

1.4

323.6

0.9

17

25.9

0.2

95.2

49

084

179

7.7

134.4

1.9

144

0.1

698.8

5B

EST

91

103

218

8.6

838.2

2.6

849.5

0.1

598.8

99

6105

224

7.8

38.3

3.1

949.3

0.1

295.7

11

474

158

7.7

836.5

1.7

746

0.2

198.4

1B

EST

11

091

193

8.9

735.7

2.1

346.8

0.1

599.0

111

1106

225

9.7

140.3

2.7

552.8

0.1

698.9

111

6118

251

10.6

42.5

4.5

157.6

0.1

496.2

613

478

165

9.4

237.4

1.8

448.6

0.2

298.5

5B

EST

13

0135

286

13.2

37.7

3.0

153.9

0.1

199.4

413

1101

215

10.4

43.1

2.8

856.4

0.1

798.5

913

6113

239

10.8

43.4

3.2

257.4

0.1

697.9

415

287

185

9.4

638.4

2.1

450

0.1

898.6

3B

EST

15

0102

218

14

39.7

2.3

256

0.1

999.1

15

191

194

8.3

541.3

2.5

752.2

0.1

898.4

615

5680

1448

95.7

144

50.1

290

0.1

383.0

1

15

6112

238

11.2

44.4

3.4

359

0.1

594.9

16

756

120

2.8

633.3

1.4

837.6

0.1

896.8

8B

EST

16

073

156

8.2

35.5

1.7

445.5

0.2

198.5

716

156

120

2.8

633.3

1.4

837.6

0.1

896.8

817

1137

292

13.5

51.7

3.7

368.9

0.1

699.1

4B

EST

17

0187

397

22.1

53.8

4.6

80.5

0.1

599.6

17

6288

613

38.4

85.6

10.3

134

0.1

897.0

119

4110

234

13.5

47.6

2.5

863.7

0.1

999.0

6B

EST

19

0206

439

20.5

52.2

5.0

377.7

0.1

299.6

519

1189

402

20.4

60.3

5.8

986.6

0.1

598.4

19

6261

556

30.7

72.6

8.6

5112

0.1

597.3

823

4121

258

17.2

50.3

2.8

370.3

0.2

99.1

9B

EST

23

0228

485

22.4

54.5

6.0

182.9

0.1

299.6

923

1167

356

20.3

57.7

4.4

282.5

0.1

799.3

923

6281

598

39.7

84.4

10.2

134

0.1

897.2

129

4138

293

20.5

52.5

3.1

876.1

0.2

99.3

4B

EST

29

0377

803

55

78

9142

0.1

599.8

229

1146

310

18.1

62.2

4.6

784.9

0.1

898.0

429

6289

615

42.4

89.7

9.9

3142

0.1

997.6

931

1127

270

14.1

55.9

3.6

473.6

0.1

998.9

BE

ST

31

0240

511

38.6

64.4

5.5

6109

0.1

799.7

31

2163

348

21.4

61.3

4.9

487.6

0.1

898.7

731

4141

300

22.6

54.3

3.2

780.2

0.2

199.3

631

51099

2338

102

147

145

394

0.0

855.9

631

6259

552

35.3

78.8

8.8

2123

0.1

797.4

232

776

162

4.7

443.8

2.0

850.6

0.1

997.7

3B

EST

32

0163

346

27.4

56.5

3.7

787.6

0.2

99.5

132

176

162

4.7

443.8

2.0

850.6

0.1

997.7

333

1186

397

21.3

67.5

5.3

994.2

0.1

698.6

8B

EST

33

0404

861

33.8

72.2

10.9

117

0.0

95

99.8

433

6465

990

67.2

124

23.4

214

0.1

691.4

937

4201

427

25.7

63.2

4.6

893.6

0.1

599.6

BE

ST

37

1297

631

37.2

81.6

10.3

129

0.1

595.6

937

6432

919

59.8

116

21.7

197

0.1

692.8

941

4201

427

27.7

64.5

4.8

897.1

0.1

699.6

BE

ST

41

1264

563

34

77.7

7.2

6119

0.1

699.3

641

6435

926

63.6

122

22.5

208

0.1

689.8

643

4217

462

31.6

67.1

5.1

4104

0.1

699.6

3B

EST

43

1261

555

36.4

83.3

8.4

9128

0.1

797.4

543

6420

895

63.8

119

19.4

202

0.1

793.0

447

4236

503

34.5

69.2

5.7

4110

0.1

699.6

7B

EST

47

1248

528

34.4

83

8.9

2126

0.1

896.8

47

6425

905

65.2

120

21

206

0.1

791.2

353

4239

508

36.6

70.1

5.6

8112

0.1

699.6

7B

EST

53

1273

581

37.7

89.5

9.8

1137

0.1

796.8

953

6430

914

69.1

126

18.9

214

0.1

894.9

359

4236

503

41.1

75.2

5.5

5122

0.1

999.6

7B

EST

59

1241

512

33

84.7

7.7

1125

0.1

898.7

59

6406

864

65.3

120

18.2

203

0.1

893.0

661

4237

504

41.8

76

5.5

1123

0.1

999.6

6B

EST

61

1230

490

32.2

87.3

7.7

6127

0.1

897.9

661

6417

888

67.8

123

20

211

0.1

891.1

63

1168

358

20.8

72

5.5

98.4

0.1

897.3

5B

EST

63

2285

607

40.9

89.6

9.2

6140

0.1

998.0

663

51346

2863

234

304

226

765

0.1

439.3

63

6417

887

67.2

124

19.5

210

0.1

893.2

164

199

210

6.9

254.3

2.7

964

0.1

998.2

867

1444

945

61.8

113

15

190

0.1

597.4

5 BEST

67

4308

654

41.7

81.3

8.6

5132

0.1

498.7

867

6649

1382

117

184

54.8

356

0.1

878.3

371

1458

975

67.2

116

13.9

197

0.1

698.1

4 BEST

71

4349

743

49.7

87.5

10.3

147

0.1

398.9

171

6654

1391

115

183

56.7

355

0.1

975.7

273

1376

800

51.5

108

12.2

172

0.1

697.2

9 BEST

73

4315

670

43.9

80.5

9.1

2134

0.1

397.6

173

6689

1467

123

192

46.4

361

0.1

985.2

879

4382

812

55

92.6

10.6

158

0.1

499.0

4 BEST

79

1451

960

64.8

122

16.3

203

0.1

696.6

379

6680

1447

124

196

61.3

381

0.1

978.2

183

4438

933

63.9

97.2

14.2

175

0.1

396.7

BEST

83

1471

1003

72.7

124

16.1

213

0.1

697.4

383

6656

1395

116

191

61.9

368

0.1

975.9

489

4425

903

64.4

95.5

13.7

174

0.1

396.4

2 BEST

89

1396

842

59.7

113

13.2

186

0.1

797.9

389

6566

1205

106

163

49.6

318

0.1

875.8

497

1397

844

58.8

115

14.4

188

0.1

797.0

3 BEST

97

4407

865

53.6

86.4

9.9

9150

0.1

199.3

197

6630

1340

114

189

49

352

0.1

979.7

9101

4488

1038

64.8

102

15.2

182

0.1

197.6

3 BEST

101

1414

880

61.5

119

17.1

197

0.1

695.6

3101

6584

1242

115

168

52.1

334

0.1

875.7

3103

4497

1058

67.1

101

13.7

182

0.1

298.8

1 BEST

103

1391

831

60.4

122

15.5

198

0.1

896.2

5103

6643

1367

121

196

43.2

360

0.2

85.2

6107

4561

1193

73.5

107

15.7

197

0.1

198.6

1 BEST

107

1424

902

70.4

127

20.6

218

0.1

892.5

6107

6684

1456

134

207

63.1

405

0.2

75.1

1109

4536

1140

77.4

111

16.5

205

0.1

297.9

7 BEST

109

1446

949

72.7

131

16.4

220

0.1

798.1

7109

6655

1394

131

204

48.2

383

0.2

82.0

1113

1333

707

48.3

110

11.6

169

0.1

897.9

7 BEST

113

4465

990

61.5

95.1

14.6

171

0.1

297.5

9113

6611

1299

116

185

44.8

346

0.2

83.5

7127

1246

523

34.2

92

7.4

134

0.1

999.0

9 BEST

127

2567

1207

100

160

25.3

286

0.1

994.9

7127

4458

974

83.3

114

13.1

210

0.1

798.2

7127

52066

4396

230

294

425

950

0.0

85

17.6

9127

6594

1263

118

177

60.9

356

0.1

969.6

1128

7124

264

9.9

665.8

3.5

279.3

0.1

998.6

7 BEST

128

1124

264

9.9

665.8

3.5

279.3

0.1

998.6

7129

1336

716

42.5

106

11.9

160

0.1

795.1

7 BEST

129

6915

1946

151

236

118

506

0.1

659.8

8131

4575

1223

64.6

107

19.5

191

0.0

92

97.4

5 BEST

131

1583

1241

82.4

138

18.5

238

0.1

697.0

1131

6924

1966

159

243

127

529

0.1

658.7

5137

1663

1410

101

154

22.9

277

0.1

696.4

6 BEST

137

6945

2010

168

247

126

541

0.1

757.2

7139

4615

1309

65.7

111

17.8

195

0.0

91

99.0

8 BEST

139

1740

1574

116

173

24.7

313

0.1

696.6

6139

6923

1964

156

247

123

526

0.1

760.0

2149

4601

1278

61.8

105

20.4

187

0.0

88

97.1

3 BEST

149

1673

1433

103

163

25.4

291

0.1

696.5

3149

6876

1864

169

237

126

532

0.1

747.5

2151

4652

1386

66.1

112

17.9

196

0.0

999.2

9 BEST

151

1693

1474

109

166

22.3

297

0.1

697.8

3151

6820

1745

155

229

107

492

0.1

754.7

5157

4642

1366

65.1

107

21.5

194

0.0

88

97.0

2 BEST

157

1718

1527

117

173

24.5

314

0.1

697.3

7157

6933

1986

167

251

123

541

0.1

757.4

9163

4714

1518

76.4

118

24.3

219

0.0

93

97.0

6 BEST

163

1709

1509

114

172

24.3

309

0.1

697.5

2163

6943

2007

170

258

121

549

0.1

761.6

6167

4709

1508

73

119

20.8

213

0.0

93

98.2

6 BEST

167

1772

1642

119

174

25.9

319

0.1

597.2

1167

6893

1899

154

242

118

515

0.1

757.4

173

4714

1519

72.5

118

20.4

210

0.0

85

99.0

7 BEST

173

1767

1631

120

177

27.6

324

0.1

696.3

6173

6868

1847

172

233

122

527

0.1

848.3

2179

4711

1514

72

114

24.5

210

0.0

85

96.8

3 BEST

179

1725

1542

117

176

25

318

0.1

797.3

2179

6894

1901

167

248

117

532

0.1

857.8

2181

4742

1580

75

125

22.7

223

0.0

85

98.9

1 BEST

181

1731

1556

117

176

25.2

318

0.1

697.3

3181

6859

1828

151

235

108

494

0.1

860.7

8191

4771

1640

77.9

118

23.8

220

0.0

85

97.8

7 BEST

191

1494

1052

75.7

139

17.4

232

0.1

896.1

2191

6916

1950

160

244

120

524

0.1

758.5

8193

4694

1477

66.4

105

19.7

191

0.0

898.5

8 BEST

193

1525

1117

82.7

150

18

251

0.1

798.2

4193

6970

2065

181

273

136

590

0.1

854.5

9197

4792

1686

76.5

119

24.5

220

0.0

81

98.2

6 BEST

197

1582

1237

93.8

151

23.1

268

0.1

794.9

7197

6863

1837

158

242

116

516

0.1

857.2

2199

4806

1714

78.5

125

23.5

227

0.0

84

99.1

8 BEST

199

6890

1894

165

247

123

534

0.1

858.4

211

4832

1769

84.5

125

22.3

232

0.0

83

99.5

2 BEST

211

1591

1257

92.9

155

20.1

268

0.1

697.7

9211

6839

1786

180

242

117

539

0.1

948.9

8223

4878

1868

87.3

129

30.1

247

0.0

83

97.1

3 BEST

223

1533

1133

83.3

145

18.6

247

0.1

796.8

3223

6938

1996

169

261

130

560

0.1

854.9

2227

1469

997

72.5

144

16.9

234

0.1

896.7

6 BEST

227

4909

1934

80.9

128

31.3

240

0.0

73

97.4

227

6828

1763

170

240

117

527

0.1

945.5

229

4873

1857

81.6

126

28.7

236

0.0

78

97.3

5 BEST

229

1514

1093

83.4

150

17.5

251

0.1

897.1

2229

6822

1750

171

235

115

520

0.1

947.3

4233

1510

1085

84.9

150

17.5

252

0.1

896.7

BEST

233

4856

1820

142

169

24.5

335

0.1

597.9

8233

6911

1937

174

263

123

560

0.1

857.1

9239

1431

916

70.7

133

14.4

218

0.1

897.5

4 BEST

239

4880

1872

140

175

26.9

342

0.1

598.0

1239

6927

1973

179

266

128

573

0.1

856.3

6241

1389

827

60

132

14.2

206

0.1

997.9

1 BEST

241

4910

1936

152

178

24.5

354

0.1

598.9

8241

6835

1777

178

239

115

532

0.1

947.2

8251

1374

795

60.1

131

13.1

205

0.1

997.5

1 BEST

251

4976

2078

154

184

27.8

366

0.1

499.0

2251

6854

1817

162

245

102

509

0.1

961.5

3255

1315

670

45.2

114

10.4

169

0.1

997.3

2 BEST

255

2999

2126

201

260

111

573

0.1

865.7

9255

52599

5531

519

606

559

1680

0.1

510.9

1255

6938

1996

174

267

117

558

0.1

859.6

6256

7153

325

14.1

77.8

4.3

496.2

0.1

998.9

9 BEST

256

1153

325

14.1

77.8

4.3

496.2

0.1

998.9

9257

1407

866

54.1

125

12.5

192

0.1

798.4

1 BEST

257

61242

2642

218

327

197

742

0.1

645.8

509

1429

913

68.6

146

14.1

229

0.1

997.8

6 BEST

509

61132

2409

215

324

171

710

0.1

849.5

511

1397

845

59.1

135

12.6

207

0.1

998.7

3 BEST

511

22001

4257

434

482

275

1190

0.1

848.3

1511

53241

6896

358

450

667

1480

0.0

83

9.8

6511

61131

2406

211

308

163

681

0.1

750.9

3512

7183

389

19.4

89.2

5.0

2114

0.1

999.2

BEST

512

1183

389

19.4

89.2

5.0

2114

0.1

999.2

513

1498

1060

71.4

152

15.5

239

0.1

798.4

9 BEST

513

61738

3698

348

480

325

1150

0.1

831.6

81021

1536

1140

89.4

180

18.5

288

0.1

997.7

7 BEST

1021

61693

3602

376

510

308

1190

0.1

932.5

21023

1467

993

73.4

156

15.8

245

0.1

997.4

9 BEST

1023

22919

6210

715

733

582

2030

0.1

812.0

81023

53803

8090

788

894

779

2460

0.1

614.4

11023

61713

3644

368

514

326

1210

0.1

930.5

61024

7216

460

24.6

102

5.9

6133

0.1

999.3

3 BEST

1024

1216

460

24.6

102

5.9

6133

0.1

999.3

31025

1614

1306

92.8

181

21.3

295

0.1

797.0

9 BEST

1025

62108

4486

454

607

432

1490

0.1

818.9

2039

1746

1587

134

250

28.5

412

0.2

197.6

BEST

2039

61939

4125

445

587

393

1430

0.1

920.0

42047

1550

1170

88

180

17.5

285

0.1

998.1

6 BEST

2047

23729

7935

917

905

764

2590

0.1

89.8

82047

54192

8920

453

542

831

1830

0.0

81

13.5

12047

62031

4322

454

597

406

1460

0.1

919.2

12048

7252

535

31

117

6.9

3155

0.1

999.4

3 BEST

2048

1252

535

31

117

6.9

3155

0.1

999.4

32049

1735

1563

119

230

27.6

377

0.1

996.3

4 BEST

2049

62949

6274

640

793

639

2070

0.1

79.2

84096

7289

616

36.5

131

8.1

1176

0.2

99.5

BEST

4096

1289

616

36.5

131

8.1

1176

0.2

99.5


Division, Department: Division of Electronics Systems, Department of Electrical Engineering, 581 83 Linköping

Date: 2014-08-25

Language: English

Report category: Degree project (examensarbete)

ISBN: -

ISRN: LiTH-ISY-EX--14/4792--SE

Title of series, numbering: Linköping Studies in Science and Technology, Thesis No. 4792

ISSN: -

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-110176

Title: Low Power Design Using RNS

Author: Viktor Classon

Abstract:

Power dissipation has become one of the major limiting factors in the design of digital ASICs. Low power dissipation will increase the mobility of the ASIC by reducing the system cost, size and weight. DSP blocks are a major source of power dissipation in modern ASICs. The residue number system (RNS) has long been proposed as an alternative to the regular two's complement number system (TCS) in DSP applications to reduce power dissipation. The basic concept of RNS is to first encode the input data into several smaller, independent residues. The computational operations are then performed in parallel, and the results are eventually decoded back to the original number system. Due to the inherent parallelism of residue arithmetic, a hardware implementation results in multiple smaller design units. Therefore, an RNS design allows the use of low-leakage cells and results in lower switching activity.

The residue number system has been analyzed by first investigating different implementations of RNS adders and multipliers (the basic arithmetic functions in a DSP system) and then deriving an optimal combination of these. The optimal combinations have been used to implement an FIR filter in RNS, which has been compared with a TCS FIR filter.

By providing different input data and coefficients to both the RNS and the TCS FIR filter, their respective performance in terms of area, power and operating frequency has been evaluated. The result is promising for uniformly distributed random input data, with approximately 15 % reduction of average power with RNS compared to TCS. For a realistic DSP application with normally distributed input data, the power reduction is negligible for practical purposes.

Keywords: residue number system, RNS, low power, ASIC
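The encode, compute-in-parallel, decode flow summarized in the abstract can be illustrated with a minimal Python sketch. The moduli set {255, 256, 257}, the function names, and the coefficient and sample values are illustrative assumptions, not the moduli, converters or filter used in the thesis. Reverse conversion here uses the Chinese remainder theorem, which is one common decoding method.

```python
from math import prod

# Illustrative moduli set {2^n - 1, 2^n, 2^n + 1} with n = 8. The three moduli
# are pairwise coprime; this particular set is an assumption for the example.
MODULI = (255, 256, 257)
M = prod(MODULI)  # dynamic range: integers 0 .. M-1 have a unique representation


def encode(x: int) -> tuple:
    """Forward conversion: split x into independent residues x mod m_i."""
    return tuple(x % m for m in MODULI)


def rns_add(a: tuple, b: tuple) -> tuple:
    """Channel-wise modular addition; no carry propagates between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))


def rns_mul(a: tuple, b: tuple) -> tuple:
    """Channel-wise modular multiplication."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))


def decode(r: tuple) -> int:
    """Reverse conversion with the Chinese remainder theorem."""
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the inverse of Mi mod m
    return x % M


def fir_output(coeffs, samples) -> int:
    """One FIR output y = sum(c_k * x_k), with all arithmetic done in RNS."""
    acc = encode(0)
    for c, x in zip(coeffs, samples):
        acc = rns_add(acc, rns_mul(encode(c), encode(x)))
    return decode(acc)


# Hypothetical coefficients and input samples, chosen only for illustration.
print(fir_output([3, 7, 11, 5], [120, 45, 300, 999]))  # prints 8970
```

Each channel works on operands of at most 9 bits, so no carry chain spans the full word length; this is the parallelism referred to in the abstract. Negative values, which a real FIR filter needs, would have to be mapped into the upper half of [0, M); this sketch handles non-negative values only.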