

Proceedings of the 12th International Workshop on Rapid System Prototyping (RSP 2001), IEEE Computer Society, Monterey, CA, USA, 25-27 June 2001

EUDOXUS: A WWW-based Generator of Reusable Arithmetic Cores

D. Bakalis¹,², K. D. Adaos¹,², D. Lymperopoulos¹, G. Ph. Alexiou¹,² and D. Nikolos¹,²

¹ Computer Engineering and Informatics Dept., University of Patras, 26 500, Patras, Greece
² Computer Technology Institute, 3, Kolokotroni Str., 26 261 Patras, Greece

e-mails: [email protected], {adaos, lumperop}@ceid.upatras.gr, {alexiou, nikolosd}@cti.gr

Abstract

In this paper we present a www-based tool for the generation of arithmetic soft cores for a wide variety of functions, operand sizes and architectures. The tool produces structural and synthesizable VHDL and/or Verilog descriptions and covers several arithmetic operations such as addition, subtraction, multiplication, division, squaring, square rooting and shifting. Therefore, designs requiring arithmetic cores, as for example those in digital signal processing and multimedia applications, can be completed faster and with less effort.

1. Introduction

Recent advances in silicon technology allow the building of complex chips consisting of millions of transistors. These advances promise new levels of system integration on a single chip but also present significant challenges to the chip designer. System-on-a-Chip (SoC) design confronts engineers with the large-scale task of integrating and providing a rich set of features, often with high performance, while meeting time-to-market goals on progressively shorter product life cycles. Design automation has increased productivity and reduced cycle time, while a reuse strategy has become a means of amortizing the development effort and investment across multiple projects [1, 2].

To cope with these challenges, many design teams are turning to a block-based approach that emphasizes design reuse. The use of pre-verified and pre-tested cores leads to reduced time-to-market through faster design cycles and higher productivity of the design teams. Moreover, it minimizes risk and enables cost-effective solutions. Reusable cores can be classified as hard cores and soft cores. Hard cores are firm design blocks already targeted to a specific implementation technology. These cores are fully designed, mapped and routed, but their reusability is restricted since they cannot be used in designs mapped to a different implementation technology. Soft cores, which are provided as synthesizable HDL descriptions, give the designer greater freedom since they can be mapped to any ASIC or FPGA implementation technology. However, in order to be effective, this block-based design methodology requires access to an extensive library of reusable cores or, equivalently, to a versatile core generator.

As the amount of information required to complete a design has increased, the Internet has emerged as the preferred tool for the delivery of information. Network-based applications and design tools are becoming significantly more popular than their stand-alone desktop counterparts [3, 4]. This technology has several advantages over conventional workstation-hosted design techniques. First of all, design tools are now developed for a single platform, resulting in reduced cost. Furthermore, upgrades and bug fixes are immediately available to users, since they no longer have to wait for the next release. Installation costs are also minimized. Moreover, better access control can be achieved: access can be restricted to authorized users, and unauthorized copies cannot be made since the tools reside on the network server. Finally, performance-intensive design tools can now run on powerful network servers, yet they can be accessed from lower-performance, low-cost desktop computers [5]. Several attempts at producing network-based design tools and applications have been reported in recent years [5-9].

Arithmetic modules are common to almost every system design. Algorithms for arithmetic operations, coded in a technology-independent HDL and synthesizable with a variety of commercial products, provide the designer with efficient and versatile building blocks for developing complex System-on-a-Chip solutions. Most available synthesis tools offer some kind of block generator for arithmetic modules. A common limitation is the lack of a variety of architectures for all possible implementation technologies.

Several arithmetic module generators have been presented in recent years. In [9] an on-line arithmetic design tool for FPGAs is presented. The tool was developed in the Java programming language and generates standard VHDL code. However, it is restricted to on-line arithmetic, that is, all operations are performed digit serially, most significant bit first. In [10] a module generator capable of generating complex arithmetic structures is presented. Rather than using distinct adder or multiplier modules, this generator combines the supported arithmetic expressions into a single module. The disadvantage of this approach is that it prevents the designer from examining different alternatives to meet his design goals. The authors of [11] present a macro-cell generator for floating-point division and square root operations. A module generator for logic-emulation applications, which is able to generate macro cells of arbitrarily complex functions described in HDL, is proposed in [12]. This module generator accepts a Verilog description and produces partitioned Configurable Logic Block (CLB) netlists for decomposing the design into multiple FPGA chips. In [13], a generator for multiplier and squarer structures is proposed. The generator utilizes Wallace tree summation and produces synthesizable VHDL code. However, it is a stand-alone tool limited to only these two types of arithmetic structures. Another stand-alone generator, which implements multiplexers and multiplexer-based structures, is presented in [14].

In this work we present a network-based generator of cores performing arithmetic operations. Interaction with the user can be made through a www-based interface. Since it is a www-based tool no installation or upgrades are required by the designer. The generator produces structural and synthesizable VHDL and/or Verilog descriptions of an arithmetic module given the type of the requested operation and the size of the operands. Since a completely structural description is produced, reusability is assured. The generator covers several arithmetic operations, such as addition, subtraction, multiplication, division, square-rooting, squaring and shifting. For most operations it is capable of producing more than one core using different algorithms for their implementation thus allowing the designer to select the one that best suits his needs in terms of area, delay, power and testability. Therefore, the generator can both alleviate the design effort and help towards the reduction of time to market.

The remainder of the paper is organized as follows: In Section 2 we present the generator, that is, its characteristics, its www-based interface and the cores that it supports. Experimental results obtained using the generator are presented in Section 3.

2. The www-based generator

In this section we present the requirements and the specifications of the core generator, its www-based interface and the cores and sizes that are currently supported.

2.1. Requirements and specifications

There are several requirements that a generator tool for arithmetic cores has to meet in order to be generic and easy to use:

(a) Technology Independence. The tool should not be restricted to specific gates or technology libraries. Rather, it must provide the core description at a more abstract level (e.g. in an HDL).

(b) Easy Integration. The designer must be able to integrate, without difficulty, the modules generated by the tool with the rest of his design. Furthermore, the modules must be implemented in a modular way in order to enable customization.

(c) Many Alternatives. The tool must be able to produce cores for at least the four basic operations (addition, subtraction, multiplication and division). For each type of operation it has to give the designer the opportunity to select, between several different architectures performing a specific operation, the one that best suits his needs. Moreover all these alternatives should be available independently of the targeted technology.

(d) No restrictions on the operand size. The arithmetic core generator must impose as few limitations as possible on the operand size of the cores that it can produce.

(e) Applicability. The generator must be able to produce, at least, the most frequently used arithmetic modules.

(f) Easy-to-Use. The tool has to provide the designer with a simple and easy-to-use interface.

(g) Expandability. It has to be implemented with an open interface that ensures that the addition of new capabilities or architectures for arithmetic operations is straightforward.

With all these requirements in mind, we have made the following decisions regarding the implementation of the arithmetic core generator. In order to ensure technology independence and reusability, the generator produces its descriptions in structural Verilog and/or VHDL. The need to support both languages is driven by the fact that neither of them has become dominant in the design world. Note that this enables the designer to easily integrate a module with the rest of a design and map it to a specific technology library using any commercially available logic synthesis tool. Furthermore, customization is possible. The cores that the tool generates are described as a hierarchical netlist with relatively small behavioral subfunctions in the leaf cells (e.g. full adders, half adders, controlled adder/subtracters). For this reason the designer is able to define his own gate-level implementations of the leaf cells even if he has no knowledge of the algorithm implementing the arithmetic operation.


Moreover, the tool supports a comprehensive set of different architectures for each of the four basic arithmetic operations (addition, subtraction, multiplication and division). For example the tool can implement both ripple-carry and carry look-ahead adders, both Baugh-Wooley and Booth-recoded multipliers etc. This feature provides the designer with a volume of design alternatives for a specific type of arithmetic operation and the ability to choose the one that best meets his needs regarding area, delay, power and testability. Moreover, the tool is able to produce some other arithmetic modules that are frequently used in system designs such as fractional squarers and square rooters and multiplexer-based shifters.

Finally, the generator is implemented as a software tool with a www-based user interface, so that the designer can interact with it without having to install it on his workstation.

[Diagram: CLIENT and SERVER blocks; the generator resides on the server]

Figure 1. The structure of the generator

2.2. Structure of the generator

The structure of the www-based generator is described in Figure 1. The arrows define the flow of information between the client, where the designer works, and the server, where the generator is installed. First the designer, using any web-browsing tool, initiates a request to use the generator. If he is authorized to use the tool, a parameter form is passed to him. Information about the supported cores and sizes is available to help him make his decisions. Then the designer defines the type of the arithmetic operation, the size of the operands and the HDL of the requested core. The parameters are passed back to the www server, which initiates the core generator as a system process. The generator produces the requested core and stores it on the www server. Then a hyperlink to the produced core is passed to the client, who is able to browse or download the core. At the client, the core can either be combined with the rest of a design and mapped to a specific technology with a logic synthesis tool, or be simulated for functional verification using a test-bench module that is produced by the generator together with the core. Some snapshots of the tool are given in Figure 2.

2.3. Arithmetic cores and supported sizes

We now present the types of cores that the generator can produce. The cores are described as hierarchical netlists with small behavioral or structural subfunctions in the leaf cells (such as full adders), which allow easy customization. For example, the designer can define his own gate-level implementations of the leaf cells even if he has no knowledge of the algorithm implementing the arithmetic operation, insert pipeline registers between the stages of a core, or combine more than one core to implement specialized operations.

Subtraction is usually combined with addition and therefore we consider these two types of arithmetic cores as one. In the case of adder/subtracter cores the generator can produce: (a) adder/subtracters based on Ripple-Carry propagation composed of full adders, (b) group Carry Look-Ahead adder/subtracters composed of 4-bit Carry Look-Ahead adder units and ripple carry propagation between groups, (c) Carry Look-Ahead adders with multiple levels of carry generation logic [15], (d) Sklansky parallel-prefix adders [16], and (e) Kogge-Stone parallel-prefix adders [17].
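To make the contrast between the first and the last architecture in this list concrete, the following Python sketch models a ripple-carry adder/subtracter built from full adders and a Kogge-Stone parallel-prefix adder at the bit level. It is only a software model written for illustration, not code produced by the generator; all function names are hypothetical, and folding the carry-in into the bit-0 generate signal is a modelling choice of this sketch.

```python
def to_bits(x, n):
    """Little-endian list of the n low-order bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def full_adder(a, b, cin):
    """One-bit full adder leaf cell: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def ripple_carry_add(x, y, n, cin=0):
    """Chain of n full adders; the carry ripples from bit 0 to bit n-1."""
    a, b = to_bits(x, n), to_bits(y, n)
    out, c = [], cin
    for i in range(n):
        s, c = full_adder(a[i], b[i], c)
        out.append(s)
    return from_bits(out), c


def add_sub(x, y, n, sub):
    """Adder/subtracter: for sub=1, y is inverted and the carry-in is 1."""
    mask = (1 << n) - 1
    return ripple_carry_add(x, (y ^ mask) if sub else y, n, cin=sub)


def kogge_stone_add(x, y, n, cin=0):
    """Parallel-prefix adder: carries are computed in log2(n) prefix levels."""
    a, b = to_bits(x, n), to_bits(y, n)
    p = [ai ^ bi for ai, bi in zip(a, b)]   # bit propagate
    g = [ai & bi for ai, bi in zip(a, b)]   # bit generate
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)              # fold the carry-in into bit 0
    d = 1
    while d < n:
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            # Combine the group ending at i with the group ending at i - d.
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2
    carries = [cin] + G[:-1]                # carry into each bit position
    return from_bits([p[i] ^ carries[i] for i in range(n)]), G[-1]
```

Both models return the n-bit sum together with the carry-out, so they can be compared exhaustively for small n in the same spirit as the test-benches described later in the paper.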

In the case of parallel multiplier cores we distinguish the case of unsigned operands and the case of operands in 2's complement arithmetic. In the former case, the generator can produce: (a) Carry-Propagate Array Multipliers composed of a series of ripple-carry adders for adding the partial products and (b) Carry Save Array Multipliers [15] that use carry-save adders to sum the partial product bits and a ripple-carry adder to produce the result.
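The carry-save idea behind architecture (b) can be sketched in a few lines of Python. The model below is an illustration under simplifying assumptions, not the generator's netlist: each row of carry-save adders is modelled as a word-level 3:2 compression, and the final ripple-carry adder is replaced by Python's integer addition. The function name is hypothetical.

```python
def carry_save_multiply(x, y, n):
    """Unsigned n x n multiplication with carry-save accumulation.

    The running result is kept as two words (sum, carry). Each partial
    product is absorbed by a 3:2 compression, so no carry propagates
    along a row; only the final addition propagates carries.
    """
    mask = (1 << (2 * n)) - 1
    s, c = 0, 0
    for i in range(n):
        pp = (x << i) if (y >> i) & 1 else 0       # i-th partial product
        # Bitwise full adders: sum bits stay in place, carries move left.
        s, c = s ^ c ^ pp, (((s & c) | (s & pp) | (c & pp)) << 1) & mask
    return (s + c) & mask                          # final carry-propagate add
```

The invariant is that `s + c` always equals the sum of the partial products absorbed so far, which is why a single carry-propagating addition at the end suffices.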

In the case of 2's complement multipliers, the generator can produce cores implementing: (a) the Trisection Pezaris algorithm [18] consisting of 3 different "adder" basic cells, (b) the Baugh-Wooley algorithm [19], and (c) multiplier cores based on 2-bit Booth recoding [20]. In this case, carry-save adders are used to sum the partial products and an adder is used to produce the final product.
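As an illustration of 2-bit Booth recoding, the following Python sketch recodes a 2's complement multiplier into signed digits from the set {-2, -1, 0, 1, 2} and uses them to form the product. It assumes the common radix-4 formulation of the recoding and sums the partial products with ordinary integer addition rather than carry-save adders; the function names are hypothetical and the code is not taken from the generator.

```python
def booth_radix4_digits(y, n):
    """Recode the n-bit 2's complement pattern y (n even) into n/2 digits.

    Digit k is y[2k-1] + y[2k] - 2*y[2k+1], with y[-1] = 0, so each digit
    lies in {-2, -1, 0, 1, 2}. Digits are returned least significant first.
    """
    if n % 2:
        raise ValueError("operand size must be a multiple of 2")
    bits = [(y >> i) & 1 for i in range(n)]
    digits, prev = [], 0
    for i in range(0, n, 2):
        digits.append(bits[i] + prev - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def booth_multiply(x, y, n):
    """Signed product x * y; y must fit in n-bit 2's complement."""
    digits = booth_radix4_digits(y & ((1 << n) - 1), n)
    # One partial product per digit, each weighted by 4**k.
    return sum(d * x * 4 ** k for k, d in enumerate(digits))
```

For example, `booth_radix4_digits(7, 4)` yields `[-1, 2]`, i.e. 7 = -1 + 2·4. Since one digit covers two multiplier bits, the number of partial products is halved, which is also why the operand size has to be even.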

Furthermore, the generator can produce serial-parallel multiplier cores [21] for 2's complement numbers.

In the case of dividers, the generator can produce: (a) Non-Restoring Cellular Array Dividers and (b) Restoring Cellular Array Dividers composed of 1-bit controlled adder/subtracters and 1-bit controlled subtracters respectively [15].
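The two division schemes differ in how a negative trial remainder is handled, which the following Python sketch tries to make explicit. It models unsigned division of a 2n-bit dividend by an n-bit divisor and assumes that the upper half of the dividend is smaller than the divisor, so that the quotient fits in n bits. This is a word-level model with hypothetical function names, not the cellular array structure itself.

```python
def _check(dividend, divisor, n):
    # Assumption of this sketch: the quotient must fit in n bits.
    if divisor <= 0 or (dividend >> n) >= divisor:
        raise ValueError("quotient does not fit in n bits")


def restoring_divide(dividend, divisor, n):
    """Each row subtracts the divisor and keeps the result only if it is >= 0."""
    _check(dividend, divisor, n)
    r, q = dividend >> n, 0
    for i in range(n - 1, -1, -1):
        r = (r << 1) | ((dividend >> i) & 1)
        trial = r - divisor                  # controlled subtracter row
        if trial >= 0:
            r, q = trial, (q << 1) | 1
        else:
            q = q << 1                       # "restore": discard the trial
    return q, r


def non_restoring_divide(dividend, divisor, n):
    """Each row adds or subtracts depending on the sign of the remainder."""
    _check(dividend, divisor, n)
    r, q = dividend >> n, 0
    for i in range(n - 1, -1, -1):
        bit = (dividend >> i) & 1
        if r >= 0:
            r = 2 * r + bit - divisor        # controlled adder/subtracter: subtract
        else:
            r = 2 * r + bit + divisor        # controlled adder/subtracter: add
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += divisor                         # final remainder correction
    return q, r
```

Both functions return the pair (quotient, remainder). For n = 8 they correspond in size to the 16/8-bit dividers reported in the experimental results.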


Figure 2. The www-based core generator

Furthermore, the generator can produce fractional non-restoring squarers composed of controlled add/shift cells [15], fractional square rooters composed of controlled adder/subtracter cells [15] and multiplexer-based shifters capable of performing 4 different operations [22]. The shifters are composed of 4-to-1 and 2-to-1 multiplexers.

We have to note that all the above algorithms lead to highly regular structures, that is the produced cores are designed in the form of regularly repeated patterns of identical circuits. Thus, due to their geometrical regularity, they are suitable for efficient synthesis and implementation in FPGA or ASIC technologies.

The tool for the generation of arithmetic cores has been implemented in the ANSI C programming language, whereas the www interface is based on HTML and Perl/CGI scripts. The implementation of the above algorithms for arithmetic operations imposes some restrictions on the size of the operands that can be handled. For example, since we use 2-bit recoding for Booth multipliers, the operand size must be a multiple of 2. Furthermore, in the current version of the tool, we support only modules with equal-sized operands (in the case of dividers the size of one operand is twice the size of the other).

The supported cores and operand sizes are shown in Table 1. The third column indicates the smallest operand size supported while the fifth column indicates the largest operand size supported. The fourth column indicates the step between two supported operand sizes. For example, in the case of multi-level carry look-ahead adders, the tool can produce all cores with operand size a multiple of 4, e.g. 4, 8, 12, 16, ... , 256.
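The minimum/step/maximum rule can be expressed as a small check. The Python sketch below is a hypothetical illustration of how a request could be validated before generation; the dictionary keys and the function name are invented here, the numeric limits are transcribed from a few rows of Table 1, and the "power of 2" step of the dividers is handled as a special case.

```python
# (minimum size, step, maximum size) for a few cores of Table 1.
# The key names are hypothetical; "pow2" marks the dividers' power-of-2 step.
SUPPORTED_SIZES = {
    "ripple_carry_adder": (2, 1, 1024),
    "multi_level_cla_adder": (4, 4, 256),
    "modified_booth_multiplier": (2, 2, 256),
    "restoring_array_divider": (8, "pow2", 64),
}


def is_supported(core, size):
    """Return True if the operand size is one the table lists for this core."""
    lo, step, hi = SUPPORTED_SIZES[core]
    if not lo <= size <= hi:
        return False
    if step == "pow2":
        return size & (size - 1) == 0        # exactly one bit set
    return (size - lo) % step == 0
```

With these entries, `is_supported("multi_level_cla_adder", 12)` is true, while a 10-bit multi-level carry look-ahead adder or a 7-bit Booth multiplier would be rejected.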

The output of the generator is the structural VHDL and/or Verilog description of the requested arithmetic core. This description consists of 4 distinct parts: (a) a core information part which provides information about the type of the arithmetic module as well as its primary inputs and outputs, (b) the basic cell definition part which consists of the description of the basic cells that are required to create the requested core, e.g. half adders, full adders, 1-bit controlled adders/subtracters, etc, (c) the core description part which is the structural description of the interconnections of the basic cells to create the requested core, and (d) a test-bench for functional verification which can provide input vectors to the arithmetic core giving the designer the ability to check its correct functionality by a logic simulation tool.
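To give a feeling for such an output, the following Python sketch emits a structural Verilog description of an n-bit ripple-carry adder with the first three parts in the order described above (the test-bench part is omitted for brevity). It is a hypothetical miniature written for illustration: the module names, port names and comment layout are assumptions and do not reproduce the actual output format of the generator, which is implemented in ANSI C.

```python
def emit_ripple_carry_adder(n, name=None):
    """Return structural Verilog text for an n-bit ripple-carry adder."""
    if n < 2:
        raise ValueError("operand size must be at least 2")
    name = name or f"rca_{n}"
    lines = [
        # (a) core information part
        f"// Core: {n}-bit ripple-carry adder",
        f"// Inputs: a[{n - 1}:0], b[{n - 1}:0], cin",
        f"// Outputs: s[{n - 1}:0], cout",
        "",
        # (b) basic cell definition part: a small behavioral leaf cell
        "module full_adder(input a, input b, input cin, output s, output cout);",
        "  assign s = a ^ b ^ cin;",
        "  assign cout = (a & b) | (cin & (a ^ b));",
        "endmodule",
        "",
        # (c) core description part: interconnection of the basic cells
        f"module {name}(input [{n - 1}:0] a, input [{n - 1}:0] b, input cin,",
        f"              output [{n - 1}:0] s, output cout);",
        f"  wire [{n}:0] c;",
        "  assign c[0] = cin;",
    ]
    for i in range(n):
        lines.append(
            f"  full_adder fa{i}(.a(a[{i}]), .b(b[{i}]), .cin(c[{i}]), "
            f".s(s[{i}]), .cout(c[{i + 1}]));"
        )
    lines += [f"  assign cout = c[{n}];", "endmodule"]
    return "\n".join(lines) + "\n"
```

Because the leaf cell is a separate module, a designer could replace the body of `full_adder` with a gate-level implementation without touching the core description part, which is the customization path the paper describes.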

Table 1. Supported cores and operand sizes

Type               | Core                                    | Minimum size | Step       | Maximum size
Adders/Subtracters | Ripple Carry Adder/Subtracter           | 2            | 1          | 1024
                   | Group Carry Look-Ahead Adder/Subtracter | 4            | 4          | 1024
                   | Multi-Level Carry Look-Ahead Adder      | 4            | 4          | 256
                   | Sklansky Parallel-Prefix Adder          | 2            | 1          | 1024
                   | Kogge-Stone Parallel-Prefix Adder       | 2            | 1          | 1024
Multipliers        | Carry Save Array Multiplier             | 2            | 1          | 256
                   | Trisection Pezaris Array Multiplier     | 2            | 1          | 256
                   | Baugh-Wooley Array Multiplier           | 2            | 1          | 256
                   | Modified Booth Multiplier               | 2            | 2          | 256
                   | Serial-Parallel Multiplier              | 2            | 1          | 256
Dividers           | Restoring Cellular Array Dividers       | 8            | power of 2 | 64
                   | Non-Restoring Cellular Array Dividers   | 8            | power of 2 | 64
Square Rooters     | Non-Restoring Fractional Square Rooter  | 4            | 2          | 256
Squarers           | Non-Restoring Fractional Squarer        | 4            | 2          | 256
Shifters           | Multiplexer-based Shifter               | 2            | 1          | 256

3. Experimental Results

The generator presented in the previous section can be used in many applications. Since it provides a library of general-purpose arithmetic cores, it can be used in cases where rapid system prototyping is necessary for the successful completion of a project. The descriptions of the modules in structural HDL enable the designer to quickly and easily integrate these modules with his design and proceed to logic synthesis. He can also choose, among several alternatives, the one that suits the specific design he is interested in with regard to area, delay, power and testability. For example, he could use one architecture in the prototyping phase and a different one in the final implementation phase, basing his selection on the restrictions posed by the platform used in each phase (e.g. FPGA vs ASIC).

We have used the generator to create and functionally verify several arithmetic cores. Exhaustive simulations have been carried out for small sizes by the test-benches that accompany each core. The regularity of the circuit descriptions guarantees correct functionality also for larger sizes.

Furthermore, because of the structural description, the cores that the generator produces can be synthesized with any commercial logic synthesis tool. We have performed logic synthesis on several Verilog descriptions produced by the generator. We have used two commercial logic synthesis tools: (a) Leonardo Spectrum by Mentor Graphics with its sample ASIC XCLOSU library and (b) Design Analyzer by Synopsys driven by the AMS CUB library. We have also mapped these cores to 2 FPGAs, specifically to the Altera EPF10K30ETC144-1 using the Altera MaxPlus II development system and to the Xilinx v100ecs144-6 using the Xilinx Foundation development system. Table 2 presents some results of the logic synthesis in terms of area and delay.

Table 2. Synthesized arithmetic cores

16/8-bit Restoring Array Divider     | 640 | 78.7 | 1461 | 42.6 | 212 | 103.5 | 210 | 74.3
16/8-bit Non-Restoring Array Divider | 556 | 50.3 | 1159 | 39.6 | 188 | 105.5 | 210 | 80.8


The time the generator needs to produce the requested cores is very short. The descriptions of the arithmetic cores presented in Table 2 were produced and delivered to the client in less than one second over a 10 Mbps LAN. The time needed to synthesize a core depends on the hardware platform and the specific logic synthesis tool that is used, as well as on the size of the core. The time needed to synthesize the cores of Table 2 using Leonardo Spectrum was less than 35 seconds on a system with an Intel Pentium-III 500 MHz processor, a 100 MHz system bus and 128 MB of main memory. Note that core download and synthesis need to be performed only once for a specific core and implementation technology, since the core or the synthesis results can be stored in a database on the user's workstation.

4. Conclusion

We have presented a www-based tool for generation of cores performing arithmetic operations. Given the type of the arithmetic operation, the size of the operands and the required architecture, the tool produces the requested module in structural VHDL and/or Verilog. The generator covers many arithmetic operations (addition, subtraction, multiplication, division, squaring, square rooting and shifting) and for most of them is capable of producing more than one different architecture. The use of the tool gives the ability to the designer to select, in the smallest possible time, between alternative designs the one that best meets his design goals. The use of the presented www-based generator minimizes the design effort and reduces time-to-market.

We are currently extending the generator to include cores for serial/serial or serial/parallel arithmetic operations and to provide Built-In Self-Test structures together with the produced cores.

References

[1] M. Keating and P. Bricaud, Reuse Methodology Manual for System-On-A-Chip Designs, Kluwer Academic Publishers, 1998.

[2] G. Dare, D. Linzmeier, B. Deitrich, and K. Whitelaw, "Circuit Generation for Creating Architecture-Based Virtual Components", Proc. of Design Automation and Test in Europe Conference (User Forum), pp. 79-83, 2000.

[3] D. Alles and G. Vergottini, "Taking a Look at Internet-based Design in the Year 2001", Electronic Design, pp. 42-50, Jan 6, 1997.

[4] M. Spiller and A. Newton, "EDA and the Network", Proc. of International Conference on Computer-Aided Design, pp. 470-476, 1997.

[5] L. Walczowski, D. Nalbantis, W. Waller and K. Shi, "Analogue Layout Generation by World Wide Web Server-based Agents", Proc. of European Design and Test Conference, pp. 384-388, 1997.

[6] T. Vassileva, V. Tchoumatchenko, I. Astinov and I. Furnadjiev, "Virtual VHDL Laboratory", Proc. of 5th International Conference on Electronics, Circuits and Systems, pp. 325-328, 1998.

[7] D. Nalbantis, L. Walczowski and W. Waller, "Multiple Server WWW-based Synthesis of VLSI Circuits", Proc. of 5th International Conference on Electronics, Circuits and Systems, pp. 537-540, 1998.

[8] V. Tchoumatchenko, T. Vassileva, I. Vassilev and I. Furnadjiev, "WWW Based Distributed FPGA Design", Proc. of Design Automation and Test in Europe Conference (User Forum), pp. 97-101, 2000.

[9] A. Schneider, R. McIlhenny and M. Ercegovac, "BigSky - An On-line Arithmetic Design Tool for FPGAs", Proc. of IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 303-304, 2000.

[10] D. Kumar and B. Erickson, "ASOP: Arithmetic Sum-of-Products Generator", Proc. of International Conference on Computer Design, pp. 522-526, 1994.

[11] M. Aberbour, A. Houelle, H. Mahrez, N. Vaucher and A. Guyot, "On Portable Macrocell FPU Generators for Division and Square Root Operators Complying to the Full IEEE-754 Standard", IEEE Transactions on VLSI Systems, Vol. 6, No. 1, pp. 114-121, March 1998.

[12] W. Fang, A. Wu and D. Chen, "EmGen - A Module Generator for Logic Emulation Applications", IEEE Transactions on VLSI Systems, Vol. 7, No. 4, pp. 488-492, December 1999.

[13] J. Pihl and E. Aas, "A multiplier and squarer generator for high performance DSP applications", Proc. of 39th Midwest Symposium on Circuits and Systems, Vol. 1, pp. 109-112, 1996.

[14] J. Abke, E. Barke and J. Stohmann, "A Universal Module Generator for LUT-based FPGAs", Proc. of International Workshop on Rapid System Prototyping, pp. 230-235, 1999.

[15] K. Hwang, Computer Arithmetic: Principles, Architecture and Design, John Wiley & Sons, 1979.

[16] J. Sklansky, "Conditional Sum Addition Logic", IRE Transactions on Electronic Computers, EC-9 (6), pp. 226-231, June 1960.

[17] P. Kogge and H. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations", IEEE Transactions on Computers, Vol. 22 (8), pp. 783-791, August 1973.

[18] D. Pezaris, "A 40ns 17-bit by 17-bit Array Multiplier", IEEE Transactions on Computers, Vol. C-20, No. 4, pp. 442-447, April 1971.

[19] R. Baugh and A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm", IEEE Transactions on Computers, Vol. C-22, No. 12, pp. 1045-1047, December 1973.

[20] M. Annaratone, Digital CMOS Circuit Design, Kluwer Academic Publishers, 1986.

[21] G. Alexiou and N. Kanopoulos, "A New Serial/Parallel Two's Complement Multiplier for VLSI Digital Signal Processing", International Journal of Circuit Theory & Applications, Vol. 20, pp. 209-214, 1992.

[22] R. O. Duarte, M. Nicolaidis, H. Bederr and Y. Zorian, "Efficient Totally Self-checking Shifter Design", Journal of Electronic Testing: Theory and Applications, Vol. 12, pp. 29-39, 1998.