Analysis and Design of High Performance 128-bit Parallel Prefix End-Around-Carry Adder
A Thesis Presented
by
Ogun Turkyilmaz
to
The Department of Electrical and Computer Engineering
in partial fulfillment of the requirements
for the degree of
Master of Science
in
Electrical and Computer Engineering
Northeastern University
Boston, Massachusetts
August 2011
Abstract
Addition is a timing-critical operation in today's floating point units. To enable faster
processing, an end-around-carry (EAC) adder was proposed as part of a fused multiply-add unit,
which performs multiplication followed by addition [5]. The proposed EAC adder was also investigated
with other prefix adders in FPGA technology as a complete adder [6]. In this thesis, we
propose a 128-bit standalone adder with parallel prefix end-around-carry logic and conditional
sum blocks to improve the critical path delay and provide flexibility to design with different
adder architectures. In previous works, carry-lookahead (CLA) logic was used for the EAC logic.
Using a modified structure of a parallel prefix $2^n - 1$ adder provides flexibility to the design
and decreases the length of the carry path. After the architecture is tested and verified, the
critical path is analyzed using the FreePDK45nm library. Full custom design techniques are applied
carefully during critical path optimization. Critical path analysis provides a fast comparison of
the total delay among different architectures without designing the whole circuit, and a simpler
approach to sizing the transistors for the lowest possible delay. As a final step, the datapath is
designed as a recurring bitslice for fast layout entry. The results show that the proposed adder
achieves 142 ps delay, 2.42 mW average power dissipation, and 3,132 sq. micron area, assuming there
is not much routing area overhead in the estimated area.
Acknowledgements
I would like to express my foremost appreciation to my advisor, Prof. Yong-Bin Kim for
giving me the opportunity to conduct research at Northeastern University. I am grateful for
his technical guidance and constant support in my graduate career. Without his valuable
suggestions and assistance, this thesis would not have been accomplished.
I would like to thank the committee members, Prof. Fabrizio Lombardi and Prof. Gunar
Schirner, for reading my thesis and offering valuable suggestions and contributions. I am,
especially, grateful to Prof. Schirner for the long discussions about graduate study. He has
been a mentor and a teacher to me, who generously shares his knowledge and experiences with
tremendous enthusiasm and never ending encouragement.
It has been an honor and a great pleasure to study as a Fulbright Scholar. I would like
to express my appreciation to the Fulbright Commission for giving me the chance to pursue
further academic study and connect with many accomplished scholars.
I would also like to thank Faith Crisley, Graduate Coordinator at the ECE Department, for
her support and valuable suggestions. She has always been helpful with her comforting manner
even in the most stressful moments.
Last but not least, I would like to express my sincere appreciation to my beloved parents,
Nuket and Nevzat Turkyilmaz, and my sister, Pinar Turkyilmaz, who encouraged me continu-
ously in every step I take, supported me constantly through every hardship I faced and loved
me without boundaries. Without them, I could not even imagine being where I am today.
Ogun Turkyilmaz
August 2011
Contents
1 Introduction
  1.1 Fused Multiply-Add Operation
  1.2 Adders
    1.2.1 Ripple Carry Adders
    1.2.2 Carry lookahead adders
    1.2.3 Parallel Prefix Adders
  1.3 Tree Adders
  1.4 Recurrence Algorithms
    1.4.1 Weinberger Recurrence
    1.4.2 Ling Recurrence
  1.5 Conclusion
2 Modulo Adders
  2.1 Introduction
  2.2 Addition in Modulo $2^n - 1$ Adder
  2.3 Analysis of Previous End-Around-Carry Adders
  2.4 Carry-lookahead EAC Logic Unit
  2.5 Conclusion
3 Modified Parallel Prefix EAC Adder
  3.1 Introduction
  3.2 Proposed Adder
    3.2.1 The 16bit blocks in EAC adder
    3.2.2 Parallel Prefix $2^n - 1$ EAC Block
  3.3 Implementation and Validation
  3.4 Conclusion
4 Critical Path Analysis
  4.1 Path Identification
  4.2 Path Design
  4.3 Transistor Level Design and Sizing
    4.3.1 Logic Level Minimization
    4.3.2 Late arriving signal exploitation
    4.3.3 Logical Effort
    4.3.4 Design with Helpers
  4.4 Transistor Sizing
  4.5 Simulation Results
  4.6 Conclusion
5 Datapath Library
  5.1 Introduction
  5.2 Concepts in Full Custom Design
  5.3 Datapath Design
  5.4 Layout Design
  5.5 Results
  5.6 Conclusion
6 Conclusion and Future Works
A Verilog Code of the Proposed Adder
B HSPICE Simulation Files
  B.1 Cells
  B.2 Simulation Code
  B.3 Condition of Transistors
List of Figures
1.1 Ripple Carry adder [1].
1.2 Carry lookahead adder [2].
1.3 Group PG cells [3].
1.4 Taxonomy of prefix networks [3].
1.5 Kogge-Stone adder [3].
1.6 Sklansky adder [3].
1.7 Brent-Kung adder [3].
1.8 Han-Carlson adder [3].
1.9 Knowles [2,1,1,1] adder [3].
1.10 Ladner Fischer adder [3].
2.1 Prefix graph with fast end-around carry [4].
2.2 General block diagram Modulo $2^n - 1$ Adder [4].
2.3 Block diagram of the 128-bit binary adder [5].
2.4 Architecture of the EAC adder [6].
2.5 Architecture of the EAC adder [6].
3.1 Architecture of the modified EAC Adder.
3.2 16bit conditional sum blocks.
3.3 cin merge with fast carry link [7].
3.4 cin merge with extra bit [7].
3.5 Modified 8bit Kogge-Stone EAC block.
4.1 Critical path of the modified EAC adder.
4.2 Gate level design of critical path.
4.3 Reduced first stage in Weinberger recursion adder [8].
4.4 AOI without late arriving exploitation.
4.5 AOI with late arriving exploitation.
4.6 EAC logic with helper.
4.7 Spreadsheet for Logical Effort Calculation.
4.8 Transistor level schematic.
4.9 Transistor level schematic.
4.10 Delay vs. Vdd at 25 °C.
4.11 Power vs. Vdd at 25 °C.
4.12 Delay vs. Vdd at 100 °C.
4.13 Power vs. Vdd at 100 °C.
5.1 Global floorplan of a datapath [9].
5.2 Regularity placement and routing datapath circuit [10].
5.3 Schematic layout of datapath and detailed view of bitcell [11].
5.4 Representation of a datapath cell [9].
5.5 Designed basic cells.
5.6 Bit slices of the blocks in the adder.
5.7 Bit slice of 16bit Kogge-Stone adder.
5.8 Layout of the blocks.
5.9 Wide layout.
5.10 Stacked compact layout.
List of Tables
1.1 Trade-off between different adder topologies.
4.1 Delay and Power dissipation values in correspondence to VDD and Temperature.
5.1 Results comparison of proposed adder with the previous work.
Chapter 1
Introduction
The fused multiply-add (FMA) unit plays an important role in modern microprocessors. It performs
a floating point multiplication followed by an addition of the product with a third floating point
operand. In 2007, a seven-cycle fused multiply-add pipeline unit was proposed as part of the floating
point unit in IBM's POWER6 microprocessor [5]. In this fused multiply-add data flow, the
product must be aligned before it is added to the addend. Because the magnitude of the
product is unknown in the early stages, prior to the combination with the addend, it is difficult to
determine a priori which operand is bigger. Even if it were determined early that the product is
bigger, there would be a problem in conditionally complementing two intermediate operands, the
carry and sum outputs of the counter tree. Thus, an adder needs to be designed to always output
a positive magnitude result, and preferably to complement only one operand. In [6], the
adder in POWER6 was taken as a reference for design space exploration in FPGA technology.
The authors designed a complete adder independent of the FMA block and showed that Kogge-Stone
does not provide the best performance in FPGA technology. Zhang et al. [12] recently proposed a
108-bit adder for an FMA unit. All these adders took the adder in POWER6 as a reference.
We believe that using a CLA block in this adder limits the possibility of fully exploiting the
benefits of parallel prefix adders. We designed an adder with a parallel prefix $2^n - 1$ block.
Although the carry increment topology is still employed, the number of carry merge terms is
decreased, as is the length of the end-around-carry path. Another area of improvement lies in the
design of the first-level addition blocks. Using a carry-select scheme provides the benefit of choosing the sum
at the end of computation. However, the carry path should not include the calculation of group
propagate and generate (PG) terms according to the input carry. In our conditional sum blocks, the
carry path only includes PG terms with cin = 0, and the sum is calculated in a non-critical path
for both conditions: cin = 0 and cin = 1. At the final stage, the real sum is selected according
to the output of the EAC block.
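The conditional-sum idea can be illustrated with a small behavioural model. The sketch below is not the thesis implementation (the Verilog code is in Appendix A); the function names `conditional_sum_block` and `select` and the use of integer arithmetic are assumptions made for illustration only. It shows that the group generate and propagate terms are computed with cin = 0 alone, while both candidate sums are prepared off the carry path and chosen once the real carry-in is known.

```python
def conditional_sum_block(a: int, b: int, width: int = 16):
    """Behavioural model of one block (hypothetical helper, not the thesis RTL).

    Returns (sum0, sum1, G, P): the sums for cin = 0 and cin = 1, and the
    group generate / propagate terms, which do not depend on cin.
    """
    mask = (1 << width) - 1
    total = a + b
    sum0 = total & mask            # sum assuming cin = 0
    sum1 = (total + 1) & mask      # sum assuming cin = 1
    G = total >> width             # the block generates a carry on its own
    P = int((a ^ b) == mask)       # every bit position propagates
    return sum0, sum1, G, P


def select(block, cin: int):
    """Pick the real sum and carry-out once the carry-in is known."""
    sum0, sum1, G, P = block
    cout = G | (P & cin)
    return (sum1 if cin else sum0), cout
```

In this model the late-arriving carry only drives the final selection and one AND-OR term, which is the property the paragraph above relies on.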
The thesis is organized as follows. In Chapter 1, general information about adders is
provided to show the importance of the design space. In Chapter 2, a detailed analysis of modulo
$2^n - 1$ adders and EAC adders is given. The modified adder is described in Chapter 3 and
compared with the previous architectures. The critical path analysis methodology is described in
detail in Chapter 4, and simulation results are provided. Finally, in Chapter 5, the datapath
library design methodology is described.
1.1 Fused Multiply-Add Operation
A fused multiply-add (FMA) unit performs the multiplication A × B followed immediately by
an addition of the product and a third operand C, so that the result T is calculated as in Eqn. 1.1
in a single indivisible step [2]. Such a unit is capable of performing a multiply only by setting
C = 0, and an add (or subtract) only by setting, for example, B = 1.
$$T = A \times B + C = M + C \tag{1.1}$$
An advantage of a fused multiply-add unit, compared to a separate multiplier and adder, arises
when executing floating-point operations, since rounding is performed only once for the result of
T = A × B + C rather than twice (for the multiply and then for the add) [13]. Since rounding may
introduce computation errors, reducing the number of rounding operations reduces the
overall error.
The operands are produced by the CSA (carry-save adder) multiplier tree, and their
magnitudes are not known prior to the addition, so it cannot be determined in advance which
operand has the greater value. Since floating point is a sign-magnitude representation, the result
of the adder should be in two's complement form [14]. Therefore, an adder is needed to produce two
separate results for the following cases, where an overbar denotes the bitwise (one's) complement:
Case 1: If M > C,
$$|M - C| = M - C = M + \overline{C} + 1$$
Case 2: If M < C,
$$|M - C| = C - M = -(M - C) = -(M + \overline{C} + 1) = -(M + \overline{C}) - 1 = \overline{(M + \overline{C})}$$
During the subtraction M − C, the final carry out Cout is 1 when M > C, and 0 when
C > M. Consequently, an end-around-carry adder produces two different results, and Cout
determines whether Case 1 or Case 2 applies [12].
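The two cases can be checked with a short behavioural sketch. The function name `eac_abs_diff` and the integer-based modelling are assumptions for illustration, not part of the referenced designs. The operand C is complemented once, the carry out of M plus the complement of C selects the case, and the result is either incremented (the end-around carry) or complemented.

```python
def eac_abs_diff(m: int, c: int, n: int) -> int:
    """Return |m - c| for n-bit unsigned operands using one complemented operand."""
    mask = (1 << n) - 1
    total = m + (~c & mask)        # M + complement(C)
    cout = total >> n              # 1 when M > C, 0 otherwise
    s = total & mask
    if cout:
        return (s + 1) & mask      # Case 1: end-around carry adds the missing 1
    return ~s & mask               # Case 2: complement the raw sum
```

For example, `eac_abs_diff(5, 3, 8)` and `eac_abs_diff(3, 5, 8)` both evaluate to 2, and equal operands fall into the second case and give 0.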
1.2 Adders
1.2.1 Ripple Carry Adders
The addition of two operands is the most frequent operation in almost any arithmetic unit.
A two-operand adder is used not only when performing additions and subtractions, but is also
often employed when executing more complex operations like multiplication and division.
Consequently, a fast two-operand adder is essential [3].
$$s_i = a_i \oplus b_i \oplus c_i \tag{1.2}$$
$$c_{i+1} = a_i \cdot b_i + (a_i + b_i) \cdot c_i \tag{1.3}$$
Figure 1.1: Ripple Carry adder [1].
The most straightforward implementation of a parallel adder for two n-bit operands x and y is
through the use of n basic units called full adders. A full adder (FA) is a logic circuit that
accepts two operand bits, say $x_i$ and $y_i$, and an incoming carry bit, denoted by $c_i$, and
produces the corresponding sum bit, denoted by $s_i$, and an outgoing carry bit, denoted by
$c_{i+1}$. As the notation suggests, the outgoing carry $c_{i+1}$ is also the incoming
carry for the subsequent FA, which has $x_{i+1}$ and $y_{i+1}$ as its input bits. The FA is a
combinational digital circuit implementing the binary addition of three bits through the Boolean
equations 1.2 and 1.3, in which the operand bits are written $a_i$ and $b_i$. The ripple carry
adder is shown in Figure 1.1.
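As a minimal sketch of Equations 1.2 and 1.3, the following bit-level model chains n full adders so that each carry ripples into the next stage. The function names `full_adder` and `ripple_carry_add` are chosen here for illustration.

```python
def full_adder(a: int, b: int, c: int):
    """One full adder: Eq. 1.2 for the sum bit, Eq. 1.3 for the carry out."""
    s = a ^ b ^ c
    c_out = (a & b) | ((a | b) & c)
    return s, c_out


def ripple_carry_add(x: int, y: int, n: int, c0: int = 0):
    """Add two n-bit operands bit by bit; returns (n-bit sum, carry out)."""
    carry = c0
    result = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

The loop makes the drawback visible: the carry into bit i is only available after all lower stages have been evaluated, so the delay grows linearly with n.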
1.2.2 Carry lookahead adders
The most commonly used scheme for accelerating carry propagation is the carry lookahead
scheme [2], shown in Figure 1.2. Eqn. 1.4 shows the realization of the carry-lookahead
generator. The main idea behind carry lookahead addition is to generate
all incoming carries in parallel (for all the n − 1 high-order FAs) and avoid the need to wait until
the correct carry propagates from the stage (FA) of the adder where it has been generated.
This can be accomplished in principle, since the carries generated and the way they propagate
depend only on the digits of the original numbers, $x_{n-1}, x_{n-2}, \ldots, x_0$ and
$y_{n-1}, y_{n-2}, \ldots, y_0$. These digits are available simultaneously to all stages of the
adder, and consequently each stage could determine the value of its incoming carry and
compute the sum bit accordingly. This, however, would require an inordinately large number
of inputs to each stage of the adder, rendering this approach impractical [1]. Lookahead is
therefore applied over groups of bits: in Eqn. 1.4, $G^*_k$ and $P^*_k$ denote the generate and
propagate signals of the k-th 4-bit group.
Figure 1.2: Carry lookahead adder [2].
$$
\begin{aligned}
c_4 &= G^*_0 + c_0 P^*_0, \\
c_8 &= G^*_1 + G^*_0 P^*_1 + c_0 P^*_0 P^*_1, \\
c_{12} &= G^*_2 + G^*_1 P^*_2 + G^*_0 P^*_1 P^*_2 + c_0 P^*_0 P^*_1 P^*_2
\end{aligned}
\tag{1.4}
$$
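Equation 1.4 can be exercised with a small model, assuming that $G^*_k$ and $P^*_k$ are the generate and propagate signals of the k-th 4-bit group and that the bit-level propagate is the OR of the operand bits. The helper names `group_pg` and `lookahead_carries` are hypothetical.

```python
def group_pg(x: int, y: int, k: int, size: int = 4):
    """Group generate/propagate of the k-th `size`-bit group (assumed 4-bit groups)."""
    G, P = 0, 1
    for i in range(k * size, (k + 1) * size):   # from the group's LSB to its MSB
        a, b = (x >> i) & 1, (y >> i) & 1
        g, p = a & b, a | b
        G = g | (p & G)     # carry generated at bit i, or passed up from below
        P = P & p           # the whole group propagates only if every bit does
    return G, P


def lookahead_carries(x: int, y: int, c0: int = 0):
    """Evaluate c4, c8 and c12 exactly as the sum-of-products terms of Eq. 1.4."""
    (G0, P0), (G1, P1), (G2, P2) = (group_pg(x, y, k) for k in range(3))
    c4 = G0 | (c0 & P0)
    c8 = G1 | (G0 & P1) | (c0 & P0 & P1)
    c12 = G2 | (G1 & P2) | (G0 & P1 & P2) | (c0 & P0 & P1 & P2)
    return c4, c8, c12
```

All three carries depend only on the group signals and $c_0$, so they can be evaluated in parallel rather than waiting for a ripple through the lower groups.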
1.2.3 Parallel Prefix Adders
A parallel prefix circuit is a combinational circuit with n inputs $x_1, x_2, \ldots, x_n$ producing
the outputs $x_1,\; x_2 \circ x_1,\; \ldots,\; x_n \circ x_{n-1} \circ \cdots \circ x_1$, where $\circ$ is an
associative binary operation. The first stage of the adder generates the individual P and G signals.
The remaining stages constitute the parallel prefix circuit, with the fundamental carry operation
serving as the associative binary operation $\circ$. This part of the adder can be designed in many
different ways.
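$$
\begin{aligned}
g_i &= a_i \cdot b_i \\
p_i &= a_i \oplus b_i
\end{aligned}
\tag{1.5}
$$
$$
\begin{aligned}
G_{i:k} &= G_{i:j} + P_{i:j} \cdot G_{j-1:k} \\
P_{i:k} &= P_{i:j} \cdot P_{j-1:k}
\end{aligned}
\tag{1.6}
$$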
Although carry-propagate addition can be computed using generate and propagate signals, its
implementation in VLSI can be quite inefficient due to the number of wires that have to be
connected together. Parallel-prefix adders solve this problem by making the wires shorter, with
simple gate structures that aid in passing groups of carries to the next weight [3] [15]. A
proof for the parallel prefix adder can be found in [16].
Parallel-prefix adders can be broken down into three stages:
• Pre-computation: single-bit carry generate/propagate signals are obtained with Equation 1.5, and
a temporary sum is generated. This stage can be simplified by applying the rules defined in Section
1.4.1.
• Parallel-prefix tree: the carry at each bit is computed from the group carry generate/propagate
signals, where Equation 1.6 is applied. It is possible to simplify the first stage of the tree using
the rules defined in Section 1.4.2.
• Post-computation: the sum and carry-out are derived with Equation 1.2.
The basic cell definitions of prefix adders are shown in Figure 1.3.
Figure 1.3: Group PG cells [3].
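The three stages can be sketched in a few lines. This is an illustrative model only: the names `pg_combine` and `prefix_add` are assumptions, the carry-in is taken as 0, and the prefixes are computed serially here, whereas the tree adders of the next section evaluate the same associative operator in parallel.

```python
def pg_combine(upper, lower):
    """Eq. 1.6: merge the (G, P) pair of an upper group with the adjacent lower group."""
    G_u, P_u = upper
    G_l, P_l = lower
    return G_u | (P_u & G_l), P_u & P_l


def prefix_add(x: int, y: int, n: int):
    """n-bit addition with c0 = 0; returns (sum, carry out)."""
    # Pre-computation (Eq. 1.5): bitwise generate and propagate
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]
    # Prefix computation: groups[i] holds (G, P) over bits i..0
    groups = [(g[0], p[0])]
    for i in range(1, n):
        groups.append(pg_combine((g[i], p[i]), groups[i - 1]))
    # Post-computation (Eq. 1.2): the carry into bit i is G over bits i-1..0
    carries = [0] + [G for G, _ in groups]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```

Because `pg_combine` is associative, the order in which adjacent groups are merged can be rearranged freely, which is what gives rise to the different prefix networks.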
1.3 Tree Adders
Tree structures have been used for graphically representing the various parallel prefix algorithms.
Many state-of-the-art adder circuits use parallel prefix schemes to achieve high performance
[17–19].
For wide adders, the delay of carry-lookahead (or carry-skip or carry-select) adders becomes
dominated by the delay of passing the carry through the lookahead stages. This delay can
be reduced by looking ahead across the lookahead blocks [20]. In general, it is possible to
construct a multilevel tree of lookahead structures to achieve a delay that grows with log N. Such
adders are variously referred to as tree adders, logarithmic adders, multilevel-lookahead adders,
parallel-prefix adders, or simply lookahead adders.
There are many ways to build the lookahead tree that offer trade-offs among the number of
stages of logic, the number of logic gates, the maximum fanout on each gate, and the amount
of wiring between stages. Figure 1.4 shows a three-dimensional taxonomy of prefix adders [21].
The adders at the extremes of this taxonomy are Kogge-Stone [22], Sklansky [23], and Brent-Kung [24],
which offer low logic level with high wiring track count; low logic level with high fanout; and
high logic level with low fanout, respectively, as shown in Table 1.1.

Table 1.1: Trade-off between different adder topologies.
Topology           Logic Level   Fanout   Wiring Track
Kogge-Stone [22]   Low           Low      High
Sklansky [23]      Low           High     Low
Brent-Kung [24]    High          Low      Low
Figure 1.4: Taxonomy of prefix networks [3].
The following parallel prefix adders can be found in the literature:
Kogge-Stone: The Kogge-Stone tree [22] (Figure 1.5) achieves both $\log_2 N$ stages and a fanout
of 2 at each stage. This comes at the cost of long wires that must be routed between
stages. The tree also contains more PG cells; while this may not impact the area if the
adder layout is on a regular grid, it will increase power consumption. Despite these costs,
the Kogge-Stone adder is generally used for wide adders because it shows the lowest delay
among the structures considered here.
Figure 1.5: Kogge-Stone adder [3].
Sklansky: The Sklansky or divide-and-conquer tree [23] in Figure 1.6 reduces the delay to
$\log_2 N$ stages by computing intermediate prefixes along with the large group prefixes.
This comes at the expense of fanouts that double at each level: the gates fan out to [8, 4,
2, 1] other columns. These high fanouts cause poor performance on wide adders unless
the gates are appropriately sized or the critical signals are buffered before being used
for the intermediate prefixes. Transistor sizing can cut into the regularity of the layout
because multiple sizes of each cell are required, although the larger gates can spread into
adjacent columns. With appropriate buffering, the fanouts can be reduced to [8, 1, 1, 1].
Figure 1.6: Sklansky adder [3].
Brent-Kung: The Brent-Kung tree [24], shown in Figure 1.7, computes prefixes for 2-bit
groups. These are used to find prefixes for 4-bit groups, which in turn are used to find
prefixes for 8-bit groups, and so forth. The prefixes then fan back down to compute the
carries-in to each bit. The tree requires $2\log_2 N - 1$ stages. The fanout is limited to 2
at each stage. The diagram shows buffers used to minimize the fanout and loading on the
gates, but in practice, the buffers are generally omitted.
Figure 1.7: Brent-Kung adder [3].
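The trade-offs among the three topologies can be made concrete by generating each network and measuring it. The sketch below is illustrative and rests on several assumptions: N is a power of two, each network is a list of levels of (column, source column) pairs, and fanout is counted as the number of other columns a cell drives (the convention behind the [8, 4, 2, 1] figures above, so a column's own next-level cell is not counted). All function names are hypothetical.

```python
from collections import Counter


def kogge_stone(n):
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels


def sklansky(n):
    levels, half = [], 1
    while half < n:
        block = 2 * half
        # upper half of each block merges with the top column of the lower half
        levels.append([(i, (i // block) * block + half - 1)
                       for i in range(n) if i & half])
        half = block
    return levels


def brent_kung(n):
    levels, d = [], 1
    while d < n:                                  # up-sweep: 2, 4, 8, ... bit groups
        levels.append([(i, i - d) for i in range(2 * d - 1, n, 2 * d)])
        d *= 2
    d //= 4
    while d >= 1:                                 # down-sweep: fill in the remaining prefixes
        levels.append([(i, i - d) for i in range(3 * d - 1, n, 2 * d)])
        d //= 2
    return levels


def check_and_measure(levels, n):
    """Verify that every column ends with the prefix over bits i..0 and
    return (number of levels, number of prefix cells, maximum fanout)."""
    span = [(i, i) for i in range(n)]             # column i initially covers bits i..i
    max_fanout = 0
    for level in levels:
        nxt = list(span)
        for i, j in level:
            (hi, lo), (hi2, lo2) = span[i], span[j]
            assert lo == hi2 + 1, "merged groups must be adjacent"
            nxt[i] = (hi, lo2)
        max_fanout = max(max_fanout, max(Counter(j for _, j in level).values()))
        span = nxt
    assert span == [(i, 0) for i in range(n)]
    return len(levels), sum(len(level) for level in levels), max_fanout
```

For N = 16 this model reports 4 levels and 49 cells for Kogge-Stone, 4 levels and 32 cells with a maximum fanout of 8 for Sklansky, and 7 levels and 26 cells for Brent-Kung, in line with the qualitative ranking of Table 1.1.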
Beyond these three major topologies, intermediate trade-offs can be achieved with the following adders.
The Han-Carlson trees [25] are a family of networks between Kogge-Stone and Brent-Kung.
Figure 1.8 shows such a tree, which performs Kogge-Stone on the odd-numbered bits and then
uses one more stage to ripple into the even positions.
The Knowles trees [26] are a family of networks between Kogge-Stone and Sklansky. All
of these trees have $\log_2 N$ stages, but differ in the fanout and number of wires. If we say that
16-bit Kogge-Stone and Sklansky adders drive fanouts of [1, 1, 1, 1] and [8, 4, 2, 1] other columns,
respectively, the Knowles networks lie between these extremes. For example, Figure 1.9 shows
a [2, 1, 1, 1] Knowles tree that halves the number of wires in the final track at the expense of
doubling the load on those wires.
The Ladner-Fischer trees [27] are a family of networks between Sklansky and Brent-Kung.
Figure 1.10 is similar to Sklansky, but computes prefixes for the odd-numbered bits and again
uses one more stage to ripple into the even positions. Cells at high-fanout nodes must still be
sized or merged appropriately to achieve good speed.
Figure 1.8: Han-Carlson adder [3].
Figure 1.9: Knowles [2,1,1,1] adder [3].
Figure 1.10: Ladner Fischer adder [3].
1.4 Recurrence Algorithms
Recurrence algorithms have been a research area for a long time [8] [28]. Weinberger presented
the most widely known carry recurrence for VLSI addition in 1958 [20]. Over the years, several
addition algorithms have been developed. These algorithms manipulate the carry and sum
equations in an attempt to improve the speed of addition. The equations for sum and carry are
given in Equations 1.2 and 1.3.
Ling modified the algorithm to reduce the complexity of the carry computation at the
cost of increased complexity in the sum computation. An analysis was later performed by
Doran [29] to determine the set of recurrences whose properties are similar
to those of Weinberger's and Ling's.
1.4.1 Weinberger Recurrence
Weinberger [20] demonstrated that addition speed could be improved by parallelizing the
computation of the carry. Although widely credited with only the carry look-ahead adder, Weinberger's
recurrence was not limited in group size or number of levels for carry computation [20]. The
fundamental advancement of his work was the introduction of generate and propagate, as shown
in Eqn. 1.7. Weinberger defined the terms bitwise generate (g), bitwise propagate (p), group
generate (G), and group propagate (P). These terms allow carry computation to be
performed in parallel, yielding a significant improvement in performance compared to ripple-carry
addition. For a group of 4 bits, the Weinberger recurrence has ten terms for the generation of
$G_{5:2}$ from the inputs and four terms for the generation of $P_{5:2}$. The maximum transistor
stack height is 5.
Weinberger demonstrated that G and P could be used to create blocks of arbitrary size and
parallelized to form multiple levels of recurrence [20]. Thus, the majority of parallel prefix adders
proposed for high-performance addition employ realizations of Weinberger's recurrence, e.g.,
Kogge-Stone [22], Brent-Kung [24], Han-Carlson [25], Ladner-Fischer [27], and those described
by Knowles [26].
$$
\begin{aligned}
g_i &= a_i \cdot b_i \\
p_i &= a_i + b_i
\end{aligned}
\tag{1.7}
$$
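Equation 1.7 defines the propagate with an OR, whereas Equation 1.5 uses an XOR. A short exhaustive check, given here as an illustrative sketch with hypothetical names, shows that both choices yield the same carries in the recurrence $c_{i+1} = g_i + p_i \cdot c_i$, because the two definitions differ only when $g_i = 1$, in which case a carry is generated anyway.

```python
def carries(x: int, y: int, n: int, propagate):
    """Carries c_0..c_n from c_{i+1} = g_i + p_i * c_i, with c_0 = 0."""
    c = [0]
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        g, p = a & b, propagate(a, b)
        c.append(g | (p & c[i]))
    return c


def or_propagate(a: int, b: int) -> int:    # Eq. 1.7
    return a | b


def xor_propagate(a: int, b: int) -> int:   # Eq. 1.5
    return a ^ b
```

The sum bit of Equation 1.2 still needs the XOR of the operand bits, so an adder that uses the OR form for its carry path computes the XOR separately for the sum.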
1.4.2 Ling Recurrence
IBM ECL technology limitations on fan-in (limited to 4) and wired-OR (limited to 8) motivated
Ling to develop a transformation that reduced the fan-in of Weinberger's recurrence [30] [8].
For clarity, a simple derivation of Ling's transformation is shown here. This derivation provides
the physical meaning of the signals used in Ling's transformation and identifies the favorable
characteristics of Ling's recurrence for implementation in modern CMOS technology. In the
derivation, the bitwise generate signal is defined as $g_i = a_i \cdot b_i$ and the bitwise
propagate signal is defined as $t_i = a_i + b_i$. Note that the propagate signal $t_i$ is the same
as Weinberger's $p_i$ (when implemented using an OR). To maintain consistency with Ling's original
paper, $t_i$ will be used for propagate.
Ling's transformation reduces the complexity of Weinberger's recurrence by factoring $t_i$ from
$c_{i+1}$ to create a pseudo-carry ($h_i$) on which the recurrence is performed. The transformation
is shown below on $c_1$ to form $h_0$. The carry-out signal, $c_1$, of the first bit position is
$$c_1 = g_0 + t_0 \cdot c_0 \tag{1.8}$$
Ling's transformation uses the property $t_i \cdot g_i = g_i$ to form
$$c_1 = t_0 \cdot g_0 + t_0 \cdot c_0 = t_0 \cdot (g_0 + c_0) \tag{1.9}$$
where $g_0 + c_0 = h_0$, which leads to
$$c_1 = t_0 \cdot (g_0 + c_0) = t_0 \cdot h_0 \tag{1.10}$$
The general transformation of $c_i$ is defined as
$$
c_i =
\begin{cases}
t_{i-1} h_{i-1} & \text{if } i > 0 \\
c_0 & \text{if } i = 0
\end{cases}
\tag{1.11}
$$
where the pseudo-carry, $h_i$, is defined as
$$h_i = g_i + c_i \tag{1.12}$$
The physical meaning of the pseudo-carry signal h can be described as follows. By factoring
$t_i$ out of the carry expression and propagating $h_i$ instead of $c_{i+1}$, all cases where a
carry is generated and/or propagated from the stage preceding stage i are included in $h_i$. This
includes the case where a carry-in to the ith stage can be assimilated (which should not result in a
carry-out). The assimilate condition is handled when forming $c_{i+1}$ by ANDing $h_i$ with $t_i$.
If the carry-assimilate (carry-kill) condition exists, then $t_i = 0$, which results in
$c_{i+1} = 0$. A recurrence for $h_i$ can be defined, as has been done previously for Weinberger's
$c_i$. The group pseudo-carry and transmit, which allow for parallel prefix computation, can be
defined over a group of bits (capital letters are used to refer to the group):
$$T_{i:j} = t_i \cdot t_{i-1} \cdots t_j \tag{1.13}$$
$$H_{i:j} = g_i + g_{i-1} + t_{i-1} \cdot g_{i-2} + t_{i-1} \cdot t_{i-2} \cdot g_{i-3} + \cdots + t_{i-1} \cdot t_{i-2} \cdots t_{j+1} \cdot g_j \tag{1.14}$$
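The recurrence can be expressed using the "•" operator as
$$
\begin{pmatrix} H_{i:j} \\ T_{i-1:j-1} \end{pmatrix}
\bullet
\begin{pmatrix} H_{j-1:k} \\ T_{j-2:k-1} \end{pmatrix}
=
\begin{pmatrix} H_{i:j} + T_{i-1:j-1} \cdot H_{j-1:k} \\ T_{i-1:j-1} \cdot T_{j-2:k-1} \end{pmatrix}
\tag{1.15}
$$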
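The group recurrence of Equation 1.15 can be checked numerically against the expanded forms of Equations 1.13 and 1.14. The sketch below is illustrative: the helper names are assumptions, and the group boundaries i = 5, j = 3, k = 1 used in the usage note are an arbitrary choice that keeps every index non-negative.

```python
def group_T(t, i, j):
    """Eq. 1.13: T_{i:j} = t_i * t_{i-1} * ... * t_j."""
    out = 1
    for l in range(j, i + 1):
        out &= t[l]
    return out


def group_H(g, t, i, j):
    """Eq. 1.14: H_{i:j} = g_i + g_{i-1} + t_{i-1} g_{i-2} + ... + t_{i-1}...t_{j+1} g_j."""
    out = g[i]
    for m in range(j, i):
        term = g[m]
        for l in range(m + 1, i):
            term &= t[l]
        out |= term
    return out


def ling_dot(upper, lower):
    """Eq. 1.15: combine the (H, T) pair of an upper group with that of the lower group."""
    H_u, T_u = upper
    H_l, T_l = lower
    return H_u | (T_u & H_l), T_u & T_l
```

With bit lists `g` and `t` of at least six entries, `ling_dot((group_H(g, t, 5, 3), group_T(t, 4, 2)), (group_H(g, t, 2, 1), group_T(t, 1, 0)))` should equal `(group_H(g, t, 5, 1), group_T(t, 4, 0))`. The identity relies on $t_i \cdot g_i = g_i$, so it holds when `g` and `t` are derived from actual operand bits.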
The transformation from Weinberger's recurrence to Ling's recurrence for a group of 4 bits
is shown in the example in Fig. 2. This figure should dispel any difficulties associated with
understanding Ling's original derivation.
The advantage of using pseudo-carry instead of carry is offset by the increased complexity
of sum computation, which requires the real carry to form individual sum signals. In CMOS
technology, the sum can be efficiently calculated conditionally, thus avoiding the AND operation
on the critical carry path:
$$
s_i =
\begin{cases}
a_i \oplus b_i & \text{if } h_{i-1} = 0 \\
a_i \oplus b_i \oplus t_{i-1} & \text{if } h_{i-1} = 1
\end{cases}
\tag{1.16}
$$
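Equations 1.11, 1.12, and 1.16 can be combined into a small bit-level Ling adder. This is a sketch under stated assumptions rather than a circuit description: the name `ling_add` is hypothetical, and the pseudo-carries are rippled serially here for brevity, whereas a real design would compute them with the prefix operator of Equation 1.15.

```python
def ling_add(x: int, y: int, n: int, c0: int = 0):
    """n-bit addition using Ling pseudo-carries; returns (sum, carry out)."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    g = [ai & bi for ai, bi in zip(a, b)]
    t = [ai | bi for ai, bi in zip(a, b)]
    # Pseudo-carries: h_i = g_i + c_i with c_i = t_{i-1} h_{i-1} (Eqs. 1.11 and 1.12)
    h = [g[0] | c0]
    for i in range(1, n):
        h.append(g[i] | (t[i - 1] & h[i - 1]))
    # Conditional sum (Eq. 1.16): h_{i-1} selects between two precomputable values
    s = a[0] ^ b[0] ^ c0
    for i in range(1, n):
        bit = (a[i] ^ b[i] ^ t[i - 1]) if h[i - 1] else (a[i] ^ b[i])
        s |= bit << i
    cout = t[n - 1] & h[n - 1]      # the real carry is recovered only where needed
    return s, cout
```

Both candidate values of each sum bit depend only on the operands, so the pseudo-carry acts purely as a select signal and the AND with $t_{i-1}$ stays off the critical carry path.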
1.5 Conclusion
This chapter presented general information about binary adder realization from a VLSI
perspective. Parallel prefix adders provide the fastest carry propagation on the critical path for
wide adders, and their regularity allows fast layout design. Their most important advantage
is a design space with many trade-offs in delay, power dissipation, and area. Recurrence
algorithms were also presented to show that it is possible to shorten the carry path with different
propagate and generate terms.
In the next chapter, a detailed analysis of modulo adders is given. End-around-carry
adders are analyzed in detail as a specialized realization of modulo adders. An analysis of previous
work is also included.
Chapter 2
Modulo Adders
2.1 Introduction
Modular arithmetic has been of interest to researchers in a wide range of areas, since its
operations are the basis for systems that use the Residue Number System (RNS) [31]. Modulo
addition/subtraction and multiplication can also be applied to digital filters [32], cryptography
[33], error detection and correction [34], as well as checksum computation in high-speed networks
[35]. More importantly, modulo $2^n - 1$ addition is a common operation that can be implemented
in hardware because of its circuit efficiency and simple implementation [36].
In end-around-carry adders, the carry-in depends on the carry-out [4], i.e., the carry-out cout is
fed through some logic back to the carry-in cin. In particular, this is used for modulo $2^n + 1$ [37]
and $2^n - 1$ [38] addition, which rely on decrementing and incrementing the addition result,
respectively, depending on cout. Since prefix algorithms actually rely on incrementer structures,
considering parallel-prefix schemes for this kind of adder is very promising.
In order to obtain fast end-around-carry adders, both conditions of fast carry-out generation
and fast carry-in processing have to be met. This implies that there should be no combinational
path between cin and cout. A fast end-around-carry adder can be built using the prefix
structure. Here, the last prefix stage is used as an incrementer which is controlled by the cout of
the previous prefix stages.
2.2 Addition in Modulo $2^n - 1$ Adder
Addition modulo $2^n - 1$, or ones' complement addition, can be formulated by the following
equation:
\[
(A + B) \bmod (2^n - 1) =
\begin{cases}
A + B - (2^n - 1) = A + B + 1 \pmod{2^n} & \text{if } A + B \ge 2^n - 1 \\
A + B & \text{otherwise}
\end{cases}
\tag{2.1}
\]
However, the condition $A + B \ge 2^n - 1$ is not trivial to compute. It can be rewritten as
$A + B \ge 2^n$ with a carry input of 1.
\[
(A + B) \bmod (2^n - 1) =
\begin{cases}
A + B - (2^n - 1) = A + B + 1 \pmod{2^n} & \text{if } A + B \ge 2^n \\
A + B & \text{otherwise}
\end{cases}
\tag{2.2}
\]
Now the carry-out $c_{out}$ from the addition $A + B$ can be used to determine whether an increment
has to be performed or, even simpler, $c_{out}$ can be added to the sum $A + B$. This formulation,
however, results in a double representation of zero (i.e., 0 = 00...0 = 11...1).
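The behaviour described by Eqn. 2.2 can be illustrated with a minimal Python sketch. The function name `add_mod_2n_minus_1` and the use of plain integers are assumptions made for illustration only; the sketch models the arithmetic, not the prefix hardware.

```python
def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """Add two n-bit operands modulo 2**n - 1 with an end-around carry.

    Hypothetical helper for illustration; operands must lie in [0, 2**n - 1].
    """
    mask = (1 << n) - 1
    total = a + b
    c_out = total >> n              # carry out of the n-bit addition (0 or 1)
    return (total + c_out) & mask   # feed the carry back in as an increment


if __name__ == "__main__":
    print(add_mod_2n_minus_1(9, 9, 4))   # 18 wraps around modulo 15
    print(add_mod_2n_minus_1(7, 8, 4))   # the all-ones pattern, the second zero
```

When `a + b` equals `2**n - 1` exactly, no carry is produced and the all-ones pattern is returned, which is the double representation of zero mentioned above.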
Figure 2.1: Prefix graph with fast end-around carry [4].
The standard approach for the implementation of a modulo $2^n - 1$ adder is to use a
conventional carry propagate adder (CPA) and have the carry-out fed back into the carry-in of
the adder. This creates the end-around carry needed for modulo $2^n - 1$ to operate
correctly, as shown in Figure 2.2.
Figure 2.2: General block diagram of the modulo $2^n - 1$ adder [4].
2.3 Analysis of Previous End-Around-Carry Adders
The previously defined end-around action can be obtained using different rules. Although the EAC
adder has been used [39] and implemented in several microprocessors, very few details exist
on its formulation and arithmetic algorithms in today's literature. Schwarz [40] provided
explanations about some aspects of the EAC adder's algorithm as a part of a fused-multiply-add
(FMA) unit. Shedletsky [41] defined the indeterminate behavior of EAC adders, and Liu et
al. [42] presented a formal analysis of EAC adders.
Figure 2.3: Block diagram of the 128-bit binary adder [5].
Liu et al. [6] [42] extended the algorithm to make the adder independent, without being
part of an FMA unit. The design mainly follows the algorithms of the EAC adder that is
implemented in the IBM POWER6 microprocessor [5]. The additional logic units of the proposed
adder ensure that the whole adder works independently. Another advantage is that
it is easier to implement and test the adder in FPGA technology, which enables design space
exploration. Figure 2.4 shows the architecture of the adder.
Figure 2.4: Architecture of the EAC adder [6].
EAC means that when subtracting two signed numbers that are in signed magnitude format,
the subtraction is implemented by adding the first operand to the Boolean complement
of the second operand. For this addition, instead of setting a carry into the least significant
digit, the carry out of the most significant digit is taken as the carry-in. This ensures that
the result of the addition is always a positive magnitude, and preferably only one operand
needs to be conditionally complemented. Thus, an EAC adder performs addition like
other regular adders and performs subtraction using the end-around carry to ensure the result is positive.
The adder shown in Figure 2.4 should satisfy the following conditions: 1) when x.s = y.s,
the adder should do addition, and the result has the sign s.s = x.s and the magnitude
s.s = x.x + y.y; 2) when x.s ≠ y.s, the adder should do subtraction. If x.x ≥ y.y, then the sign
is s.s = x.s and the magnitude is s.s = x.x − y.y; if x.x < y.y, then the sign is s.s = y.s and
the magnitude is s.s = y.y − x.x.
The subtraction operation can be described as follows:
1) Determining which operand is bigger. After a subtraction, if the result is positive,
operand x is bigger; otherwise y is bigger. Since $x.x - y.y = x.x + \overline{y.y} + 1 = x.x + 2^n - y.y$,
the carry out of $x.x + \overline{y.y} + 1$ will be 1 when x is bigger. Therefore, $c_{out}$ is 1 if x is
bigger and 0 if y is bigger. Hence the sum can be written as $x.x + \overline{y.y} + c_{out}$.
2) When y is bigger, $c_{out} = 0$ and the subtraction can be written as
$s.s = y.y - x.x = -(x.x - y.y) = -(x.x + \overline{y.y} + 1) = \overline{(x.x + \overline{y.y} + 0)} + 1 - 1 = \overline{(x.x + \overline{y.y})}$.
3) The $c_{out}$ is used to select the correct result:
\[
s.s =
\begin{cases}
x.x + \overline{y.y} + c_{out} & c_{out} = 1 \\
\overline{x.x + \overline{y.y} + c_{out}} & c_{out} = 0
\end{cases}
\]
In order to implement addition and subtraction in one adder, y should be complemented
conditionally. The effective operation can be defined as
\[
O_s = x.s \oplus y.s \tag{2.3}
\]
\[
y_t =
\begin{cases}
y.y & O_s = 0 \\
\overline{y.y} & O_s = 1
\end{cases}
\]
The sign of the result is determined by the sign logic:
\[
s.s =
\begin{cases}
x.s & c_{out} = 1 \\
y.s & c_{out} = 0
\end{cases}
\]
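The conditions and selection rules above can be collected into a small behavioural sketch in Python. The function `eac_add` and its argument names are hypothetical, and the sketch assumes that $c_{out}$ is the carry out of $x.x + y_t$ and that the sign selection by $c_{out}$ applies only to an effective subtraction; for equal magnitudes it therefore returns magnitude 0 with the sign of y.

```python
def eac_add(xs: int, xm: int, ys: int, ym: int, n: int):
    """Sign-magnitude add of (xs, xm) and (ys, ym); returns (sign, magnitude).

    Hypothetical reference model: magnitudes are n-bit unsigned integers and
    the overflow of a true addition is simply dropped.
    """
    mask = (1 << n) - 1
    o_s = xs ^ ys                        # effective operation, Eqn. 2.3
    y_t = (ym ^ mask) if o_s else ym     # conditional complement of y
    raw = xm + y_t
    c_out = (raw >> n) & 1               # carry out of the most significant bit
    if not o_s:                          # effective addition: no wrap-around
        return xs, raw & mask
    if c_out:                            # x is bigger: add the end-around carry
        return xs, (raw + 1) & mask
    return ys, (raw & mask) ^ mask       # y is bigger (or equal): complement


if __name__ == "__main__":
    print(eac_add(0, 9, 1, 4, 4))   # (+9) + (-4)
    print(eac_add(0, 4, 1, 9, 4))   # (+4) + (-9)
```

Only one adder pass is modelled here; the complement at the output stands in for the second adder that a conventional adder/subtracter would need.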
Generally, an adder/subtracter is implemented using two different adders, one
for addition and one for subtraction. After the results of both operations are calculated,
the final result is selected with a multiplexer, as shown in Figure 2.5.
Figure 2.5: Architecture of the EAC adder [6].
2.4 Carry-lookahead EAC Logic Unit
The use of an EAC unit makes it possible to implement an adder/subtracter using only one adder. The
algorithm for an EAC unit with four carry bits can be found in [40]. The most significant bit is labeled
as 0. The group carries for a CLA adder are defined as:
\[
\begin{aligned}
C_0 &= G_0 + P_0 G_1 + P_0 P_1 G_2 + P_0 P_1 P_2 G_3 + P_0 P_1 P_2 P_3 C_{in} \\
C_1 &= G_1 + P_1 G_2 + P_1 P_2 G_3 + P_1 P_2 P_3 C_{in} \\
C_2 &= G_2 + P_2 G_3 + P_2 P_3 C_{in} \\
C_3 &= G_3 + P_3 C_{in}
\end{aligned}
\tag{2.4}
\]
If the carry out C0 is fed to the carry in, EAC operation is achieved as Equation 2.5.
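\[
\begin{aligned}
C_0 &= G_0 + P_0 G_1 + P_0 P_1 G_2 + P_0 P_1 P_2 G_3 + P_0 P_1 P_2 P_3 \\
C_1 &= G_1 + P_1 G_2 + P_1 P_2 G_3 + P_1 P_2 P_3 G_0 + P_0 P_1 P_2 P_3 \\
C_2 &= G_2 + P_2 G_3 + P_2 P_3 G_0 + P_2 P_3 P_0 G_1 + P_0 P_1 P_2 P_3 \\
C_3 &= G_3 + P_3 G_0 + P_3 P_0 G_1 + P_3 P_0 P_1 G_2 + P_0 P_1 P_2 P_3
\end{aligned}
\tag{2.5}
\]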
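As a sanity check of Equation 2.5, the following Python sketch evaluates the four wrapped carries bit by bit and forms the sum with $C_0$ as the carry into the least significant bit. The function name is hypothetical, and the sketch assumes XOR propagate terms ($P_i = a_i \oplus b_i$) and that $C_i$ denotes the carry out of bit position $i$; neither detail is stated explicitly in the text.

```python
def eac_carries_4bit(x: int, y: int):
    """Evaluate Eqn. 2.5 for 4-bit operands; returns ([C0, C1, C2, C3], sum).

    Bit 0 is the most significant bit, as in the text.
    """
    a = [(x >> (3 - i)) & 1 for i in range(4)]
    b = [(y >> (3 - i)) & 1 for i in range(4)]
    g = [ai & bi for ai, bi in zip(a, b)]        # generate
    p = [ai ^ bi for ai, bi in zip(a, b)]        # propagate (XOR form, assumed)
    all_p = p[0] & p[1] & p[2] & p[3]
    c = []
    for i in range(4):
        # Indices wrap around: the chain for C_i runs i+1, i+2, i+3 modulo 4.
        k1, k2, k3 = (i + 1) % 4, (i + 2) % 4, (i + 3) % 4
        c.append(g[i]
                 | (p[i] & g[k1])
                 | (p[i] & p[k1] & g[k2])
                 | (p[i] & p[k1] & p[k2] & g[k3])
                 | all_p)
    # Sum bit i uses the carry out of the next less significant bit;
    # the least significant bit (i = 3) receives the wrapped carry C0.
    s_bits = [p[i] ^ c[(i + 1) % 4] for i in range(4)]
    total = sum(bit << (3 - i) for i, bit in enumerate(s_bits))
    return c, total


if __name__ == "__main__":
    print(eac_carries_4bit(9, 9))
```

Under these assumptions the wrapped carry $C_0$ is set exactly when $A + B \ge 2^n - 1$, which is the condition of Equation 2.1.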
Combining the carries in this way results in a carry chain for every group that is as long as
the width of the adder. This wrapping of the carries is correct for subtraction but is
not correct for addition. To make the adder selectable for addition and subtraction, the $P_3$ term
needs to be modified. An extra bit is combined with the least significant bit of the adder to assert
the carry propagation when the effective operation is subtraction. This bit can be integrated
into $P_3$ to make $P_3 = 0$ for an effective operation of addition, as defined in Eqn. 2.6.
\[
P^t_3 =
\begin{cases}
P_3 & O_s = 1 \\
0 & O_s = 0
\end{cases}
\tag{2.6}
\]
2.5 Conclusion
In this chapter, detailed information about binary modulo adders and previous work on EAC
adders is provided. It is shown that feeding $c_{out}$ back into a carry-increment stage
provides the EAC operation defined for $2^n - 1$ adders. The designs in [5] and [12] employ the
adder as a part of the FMA flow. However, by adding extra logic, Liu [6] proposes a standalone
adder.
In the next chapter, a detailed analysis of the proposed parallel prefix EAC adder is
given. The design promises an alternative to the previous work while providing a wider design
space.
Chapter 3
Modified Parallel Prefix EAC Adder
3.1 Introduction
Previously proposed adders were analyzed in Section 2.3. Since the adder designed in [5] did
not include implementation details about how the blocks were internally built, [6] made a
detailed analysis of how the first stage adder and EAC blocks are defined. They also
extended the work to a complete adder without an FMA unit. Their implementation was directed
to FPGA. Although the analysis that they provided answered many of the questions about the
previously designed work, a number of parts of the adder still need to be analyzed. In this chapter,
we propose a modified adder which uses a modified parallel prefix $2^n - 1$ adder block as the
EAC logic, together with conditional sum blocks, for flexibility among different adder architectures
and lower total propagation delay.
3.2 Proposed Adder
Figure 3.1 shows the architecture of the proposed adder. The first level includes eight 16-bit
blocks of Kogge-Stone prefix-2 adders for 128-bit inputs, and the second level includes a modified
$2^n - 1$ parallel prefix Kogge-Stone adder. In order to design a standalone adder, the input
complement, add/sub and sign blocks are included as suggested in [6].
Figure 3.1: Architecture of the modified EAC Adder.
3.2.1 The 16bit blocks in EAC adder
Figure 3.2 shows the 16bit conditional sum blocks. The black and gray blocks are the same as
Fig. 1.3. The dashed lines correspond to the halfsum blocks, hi = ai ⊕ bi. GG and GP refer to
the group generate and propagate signals.
As explained in [6], when the carry-in to the adder block is assumed to be 0, it is possible to reduce
the complexity of the adder. When the carry-in is intended to propagate, the generate term in Equation
1.6 can be extended as $G_{i:k} = G_{i:j} + P_{i:j} \cdot c_{in}$. Thus, for $c_{in} = 0$ it reduces to
$G_{i:k} = G_{i:j}$. The generated group carry for the corresponding bit position therefore need not
be merged with the previous carry, so it is possible to reduce the number of black cells and use gray
cells instead. Namely, if there is no carry merge operation in the next level, the corresponding
level can be terminated with a gray cell. Figure 1.5 shows a 16-bit Kogge-Stone tree with gray
and black cells. However, since our design includes a second stage, we need both the
generate and propagate terms. This necessity increases the use of black cells instead of gray
cells.
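To make the role of the group terms concrete, the following Python sketch models one 16-bit Kogge-Stone block with $c_{in} = 0$. It is a behavioural sketch only: the function names are hypothetical, bit 0 is taken as the least significant bit here, and every cell keeps both generate and propagate (all black cells), as the second level requires.

```python
def kogge_stone_block(a: int, b: int, width: int = 16):
    """Return (carries, GG, GP, half_sum) for one block, assuming cin = 0.

    carries[i] is the group generate G_{i:0}, i.e. the carry out of bit i.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # bit propagate
    h = p[:]                                                # half sum a_i xor b_i
    dist = 1
    while dist < width:                                     # log2(width) stages
        new_g, new_p = g[:], p[:]
        for i in range(dist, width):
            new_g[i] = g[i] | (p[i] & g[i - dist])          # carry merge
            new_p[i] = p[i] & p[i - dist]                   # group propagate
        g, p = new_g, new_p
        dist *= 2
    return g, g[-1], p[-1], h


def block_sum(a: int, b: int, width: int = 16) -> int:
    """Rebuild the cin = 0 sum from the half sums and the prefix carries."""
    carries, _, _, h = kogge_stone_block(a, b, width)
    bits = [h[0]] + [h[i] ^ carries[i - 1] for i in range(1, width)]
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    print(hex(block_sum(0x1234, 0x0FF0)))
```

For a 16-bit block the loop runs four times (distances 1, 2, 4 and 8), and the pair (GG, GP) is what the block hands to the second level.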
Figure 3.2: 16bit conditional sum blocks.
The adder in [6] is not clear about how the input carry is merged in the first level of adders.
A discussion of the most efficient approaches for the traditional carries can be found in [43].
The carry-in bit can be included either by adding a fast carry increment stage or by treating
$c_{in}$ as an extra bit of the preprocessing stage of the adder. The first case is shown in Figure
3.3. The second case, shown in Figure 3.4, can be derived by setting $g_{-1} = c_{in}$ according to
Equation 1.6. Both schemes increase the complexity of the adder in order to solve the carry
incorporation problem.
Additionally, [6] and [5] do not explain how the carry is propagated after the 8-bit blocks.
Although a similar structure is used in [12], the adder architecture is different and only one set
of generate-propagate terms is calculated and propagated in the first level. Since [6] uses two different
adders for the conditions $c_{in} = 0$ and $c_{in} = 1$, two sets of carries need to be selected before
leaving the first level. This operation makes the calculation more complicated. Thus, we use
the architecture in Figure 3.2 to calculate one set of generate-propagate group terms for the
condition $c_{in} = 0$. In the next stage, the conditional carry for $c_{in} = 1$ is calculated with the
simple equation $G_{i:k} = G_{i:j} + P_{i:j}$, which is simply an OR gate.
Figure 3.3: cin merge with fast carry link [7].
Figure 3.4: cin merge with extra bit [7].
As a final step, the sum for each carry condition, $c_{in} = 0$ and $c_{in} = 1$, is calculated so that
the correct one can be selected according to the result of the second level of carry calculation.
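The conditional sum idea can be sketched in a few lines of Python. The helper names below are hypothetical and integer arithmetic stands in for the prefix logic; the point is only that one set of group terms for $c_{in} = 0$ is enough, because the $c_{in} = 1$ carry follows from an OR.

```python
def conditional_sums(a: int, b: int, width: int = 16):
    """Return (sum0, sum1, GG, GP) for one block (hypothetical helper)."""
    mask = (1 << width) - 1
    sum0 = (a + b) & mask            # sum assuming cin = 0
    sum1 = (a + b + 1) & mask        # sum assuming cin = 1
    gg = (a + b) >> width            # group generate: carry out for cin = 0
    gp = int((a ^ b) == mask)        # group propagate: every bit propagates
    return sum0, sum1, gg, gp


def select_block(a: int, b: int, cin: int, width: int = 16):
    """Pick the precomputed sum once the real carry-in is known."""
    sum0, sum1, gg, gp = conditional_sums(a, b, width)
    cout = gg | (gp & cin)           # for cin = 1 this is gg | gp, an OR gate
    return (sum1 if cin else sum0), cout


if __name__ == "__main__":
    print(select_block(0xFFFF, 0x0000, 1))
```

The multiplexer in `select_block` is the only operation that waits for the carry from the second level.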
3.2.2 Parallel Prefix $2^n - 1$ EAC Block
As explained in Chapter 2, $2^n - 1$ adders can be used for EAC calculation. In this section,
we extend the use of $2^n - 1$ adders to an adder/subtractor. Figure 3.5 shows the modified 8-bit
Kogge-Stone EAC block.
Figure 3.5: Modified 8bit Kogge-Stone EAC block.
The block first takes the group generate and propagate terms that are calculated in the first
level. Since there are eight 16-bit adders in the first level, the carry for the whole adder must
be calculated first. The 8 carries are merged in the Kogge-Stone adder. In order to achieve the
subtraction operation, the final carry at the most significant bit position must be merged into
the less significant positions. This step is also called the carry increment stage. The effective
operation signal $O_s$ is defined as in Eqn. 2.3. When the operation is subtraction, $O_s = 1$, the
AND gate propagates the most significant carry; when the operation is addition, $O_s = 0$, the most
significant carry is blocked and the unit works as a regular adder. The end-around carry
$c^t_8 = O_s \cdot G_{127:0}$ is defined in Eqn. 3.1.
\[
c^t_8 =
\begin{cases}
G_{127:0} & O_s = 1 \\
0 & O_s = 0
\end{cases}
\tag{3.1}
\]
In the literature [37] [4] [44] [45] [15], the carry-increment stage has been widely used for $2^n - 1$
operation. Especially for wide adders, $N \ge 64$, the final carry needs to travel a very long path
to arrive at the least significant position, actually twice the length from the least significant
position around to the least significant position again. It is shown in [40] that the carry needs to
travel only the total length in an EAC operation. That result motivated the use of CLA logic as the EAC
logic. In our adder, we use a second stage for EAC calculation instead of one long parallel prefix
propagation followed by a carry increment stage. Namely, for a 128-bit EAC adder built as a single
prefix tree, the shortest stage count is $n = 7$ (since $2^n = 128$) plus one stage of 127 carry merge
terms. In our adder, we have 4 stages for the first level, 3 stages for the second level and 1 stage
of 8 carry merge terms, which keeps the total stage count the same with a much lower count of
carry merge terms. Decreasing the number of carry merge terms decreases the length of the carry
path as well as the delay.
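The two-level organisation can be modelled behaviourally as follows. This Python sketch is an illustration under stated assumptions, not the thesis netlist: the function name is hypothetical, the first level is abstracted to integer arithmetic per 16-bit block, and the second level is a Kogge-Stone prefix over the eight block terms followed by the $O_s$-gated carry increment of Eqn. 3.1.

```python
def prefix_eac_add(a: int, b: int, o_s: int, blocks: int = 8, width: int = 16):
    """Add a + b with the end-around carry enabled when o_s = 1.

    Returns (result, c_eac). Operands are blocks*width bits wide.
    """
    mask = (1 << width) - 1
    parts = [((a >> (k * width)) & mask, (b >> (k * width)) & mask)
             for k in range(blocks)]
    # First level: group generate/propagate of every block for cin = 0.
    g = [(x + y) >> width for x, y in parts]
    p = [int((x ^ y) == mask) for x, y in parts]
    # Second level: Kogge-Stone prefix across blocks (3 stages for 8 blocks).
    dist = 1
    while dist < blocks:
        g, p = ([g[k] | (p[k] & g[k - dist]) if k >= dist else g[k]
                 for k in range(blocks)],
                [p[k] & p[k - dist] if k >= dist else p[k]
                 for k in range(blocks)])
        dist *= 2
    c_eac = o_s & g[-1]                                   # Eqn. 3.1
    # Carry increment stage: merge the end-around carry into each block carry.
    cin = [c_eac] + [g[k] | (p[k] & c_eac) for k in range(blocks - 1)]
    # Conditional sum selection per block.
    result = 0
    for k, (x, y) in enumerate(parts):
        result |= ((x + y + cin[k]) & mask) << (k * width)
    return result, c_eac


if __name__ == "__main__":
    print(prefix_eac_add(5, (1 << 128) - 1 - 3, 1))   # 5 + complement of 3
```

With `o_s = 0` the feedback is blocked and the function behaves as a plain 128-bit adder, which is the property the AND gate provides.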
3.3 Implementation and Validation
After the adder architecture is finalized, both the modified and the previously proposed [6] adders
are designed in Verilog. A test fixture is created to validate the adder with corner cases
such as 0-to-1 crossover and carry propagation from the 16th bit to the next bit. A check procedure
is defined to flag any case where the output of the adder and the calculated result differ. The
tests show that both adders have similar operation, and thus the modified EAC adder is verified.
3.4 Conclusion
In this chapter the proposed Parallel Prefix EAC adder is analyzed. It is shown that using
conditional sum blocks solves the carry incorporation problem existing in the previous works.
Moreover, the EAC logic in [6] is redesigned with a modified $2^n - 1$ adder to provide wider
design space and shorter carry path. As a final step the adder is implemented in Verilog and
validated through simulation.
In the next chapter, a simple method for transistor level realization of the critical path is
discussed.
Chapter 4
Critical Path Analysis
4.1 Path Identification
The critical path is identified as the path from the sign logic to the sum at the output, as shown
in Figure 4.1 for the proposed adder. In order to conditionally complement the second input,
the effective operation $O_s$ is calculated from the sign inputs $s_A$ and $s_B$. After $B_t$ is calculated,
the propagate and generate terms are calculated using Eqn. 1.7. The half-sum block, $h_i = a_i \oplus b_i$,
is not on the critical path because the half-sum is needed only for the sum calculation, not
for carry propagation. After the 4 stages of the 16-bit adder, the carry is merged with the other
group terms in the 8-bit EAC block. The EAC block consists of 3 carry merge stages and 1 carry
increment stage. We only take into account the carry merge operation
$G_{i:k} = G_{i:j} + P_{i:j} \cdot G_{j-1:k}$, which is simply an AND-OR-INVERT (AOI) gate, because the
group propagate term is not on the critical path. The most significant carry should either be
blocked or propagated. Thus, the $O_s$-controlled AND gate is on the critical path. Since the
end-around carry term, the most significant bit in the EAC block, is calculated before the carry
increment stage, $c_8$ is not on the critical path. Choosing one of the carries between the last and
first bits gives a good estimate of the delay of the operation. At the next stage, one of the two
sums of each 16-bit adder is selected as the output according to the carry from the EAC block.
Therefore, one mux is on the critical path. As a final stage, the output needs to be conditionally
complemented according to the end-around carry and the operation. The calculation of this condition
is not on the critical path because it can be carried out during the first MUX stage. Thus, we end
up with one XOR stage for complementing.
Figure 4.1: Critical path of the modified EAC adder.
4.2 Path Design
After the critical path is identified, the gate level model is created as in Figure 4.2. All the
inverters on the critical path are removed to decrease the number of logic levels by applying
the inverting property of CMOS to the consecutive levels. It can be observed that each stage
of carry merge operation shows an alternating design of AOI and OAI. The output load is one
minimum size inverter.
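The alternation of AOI and OAI gates follows from De Morgan's laws and can be checked with a short truth-table sketch in Python. The function names are hypothetical; the check shows that an AOI gate fed with true-polarity inputs produces the inverted carry, and that an OAI gate fed with inverted inputs restores the true carry without an explicit inverter.

```python
from itertools import product


def aoi(a: int, b: int, c: int) -> int:
    """AND-OR-INVERT: NOT(a + b*c)."""
    return 1 - (a | (b & c))


def oai(a: int, b: int, c: int) -> int:
    """OR-AND-INVERT: NOT(a * (b + c))."""
    return 1 - (a & (b | c))


def merge(g_hi: int, p_hi: int, g_lo: int) -> int:
    """Carry merge G = G_hi + P_hi * G_lo."""
    return g_hi | (p_hi & g_lo)


for g_hi, p_hi, g_lo in product((0, 1), repeat=3):
    # True-polarity inputs: the AOI gate yields the inverted merged carry.
    assert aoi(g_hi, p_hi, g_lo) == 1 - merge(g_hi, p_hi, g_lo)
    # Inverted inputs: the OAI gate yields the true merged carry.
    assert oai(1 - g_hi, 1 - p_hi, 1 - g_lo) == merge(g_hi, p_hi, g_lo)
```

Because each stage hands the next one signals of the opposite polarity, the gate type simply alternates from stage to stage.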
Figure 4.2: Gate level design of critical path.
Gate level design is just a logical expression of the circuit. Therefore, in order to make
accurate simulations the gates should be mapped to transistor level models. In this phase,
we created a library of custom design gates instead of using standard cell library for higher
performance.
4.3 Transistor Level Design and Sizing
Full custom design generally outperforms standard cell design in terms of power consumption, area
and propagation delay. However, it is hard to automate full custom design, and
the design process takes much longer than design with standard cells. A number of
techniques exist for designing at the transistor level.
4.3.1 Logic Level Minimization
Before designing any logic circuit, it is a beneficial step to exploit logic level minimization. In
our design we tried to combine the pg generation and first level of AOI as suggested in [8]. This
approach allows for a single stage to be removed from Weinberger adder realization. However,
this method increases the stack count and since our blocks were designed close to minimum
width transistors, we did not observe any improvement on the delay. Figure 4.3 shows the
reduced first stage.
Figure 4.3: Reduced first stage in Weinberger recursion adder [8].
As another step, Ling recursion is applied to the first step. However, as suggested in [15],
Ling carries must be converted to real carries by ANDing the most significant carry with the
corresponding propagate term, $c_{i+1} = d_i \cdot p_i$, before leaving the adder block. This calculation is
on the critical path. Therefore, Ling recursion is not applied.
As a final step, we combined the Os controlled AND gate with the previous OAI gate.
However, we observed that increasing the complexity of the gate did not provide much delay
improvement because the gates are already sized close to minimum.
4.3.2 Late arriving signal exploitation
One of the most important considerations in transistor level design is to connect the late arriving
signals close to the output. For the AOI gate in Figure 4.4, assuming A is the late arriving
signal, the arrangement in Figure 4.5 provides better delay.
Figure 4.4: AOI without late arriving exploitation.
Figure 4.5: AOI with late arriving exploitation.
4.3.3 Logical Effort
Sutherland et al. described the very useful concept of Logical Effort [46]. The method of logical
effort is founded on a simple model of the delay through a single MOS logic gate. The model
describes delays caused by the capacitive load that the logic gate drives and by the topology
of the logic gate. Clearly as the load increases, the delay increases, but delay also depends on
the logic function of the gate. Inverters, the simplest logic gates, drive loads best and are often
used as amplifiers to drive large capacitances. Logic gates that compute other functions require
more transistors, some of which are connected in series, making them poorer than inverters at
driving current. A NAND gate has more delay than an inverter with similar transistor sizes
that drives the same load. The method of logical effort quantifies these effects to simplify delay
analysis for individual logic gates and multistage logic networks.
The logical effort of a logic gate tells how much worse it is at producing output current
than is an inverter, given that each of its inputs may present the same input capacitance as the
inverter. Reduced output current means slower operation, and thus the logical effort number for
a logic gate tells how much more slowly it will drive a load than would an inverter. Equivalently,
logical effort is how much more input capacitance a gate must present in order to deliver the
same output current as an inverter.
Logical Effort for Multistage Networks
The method of logical effort reveals the best number of stages in a multistage network and how
to obtain the least overall delay by balancing the delay among the stages. The notions of logical
and electrical effort generalize easily from individual gates to multistage paths.
The logical effort along a path compounds by multiplying the logical efforts of all the logic
gates along the path. The symbol G denotes the path logical effort, so that it is distinguished
from g, the logical effort of a single gate in the path. The subscript i indexes the logic stages
along the path.
\[
G = \prod g_i \tag{4.1}
\]
The electrical effort along a path through a network is simply the ratio of the capacitance
that loads the last logic gate in the path to the input capacitance of the first gate in the path.
The uppercase symbol H indicates the electrical effort along a path. In this case, Cin and Cout
refer to the input and output capacitances.
H = Cout/Cin (4.2)
Branching effort b is used to account for fanout within a network. When fanout occurs
within a logic network, some of the available drive current is directed along the analyzed path
and some is directed off the path. The branching effort b at the output of a
logic gate is defined to be
\[
b = (C_{onpath} + C_{offpath}) / C_{onpath} \tag{4.3}
\]
Note that if the path does not branch, the branching effort is one. The branching effort
along an entire path, B, is the product of the branching efforts at each of the stages along the
path.
\[
B = \prod b_i \tag{4.4}
\]
Utilizing the definitions of logical, electrical and branching effort along a path, the path effort
is defined as F. Note that the path branching and electrical efforts are related to the electrical
effort of each stage by $B \cdot H = \prod h_i$.
\[
F = G \cdot B \cdot H \tag{4.5}
\]
Although it is not a direct measure of delay along the path, the path effort holds the key
to minimizing the delay. Observe that the path effort depends only on the circuit topology and
loading and not upon the sizes of the transistors used in the logic gates embedded within the
network. The path effort is related to the minimum achievable delay along the path. Only a
little work is needed to find the best number of stages and the proper transistor sizes to realize the
minimum delay.
Optimizing the design of an N-stage logic network proceeds from a very simple principle:
the path delay is least when each stage in the path bears the same stage effort. This minimum
delay is achieved when the stage effort is
\[
f = g_i \cdot h_i = F^{1/N} \tag{4.6}
\]
To equalize the effort borne by each stage on a path, and therefore achieve the minimum
delay along the path, appropriate transistor sizes for each stage of logic along the path must be
chosen. Each logic stage should be designed with electrical effort
\[
h_i = F^{1/N} / g_i \tag{4.7}
\]
From this relationship, it is straightforward to determine the transistor sizes of the gates along
a path: start at the end of the path and work backward, applying the capacitance
transformation
\[
C_{in,i} = (g_i \cdot C_{out,i}) / f \tag{4.8}
\]
The equation determines the input capacitance of each gate, which can then be distributed
appropriately among the transistors connected to the input.
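The sizing procedure of Eqns. 4.1 to 4.8 can be written out as a short Python sketch. The function name and the example numbers are assumptions for illustration: the logical efforts 1, 4/3 and 5/3 are the usual textbook values for an inverter, a 2-input NAND and a 2-input NOR, not the values of the gates in this design, and $C_{out,i}$ is taken as the total load of stage i including its off-path branches.

```python
import math


def size_path(g, b, c_load, c_in_first):
    """Return (F, f, c_in) for a path sized for equal stage effort.

    g, b: per-stage logical and branching efforts; capacitances are in
    arbitrary but consistent units. Hypothetical helper for illustration.
    """
    n = len(g)
    G = math.prod(g)                 # path logical effort, Eqn. 4.1
    B = math.prod(b)                 # path branching effort, Eqn. 4.4
    H = c_load / c_in_first          # path electrical effort, Eqn. 4.2
    F = G * B * H                    # path effort, Eqn. 4.5
    f = F ** (1.0 / n)               # optimal stage effort, Eqn. 4.6
    c_in = [0.0] * n
    c_next = c_load                  # on-path capacitance seen by the stage
    for i in reversed(range(n)):     # work backward from the load
        c_out_i = b[i] * c_next      # total load including off-path branches
        c_in[i] = g[i] * c_out_i / f # capacitance transformation, Eqn. 4.8
        c_next = c_in[i]
    return F, f, c_in


if __name__ == "__main__":
    F, f, c_in = size_path([1.0, 4.0 / 3.0, 5.0 / 3.0], [1.0, 1.0, 1.0], 20.0, 1.0)
    print(round(F, 2), round(f, 2), [round(c, 2) for c in c_in])
```

A useful consistency check is that the backward pass reproduces the input capacitance of the first gate that was assumed when computing H.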
4.3.4 Design with Helpers
As described in [47], in most adders the stage effort is generally constant if wire capacitance
is neglected. This means uniform gate sizes may be used throughout with little loss in
performance, and a very regular layout is possible in this case. However, adders
such as Sklansky show an exponential fanout increase. In this case, the stage effort becomes high for
those cells and, as a result, a bigger driver is needed. Therefore, Harris et al. [47] propose the
concept of helpers: when the stage effort increases, it is good practice to duplicate driving
cells in parallel to maintain lower delay.
Figure 4.6: EAC logic with helper.
Using the concept described by [47], we duplicate the operation controlled NAND gate.
That helps to decrease the fanout of the carry increment stage and, consequently, the size of
the NAND gate.
4.4 Transistor Sizing
Logical effort has found wide interest in transistor sizing [48–51]. In our proposed adder, we used
the concept to size the transistors in the critical path. After the sizes are determined, it is
possible to use these widths in the remaining blocks, because other blocks which are not on the
critical path will have the same loads and fan-outs within the same logic level.
In order to equalize the rising and falling edges of the output signal, a ratio of $w_p/w_n = 2.5/1$ is
used between the PMOS and NMOS transistors. In the transistor level library, late arriving signal
exploitation, helpers and logical effort are used to achieve the highest performance. In addition, the
$O_s$-controlled AND gate needs to drive 7 OAI cells. If we use two helpers instead of one, as
described in Section 4.3.4, each gate drives at most 4 OAI cells, and both the gate size and
the delay can be decreased. In order to apply logical effort, a spreadsheet is created as in Figure 4.7. After
the logical effort parameters are calculated, the width is distributed to each stage according to
the stage coefficients. Since the gates are designed according to the 2.5/1 ratio, the stage coefficients show
how large each gate is compared to an inverter.
Figure 4.7: Spreadsheet for Logical Effort Calculation.
After the appropriate transistor sizes are found, we performed simulations
using HSPICE [52] and the FreePDK45 [53] [54] library. Figures 4.8 and 4.9 show the transistor
level schematic.
Figure 4.8: Transistor level schematic.
Figure 4.9: Transistor level schematic.
4.5 Simulation Results
During the simulation, a square wave signal with 150ps rising and falling edges is applied. The
delay between input and output is then measured. The power dissipation of the critical path
is measured for power estimation of one bit of the adder. Temperature and supply voltage are
swept over 25-100 °C and 0.8-1.2V to observe optimal performance.
It can be observed from Figures 4.10 - 4.13 that the critical path shows 141.8ps delay and
18.9µW power dissipation for one bit (at Vdd = 1.0V and 25 °C, see Table 4.1) when 10%
switching activity is considered [3], which makes 2.42mW for 128 bits.
Figure 4.10: Delay vs. Vdd at 25 ◦C.
Figure 4.11: Power vs. Vdd at 25 ◦C.
Figure 4.12: Delay vs. Vdd at 100 ◦C.
Table 4.1: Delay and power dissipation values in correspondence to Vdd and temperature.
Vdd (V)    Delay (ps)            Power (µW)
           25 °C     100 °C      25 °C     100 °C
0.8        178.18    269.08      10.33     11.55
1.0        141.83    214.94      18.90     20.59
1.2        125.45    189.31      33.79     36.19
Figure 4.13: Power vs. Vdd at 100 ◦C.
Table 4.1 shows the trade-off between temperature and supply voltage for the proposed
adder. When Vdd is increased from 1.0V to 1.2V, delay decreases by 12% and power increases by
79%; when it is decreased from 1.0V to 0.8V, delay increases by 26% and power decreases by 45%.
When temperature changes from 25 °C to 100 °C, delay increases by 51% and power increases by
9.2%. For low power applications, lowering the supply voltage therefore saves proportionally
much more power than it costs in delay. However, for high speed applications more
power should be provided. Since a higher circuit temperature has an adverse effect on both
power and delay, the circuit should remain at a lower temperature for higher performance.
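The quoted percentages can be recomputed directly from Table 4.1, as in the following Python sketch. The variable and function names are hypothetical. The supply voltage figures reproduce exactly after rounding; for temperature, the quoted 51% and 9.2% are closest to the averages over the three supply voltages (about 51.2% and 9.3%), which is an inference from the numbers rather than something the text states.

```python
# Vdd (V): (delay at 25 C in ps, delay at 100 C in ps,
#           power at 25 C in uW, power at 100 C in uW), from Table 4.1
TABLE_4_1 = {
    0.8: (178.18, 269.08, 10.33, 11.55),
    1.0: (141.83, 214.94, 18.90, 20.59),
    1.2: (125.45, 189.31, 33.79, 36.19),
}


def pct(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


nominal = TABLE_4_1[1.0]
delay_up = pct(TABLE_4_1[1.2][0], nominal[0])      # Vdd 1.0 V -> 1.2 V, 25 C
power_up = pct(TABLE_4_1[1.2][2], nominal[2])
delay_down = pct(TABLE_4_1[0.8][0], nominal[0])    # Vdd 1.0 V -> 0.8 V, 25 C
power_down = pct(TABLE_4_1[0.8][2], nominal[2])

# Temperature 25 C -> 100 C, averaged over the three supply voltages.
temp_delay = sum(pct(r[1], r[0]) for r in TABLE_4_1.values()) / len(TABLE_4_1)
temp_power = sum(pct(r[3], r[2]) for r in TABLE_4_1.values()) / len(TABLE_4_1)

# One-bit power scaled to 128 bits, in mW.
total_power_mw = nominal[2] * 128 / 1000.0

if __name__ == "__main__":
    print(round(delay_up), round(power_up), round(delay_down), round(power_down))
    print(round(temp_delay, 1), round(temp_power, 1), round(total_power_mw, 2))
```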
4.6 Conclusion
In this chapter, the methodology for transistor level critical path optimization is provided. First,
the critical path of the adder is identified, taking into account the logic levels. Transistors are sized
with the Logical Effort concept [46]. Architecture level and transistor level optimizations are
applied according to the full custom design rules of the FreePDK45 library [54]. As a final step, transient
simulations are carried out to calculate the delay and power dissipation. It is observed that the
method enables fast comparison of the critical path among different adder architectures
in terms of delay and power dissipation.
In the next chapter, a methodology for fast area estimation and layout entry is given.
A datapath library is generated by exploiting the regularity of the proposed adder.
Chapter 5
Datapath Library
5.1 Introduction
Datapaths have been a research topic in recent years as an approach to make layout entry
faster [9] [10] [55] [56]. Informally, datapaths are circuits where the same or similar logic is
applied to several bits [57]. A datapath stack [58] is made up of many custom word lines such
as registers, ALUs, adders, shifters, multiplexers and buffers that form the data flow of the
functional units.
Datapaths are characterized by a highly regular layout structure. A typical datapath
floorplan consists of an array of horizontally oriented words of identical bit cells, called datapath
cells, and vertically oriented bit slices, as shown in Figure 5.1. Since each bit slice is replicated a
number of times (determined by the datapath width) with very little or no modification, layout
generation of such regular structures reduces to a careful design, often by means of
handcrafting, of individual datapath cells. Figure 5.2 shows the regular placement and routing inside
a datapath slice. Figure 5.3 shows the schematic layout of a datapath and the corresponding bit
slice [55].
Figure 5.1: Global floorplan of a datapath [9].
Figure 5.2: Regularity placement and routing datapath circuit [10].
Figure 5.3: Schematic layout of datapath and detailed view of bitcell [11].
Datapath circuits are typically organized in horizontal rows of words representing the same
functional block and vertical bit slices, delimited by vertically running power and ground rails.
The layout of the datapath cell of bit slice i is identical to that of bit slice (i+1), but mirrored
along the vertical axis so that the adjacent bit slices can share common power or ground rail.
Figure 5.4: Representation of a datapath cell [9].
The width of the bit slice, also known as the pitch, is fixed; it determines the width of all the
datapath cells, as outlined in Figure 5.4. Power and ground (VDD/VSS) supply rails generally
delimit the pitch. Signal nets are connected to the datapath cell components by means of
bristles. Vertical bristles, or data lines, provide wiring between different datapath cells within
the same bit slice. They run in parallel to the power rails. Horizontal bristles, or control lines,
provide wiring between datapath cells of different bit slices. Control lines span the width of the
datapath and run perpendicular to the power rails. Since adders have very regular structures,
they can be designed as datapaths.
5.2 Concepts in Full Custom Design
Transistor chaining and device merging: Transistor chaining is a widely used technique to
improve both area and performance of datapath cells. Several transistors can be chained
together by combining their diffusion areas in order to reduce the diffusion capacitance.
Diffusion sharing applied to simple logic gates in the same datapath cell is known as device
merging.
Transistor folding: Transistor folding is another popular technique aimed at minimizing area
and improving performance of custom designs. Folding changes the aspect ratio of
the component while maintaining the required device size (W/L ratio). By performing
folding with different numbers of fingers (poly gates), different component instances can be
created for the placement phase.
Intracell sharing Two component areas (diffusion regions or poly gates) belonging to components
from adjacent bit slices can be merged if they share the same global net, such as a
power line, control line or clock signal. In a typical organization of a datapath, adjacent
bit slices are identical copies of each other, reflected with respect to the vertical boundary
line. In this case the components can be pushed under the boundary line (ground or power
rail) to create a more compact layout.
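As a rough illustration of transistor folding, the sketch below splits one device into equal parallel fingers and lists the resulting finger widths. The function names are hypothetical, minimum-width design rules are ignored, and the 0.45µm example width is simply 5 × 0.09µm, the default pull-up width used in the cell library of Appendix B.

```python
def fold_transistor(total_width_um, fingers):
    """Width of each finger when a device is folded into equal fingers.

    The fingers are connected in parallel, so the total drawn width, and
    therefore the effective W/L ratio, is unchanged by folding.
    """
    if fingers < 1:
        raise ValueError("fingers must be at least 1")
    return total_width_um / fingers


def folding_options(total_width_um, max_fingers):
    """Candidate instances (finger count -> finger width) for placement."""
    return {n: fold_transistor(total_width_um, n)
            for n in range(1, max_fingers + 1)}


if __name__ == "__main__":
    for n, w in folding_options(0.45, 3).items():
        print(f"{n} finger(s): {w:.3f} um each, total {n * w:.3f} um")
```

Each entry is a different aspect ratio of the same electrical device, which is what gives the placement phase a choice of instances.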
5.3 Datapath Design
Using the concepts defined in Sections 5.2 and 4.3, the logic gates are designed. Figures 5.5(a)
and 5.5(b) show the design of the basic AOI and MUX2 cells according to these rules. The AOI cell
occupies 0.77µm × 1.5µm and the MUX2 cell occupies 0.97µm × 1.5µm.
(a) AOI Layout (b) MUX2 Layout
Figure 5.5: Designed basic cells.
The longest cell according to the values from Figure 4.7 has a 1.5µm height. Thus, the other
cells are designed to match the longest cell, which maintains regularity and suits the datapath
structure. Figure 5.7 shows the bitslice of the 16-bit Kogge-Stone adder. Using small cells allowed
us to reduce the height of the bit slice. The bitslice includes the sign logic and conditional sum
calculation. It can be seen from the datapath that the carry-merge and propagate cells alternate
from one stage to the next because of the inherent inversion of static CMOS gates.
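The alternation between stages can be checked with a truth table. The sketch below assumes the gate functions implied by the cell library in Appendix B, namely AOI(A,B,C) = NOT(A·B + C) and OAI(A,B,C) = NOT((A + B)·C). It is an illustrative check, not part of the design flow: a stage fed with true-polarity signals produces inverted group generate and propagate (AOI and NAND), and the next stage, fed with those inverted signals, restores true polarity (OAI and NOR).

```python
from itertools import product


def aoi(a, b, c):
    """AND-OR-invert: NOT((a AND b) OR c)."""
    return 1 - ((a & b) | c)


def oai(a, b, c):
    """OR-AND-invert: NOT((a OR b) AND c)."""
    return 1 - ((a | b) & c)


def nand2(a, b):
    return 1 - (a & b)


def nor2(a, b):
    return 1 - (a | b)


def check_alternation():
    """Exhaustively verify both polarities of the carry-merge cell."""
    for g1, p1, g0, p0 in product((0, 1), repeat=4):
        G = g1 | (p1 & g0)   # group generate
        P = p1 & p0          # group propagate
        # True-polarity inputs give inverted outputs.
        if aoi(p1, g0, g1) != 1 - G or nand2(p1, p0) != 1 - P:
            return False
        # Inverted inputs give true-polarity outputs.
        ng1, np1, ng0, np0 = 1 - g1, 1 - p1, 1 - g0, 1 - p0
        if oai(np1, ng0, ng1) != G or nor2(np1, np0) != P:
            return False
    return True


if __name__ == "__main__":
    print("alternating AOI/OAI stages are consistent:", check_alternation())
```

Alternating the cell type in this way avoids an explicit inverter after every merge stage.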
(a) EAC bitslice (b) EAC last bit
Figure 5.6: Bit slices of the blocks in the adder.
Figure 5.7: Bit slice of 16bit Kogge-Stone adder.
Using the same approach as in Figure 5.7, bitslices for the last and regular bits of the parallel prefix
EAC block can be designed. Since the last bit has an irregular layout, it is designed separately.
Figures 5.6(a) and 5.6(b) show the designed bitslices for the EAC block.
5.4 Layout Design
Using the datapath designs, it is straightforward to design the whole adder. Replicating the
bitslice designed in Figure 5.7 gives the layout of the 16-bit Kogge-Stone adder, shown in
Figure 5.8(a). The 16-bit adder and the EAC block are sized 13.42µm × 21.98µm
and 5.18µm × 11.06µm respectively.
As a final step, using the designed blocks, it is possible to estimate the total area of the 128-bit
EAC adder. Two different schemes are considered. Figure 5.9 shows a shorter but wider layout,
intended for use as part of a larger datapath. Figure 5.10 shows a taller but narrower, more
compact layout in which half of the adder is stacked on top of the other. The first and the
second are sized 20.8µm × 175µm and 35.8µm × 87.5µm respectively.
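The area estimate follows directly from the quoted bounding boxes. The short sketch below multiplies out the two sets of dimensions. The helper name is hypothetical, and routing overhead outside the bounding box is not modelled.

```python
def layout_area_um2(height_um, width_um):
    """Bounding-box area of a rectangular layout in square microns."""
    return height_um * width_um


# Dimensions quoted in the text, in microns.
WIDE_LAYOUT = (20.8, 175.0)     # wide layout
STACKED_LAYOUT = (35.8, 87.5)   # stacked compact layout

wide_area = layout_area_um2(*WIDE_LAYOUT)
stacked_area = layout_area_um2(*STACKED_LAYOUT)

print(f"wide layout:    {wide_area:.1f} um^2")
print(f"stacked layout: {stacked_area:.1f} um^2")
```

The stacked layout comes to about 3,132 sq. micron, which matches the area reported for the proposed adder. The wide layout comes to about 3,640 sq. micron.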
Figure 5.9: Wide layout.
Figure 5.10: Stacked compact layout.
Table 5.1: Results comparison of proposed adder with the previous work.
          Delay(ps)     Power(mW)    Area(µm²)        Technology
[5]       200 (+29%)    -            -                65nm
[12]      270 (+47%)    20 (+88%)    17,237 (+82%)    65nm
Proposed  142           2.42         3,132            45nm
5.5 Results
The results of previous works and the proposed adder are listed in Table 5.1. Among the previous
works, a pipelined 128-bit 5GHz+ binary floating-point adder is proposed in [5], and a 108-bit
EAC adder is proposed in [12]. A direct comparison with either adder is not possible
because both are implemented as part of an FMA unit and are designed in
65nm IBM SOI technology, which is not publicly available. The adder designed in [5] is a
pipelined adder, which falls outside the scope of this thesis. The adder in [6] is designed
in FPGA technology, whereas our adder is a full custom design. Therefore, to the best of our
knowledge, this is the first adder to use a small parallel prefix 2^n − 1 EAC block with the full
custom design methodology. Table 5.1 shows that the adder achieves a delay
of 142ps and 2.42mW power dissipation in 3,132µm² area at 25°C with a 1V supply. The
proposed adder shows up to 47% improvement in delay, 81% improvement in area and 88%
improvement in power dissipation in comparison with the previous works, assuming routing
does not contribute significantly to delay and power.
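The percentages in Table 5.1 can be reproduced as relative reductions with respect to each previous work. The sketch below uses only the figures from the table. The function name is hypothetical, and the calculation ignores the difference in technology node (65nm versus 45nm), so it should be read as an arithmetic check rather than a normalized comparison.

```python
def improvement_percent(previous, proposed):
    """Relative reduction of `proposed` with respect to `previous`, in %."""
    return 100.0 * (previous - proposed) / previous


# Figures from Table 5.1.
PROPOSED = {"delay_ps": 142.0, "power_mw": 2.42, "area_um2": 3132.0}
REF_5 = {"delay_ps": 200.0}
REF_12 = {"delay_ps": 270.0, "power_mw": 20.0, "area_um2": 17237.0}

delay_vs_5 = improvement_percent(REF_5["delay_ps"], PROPOSED["delay_ps"])
delay_vs_12 = improvement_percent(REF_12["delay_ps"], PROPOSED["delay_ps"])
power_vs_12 = improvement_percent(REF_12["power_mw"], PROPOSED["power_mw"])
area_vs_12 = improvement_percent(REF_12["area_um2"], PROPOSED["area_um2"])

for name, value in [("delay vs [5]", delay_vs_5), ("delay vs [12]", delay_vs_12),
                    ("power vs [12]", power_vs_12), ("area vs [12]", area_vs_12)]:
    print(f"{name}: {value:.1f}%")
```

The area reduction evaluates to roughly 81.8%, which appears as 82% in the table when rounded and as 81% in the text when truncated.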
5.6 Conclusion
In this final chapter a detailed analysis of layout entry through datapath design is provided,
and the proposed adder is compared with the previous work. The datapath library
provides building blocks for early area estimation before designing the whole adder. Further,
routing is simplified with the datapath since wires can be placed regularly in the bitslice. The
cells are designed using full custom design rules to achieve a compact, area-efficient layout. Two
different layouts are generated by repeating the bitslices. We aimed to keep the longest carry
path under 100µm, assuming that routing does not contribute significantly to delay. The final result
shows that, in the stacked layout, the longest path remains under 45µm and the proposed adder
shows up to 47% improvement in delay, 81% improvement in area and 88% improvement in
power dissipation in 45nm technology at 25°C and a 1.0V supply.
(a) Layout of 16bit Kogge-Stone adder (b) Layout of parallel prefix EAC block
Figure 5.8: Layout of the blocks.
Chapter 6
Conclusion and Future Works
Adders are functional blocks that are generally designed for fast operation. However,
power dissipation can no longer be given lower priority. Using parallel prefix adders is a good design
practice for trading off speed, power dissipation and area. The literature suggests
that large performance gains are no longer easy to obtain from existing circuit designs.
However, the design space is vast and there are always possibilities for
improvement. In this thesis, we designed a parallel prefix 2^n − 1 based adder to show that it is
possible to shorten the critical path and reduce power dissipation. After the adder was implemented in
Verilog, we performed a critical path analysis. In this top-down design approach, we preferred
full custom design over standard cell design to achieve the best performance. It is well
known that analysis and design in a full custom methodology require a large amount of time.
Thus, identifying the critical path provides fast analysis without designing the whole circuit. In
order to make such an analysis, we assumed that wire delay does not account for the majority
of the total propagation delay. This analysis shows that if the critical path is optimized, the
total performance is optimized. As a next step, we created a datapath library using the results
from the critical path analysis. It is shown that datapath design reduces the complexity of the adder
design process because it exploits the regularity of the bit slice. As a general outcome, critical
path analysis and datapath design provide fast comparison among different adder
architectures in terms of speed, power dissipation and area. Moreover, once the bit slice is
designed optimally, it can be used as a building block for fast layout entry. After the datapath
design, it is observed in the final layout that the length of the end-around carry path is reduced
to less than 45µm using a stacked layout, which supports our assumption about routing delay.
Under this assumption, the adder shows 142ps delay, 2.42mW power dissipation and under 3200
sq. micron area.
Our analysis is based on static CMOS design. Recent works show that dynamic adders
provide good performance. Although power dissipation increases in dynamic design,
a trade-off can be achieved between speed and power. Moreover, pipelined circuits have attracted
interest in adder design. The operation of this adder can be further analyzed in the dynamic design
space. In addition, the use of EAC adders in floating point units, especially decimal floating
point adders, can be further investigated.
Bibliography
[1] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, 2/e. Pearson Education, 2003.
[2] I. Koren, Computer Arithmetic Algorithms, ser. Ak Peters Series. A K Peters, 2002.
[3] N. Weste and D. Harris, CMOS VLSI design: a circuits and systems perspective. Pearson/Addison-Wesley, 2005.
[4] R. Zimmermann, “Efficient vlsi implementation of modulo (2^n ± 1) addition and multiplication,” in Computer Arithmetic, 1999. Proceedings. 14th IEEE Symposium on, 1999, pp. 158–167.
[5] X. Y. Yu, Y. hing Chan, M. Kelly, and S. B. Curran, “A 5ghz+ 128-bit binary floating-point adder for the power6,” in Power6 Processor, Proc. of ESSCIRC, 2006, pp. 166–169.
[6] F. Liu, Q. Tan, G. Chen, X. Song, O. Ait Mohamed, and M. Gu, “Field programmable gate array prototyping of end-around carry parallel prefix tree architectures,” Computers Digital Techniques, IET, vol. 4, no. 4, pp. 306–316, July 2010.
[7] G. Dimitrakopoulos and D. Nikolos, “High-speed parallel-prefix vlsi ling adders,” Computers, IEEE Transactions on, vol. 54, no. 2, pp. 225–231, Feb. 2005.
[8] B. Zeydel, D. Baran, and V. Oklobdzija, “Energy-efficient design methodologies: High-performance vlsi adders,” Solid-State Circuits, IEEE Journal of, vol. 45, no. 6, pp. 1220–1233, June 2010.
[9] M. Ciesielski, S. Askar, and S. Levitin, “Analytical approach to layout generation of datapath cells,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 21, no. 12, pp. 1480–1488, Dec 2002.
[10] T. Tao Ye and G. De Micheli, “Data path placement with regularity,” in Computer Aided Design, 2000. ICCAD-2000. IEEE/ACM International Conference on, 2000, pp. 264–270.
[11] J.-S. Yim and C.-M. Kyung, “Datapath layout optimisation using genetic algorithm and simulated annealing,” Computers and Digital Techniques, IEE Proceedings -, vol. 145, no. 2, pp. 135–141, Mar 1998.
[12] X. Y. Zhang, Y.-H. Chan, R. K. Montoye, L. J. Sigal, E. M. Schwarz, and M. Kelly, “A 270ps 20mw 108-bit end-around carry adder for multiply-add fused floating point unit,” Journal of Signal Processing Systems, vol. 58, pp. 139–144, 2010.
[13] J. Bruguera and T. Lang, “Floating-point fused multiply-add: reduced latency for floating-point addition,” in Computer Arithmetic, 2005. ARITH-17 2005. 17th IEEE Symposium on, June 2005, pp. 42–51.
[14] D. Harris and S. Harris, Digital design and computer architecture, ser. Morgan Kaufmann. Morgan Kaufmann Publishers, 2007.
[15] J. Chen and J. Stine, “Parallel prefix ling structures for modulo 2^n − 1 addition,” in Application-specific Systems, Architectures and Processors, 2009. ASAP 2009. 20th IEEE International Conference on, July 2009, pp. 16–23.
[16] G. Chen and F. Liu, “Proofs of correctness and properties of integer adder circuits,” Computers, IEEE Transactions on, vol. 59, no. 1, pp. 134–136, Jan. 2010.
[17] J. Park, H. Ngo, J. Silberman, and S. Dhong, “470 ps 64-bit parallel binary adder [for cpu chip],” in VLSI Circuits, 2000. Digest of Technical Papers. 2000 Symposium on, 2000, pp. 192–193.
[18] D. Patil, O. Azizi, M. Horowitz, R. Ho, and R. Ananthraman, “Robust energy-efficient adder topologies,” in Computer Arithmetic, 2007. ARITH ’07. 18th IEEE Symposium on, June 2007, pp. 16–28.
[19] G. Dimitrakopoulos, P. Kolovos, P. Kalogerakis, and D. Nikolos, “Design of high-speed low-power parallel-prefix vlsi adders.” in PATMOS’04, 2004, pp. 248–257.
[20] A. Weinberger and J. Smith, “A logic for high-speed addition,” National Bureau of Standards, vol. no. Circulation 591, pp. 3–12, 1958.
[21] D. Harris, “A taxonomy of parallel prefix networks,” in Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on, vol. 2, Nov. 2003, pp. 2213–2217 Vol.2.
[22] P. M. Kogge and H. S. Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations,” Computers, IEEE Transactions on, vol. C-22, no. 8, pp. 786–793, Aug. 1973.
[23] J. Sklansky, “Conditional-sum addition logic,” Electronic Computers, IRE Transactions on, vol. EC-9, no. 2, pp. 226–231, June 1960.
[24] R. Brent and H. Kung, “A regular layout for parallel adders,” Computers, IEEE Transactions on, vol. C-31, no. 3, pp. 260–264, March 1982.
[25] T. Han, D. A. Carlson, and T. don Han, “Fast area-efficient vlsi adders,” in IEEE Symposium on Computer Arithmetic, 1987.
[26] S. Knowles, “A family of adders,” in Computer Arithmetic, 2001. Proceedings. 15th IEEE Symposium on, 2001, pp. 277–281.
[27] R. E. Ladner and M. J. Fischer, “Parallel prefix computation,” Journal of The ACM, vol. 27, pp. 831–838, 1980.
[28] B. Zeydel, T. Kluter, and V. Oklobdzija, “Efficient mapping of addition recurrence algorithms in cmos,” in Computer Arithmetic, 2005. ARITH-17 2005. 17th IEEE Symposium on, June 2005, pp. 107–113.
[29] R. Doran, “Variants of an improved carry look-ahead adder,” Computers, IEEE Transactions on, vol. 37, no. 9, pp. 1110–1113, Sep 1988.
[30] H. Ling, “High-speed binary adder,” IBM Journal of Research and Development, vol. 25, no. 3, pp. 156–166, March 1981.
[31] R. I. Tanaka, “Residue arithmetic and its applications to computer technology,” 1967.
[32] W. Jenkins and B. Leon, “The use of residue number systems in the design of finite impulse response digital filters,” Circuits and Systems, IEEE Transactions on, vol. 24, no. 4, pp. 191–201, Apr 1977.
[33] X. Lai and J. L. Massey, “A proposal for a new block encryption standard.” Springer-Verlag, 1991, pp. 389–404.
[34] S.-S. Yau and Y.-C. Liu, “Error correction in redundant residue number systems,” Computers, IEEE Transactions on, vol. C-22, no. 1, pp. 5–11, Jan. 1973.
[35] F. Halsall, Data communications, computer networks and open systems (4th ed.). Redwood City, CA, USA: Addison Wesley Longman Publishing Co., Inc., 1995.
[36] V. Paliouras and T. Stouraitis, “Novel high-radix residue number system multipliers and adders,” in Circuits and Systems, 1999. ISCAS ’99. Proceedings of the 1999 IEEE International Symposium on, vol. 1, Jul 1999, pp. 451–454 vol.1.
[37] C. Efstathiou, H. Vergos, and D. Nikolos, “Modulo 2^n ± 1 adder design using select-prefix blocks,” Computers, IEEE Transactions on, vol. 52, no. 11, pp. 1399–1406, Nov. 2003.
[38] L. Kalampoukas, D. Nikolos, C. Efstathiou, H. Vergos, and J. Kalamatianos, “High-speed parallel-prefix module 2^n-1 adders,” Computers, IEEE Transactions on, vol. 49, no. 7, pp. 673–680, Jul 2000.
[39] A. Beaumont-Smith and C.-C. Lim, “Parallel prefix adder design,” in Computer Arithmetic, 2001. Proceedings. 15th IEEE Symposium on, 2001, pp. 218–225.
[40] E. M. Schwarz, High-performance energy-efficient microprocessor design, ser. Series on integrated circuits and systems. Springer, 2006, ch. Binary floating-point unit design.
[41] J. Shedletsky, “Comment on the sequential and indeterminate behavior of an end-around-carry adder,” Computers, IEEE Transactions on, vol. C-26, no. 3, pp. 271–272, March 1977.
[42] F. Liu, X. Song, Q. Tan, and G. Chen, “Formal analysis of end-around-carry adder in floating-point unit,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 29, no. 10, pp. 1655–1659, Oct. 2010.
[43] A. Goldovsky, R. Kolagotla, C. Nicol, and M. Besz, “A 1.0-nsec 32-bit prefix tree adder in 0.25-µm static cmos,” in Circuits and Systems, 1999. 42nd Midwest Symposium on, vol. 2, 1999, pp. 608–612 vol. 2.
[44] R. Muralidharan and C.-H. Chang, “Hard multiple generator for higher radix modulo 2^n-1 multiplication,” in Integrated Circuits, ISIC ’09. Proceedings of the 2009 12th International Symposium on, Dec. 2009, pp. 546–549.
[45] L. Kalampoukas, D. Nikolos, C. Efstathiou, H. Vergos, and J. Kalamatianos, “High-speed parallel-prefix module 2^n-1 adders,” Computers, IEEE Transactions on, vol. 49, no. 7, pp. 673–680, Jul 2000.
[46] I. Sutherland, R. Sproull, and D. Harris, Logical effort: designing fast CMOS circuits, ser. The Morgan Kaufmann Series in Computer Architecture and Design. Morgan Kaufmann Publishers, 1999.
[47] D. Harris and I. Sutherland, “Logical effort of carry propagate adders,” in Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on, vol. 1, Nov. 2003, pp. 873–878 Vol.1.
[48] A. Kabbani, D. Al-Khalili, and A. Al-Khalili, “Logical path delay distribution and transistor sizing,” in IEEE-NEWCAS Conference, 2005. The 3rd International, June 2005, pp. 391–394.
[49] V. Oklobdzija, B. Zeydel, H. Dao, S. Mathew, and R. Krishnamurthy, “Energy-delay estimation technique for high-performance microprocessor vlsi adders,” in Computer Arithmetic, 2003. Proceedings. 16th IEEE Symposium on, June 2003, pp. 272–279.
[50] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P. Corsonello, “Designing high-speed adders in power-constrained environments,” Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 56, no. 2, pp. 172–176, Feb. 2009.
[51] R. Zlatanovici, S. Kao, and B. Nikolic, “Energy-delay optimization of 64-bit carry-lookahead adders with a 240 ps 90 nm cmos design example,” Solid-State Circuits, IEEE Journal of, vol. 44, no. 2, pp. 569–583, Feb. 2009.
[52] HSPICE, “The gold standard for accurate circuit simulation,” http://www.synopsys.com/Tools/Verification/AMSVerification/CircuitSimulation/HSPICE/Pages/default.aspx.
[53] J. Stine, I. Castellanos, M. Wood, J. Henson, F. Love, W. Davis, P. Franzon, M. Bucher, S. Basavarajaiah, J. Oh, and R. Jenkal, “Freepdk: An open-source variation-aware design kit,” in Microelectronic Systems Education, 2007. MSE ’07. IEEE International Conference on, June 2007, pp. 173–174.
[54] NCSU, “45nm variant of the FreePDK process design kit,” http://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
[55] T. Jing, X.-L. Hong, Y.-C. Cai, J.-Y. Xu, C.-Q. Yang, Y.-Q. Zhang, Q. Zhou, and W. Wu, “Data-path layout design inside soc,” in Communications, Circuits and Systems and West Sino Expositions, IEEE 2002 International Conference on, vol. 2, June-1 July 2002, pp. 1406–1410 vol.2.
[56] W. Daily and A. Chang, “The role of custom design in asic chips,” in Design Automation Conference, 2000. Proceedings 2000. 37th, 2000, pp. 643–647.
[57] N. H. E. Weste and K. Eshraghian, “Principles of cmos vlsi design: a systems perspective,”1993.
[58] W. Luk and A. Dean, “Multistack optimization for data-path chip layout,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 10, no. 1, pp. 116–129, Jan 1991.
Appendix A
Verilog Code of the Proposed Adder
module Mod_2n_1_128b_temp(
input [127:0] A,
input sA,
input [127:0] B,
input sB,
output [127:0] S,
output sS
);
wire [127:0] iB, iS;
wire Os;
wire [7:0] c,p,g;
wire [7:0] p1; // connected to the EAC block below; not driven anywhere in this listing
xor X1 (Os, sA, sB);
assign iB = Os ? ~B : B;
// 16b front and 8b EAC
Mod2n_1_16b_wCI_CS_KS i0 (.A(A[15:0]), .B(iB[15:0]), .cin(c[7]), .pg({p[0],g[0]}), .S(iS[15:0]));
Mod2n_1_16b_wCI_CS_KS i1 (.A(A[31:16]), .B(iB[31:16]), .cin(c[0]), .pg({p[1],g[1]}), .S(iS[31:16]));
Mod2n_1_16b_wCI_CS_KS i2 (.A(A[47:32]), .B(iB[47:32]), .cin(c[1]), .pg({p[2],g[2]}), .S(iS[47:32]));
Mod2n_1_16b_wCI_CS_KS i3 (.A(A[63:48]), .B(iB[63:48]), .cin(c[2]), .pg({p[3],g[3]}), .S(iS[63:48]));
Mod2n_1_16b_wCI_CS_KS i4 (.A(A[79:64]), .B(iB[79:64]), .cin(c[3]), .pg({p[4],g[4]}), .S(iS[79:64]));
Mod2n_1_16b_wCI_CS_KS i5 (.A(A[95:80]), .B(iB[95:80]), .cin(c[4]), .pg({p[5],g[5]}), .S(iS[95:80]));
Mod2n_1_16b_wCI_CS_KS i6 (.A(A[111:96]), .B(iB[111:96]), .cin(c[5]), .pg({p[6],g[6]}), .S(iS[111:96]));
Mod2n_1_16b_wCI_CS_KS i7 (.A(A[127:112]), .B(iB[127:112]), .cin(c[6]), .pg({p[7],g[7]}), .S(iS[127:112]));
// 8b EAC logic
Mod2n_1_8b_wo_pg_KS EAC (.p(p[7:0]), .p1(p1[7:0]), .g(g[7:0]), .c(c[7:0]), .Os(Os));
assign S = Os&(c[7]) ? iS : ~iS;
assign sS = c[7] ? sA : sB;
endmodule
////////////////////////////////////////////////////////////////////////////////////////////////////////////
module Mod2n_1_16b_wCI_CS_KS(
input [15:0] A,
input [15:0] B,
input cin,
output p1,
output [1:0] pg,
output [15:0] S
);
wire [1:0] r1c7, r1c6, r1c5, r1c4, r1c3, r1c2, r1c1, r1c0;
wire [1:0] r1c15, r1c14, r1c13, r1c12, r1c11, r1c10, r1c9, r1c8;
pg16 ipg16(.A(A), .B(B),
.pg15(r1c15),.pg14(r1c14),.pg13(r1c13),.pg12(r1c12),
.pg11(r1c11),.pg10(r1c10),.pg9(r1c9),.pg8(r1c8),
.pg7(r1c7),.pg6(r1c6),.pg5(r1c5),.pg4(r1c4),
.pg3(r1c3),.pg2(r1c2),.pg1(r1c1),.pg0(r1c0));
wire [1:0] r2c15, r2c13, r2c11, r2c9, r2c7, r2c5, r2c3, r2c1;
wire [1:0] r2c14, r2c12, r2c10, r2c8, r2c6, r2c4, r2c2;
black ir1c15(.pg(r1c15), .pg0(r1c14), .pgo(r2c15));
black ir1c14(.pg(r1c14), .pg0(r1c13), .pgo(r2c14));
black ir1c13(.pg(r1c13), .pg0(r1c12), .pgo(r2c13));
black ir1c12(.pg(r1c12), .pg0(r1c11), .pgo(r2c12));
black ir1c11(.pg(r1c11), .pg0(r1c10), .pgo(r2c11));
black ir1c10(.pg(r1c10), .pg0(r1c9), .pgo(r2c10));
black ir1c9(.pg(r1c9), .pg0(r1c8), .pgo(r2c9));
black ir1c8(.pg(r1c8), .pg0(r1c7), .pgo(r2c8));
black ir1c7(.pg(r1c7), .pg0(r1c6), .pgo(r2c7));
black ir1c6(.pg(r1c6), .pg0(r1c5), .pgo(r2c6));
black ir1c5(.pg(r1c5), .pg0(r1c4), .pgo(r2c5));
black ir1c4(.pg(r1c4), .pg0(r1c3), .pgo(r2c4));
black ir1c3(.pg(r1c3), .pg0(r1c2), .pgo(r2c3));
black ir1c2(.pg(r1c2), .pg0(r1c1), .pgo(r2c2));
black ir1c1(.pg(r1c1), .pg0(r1c0), .pgo(r2c1));
wire [1:0] r3c15, r3c14, r3c11, r3c10, r3c7, r3c6, r3c3, r3c2;
wire [1:0] r3c13, r3c12, r3c9, r3c8, r3c5, r3c4;
black ir2c15(.pg(r2c15), .pg0(r2c13), .pgo(r3c15));
black ir2c14(.pg(r2c14), .pg0(r2c12), .pgo(r3c14));
black ir2c13(.pg(r2c13), .pg0(r2c11), .pgo(r3c13));
black ir2c12(.pg(r2c12), .pg0(r2c10), .pgo(r3c12));
black ir2c11(.pg(r2c11), .pg0(r2c9), .pgo(r3c11));
black ir2c10(.pg(r2c10), .pg0(r2c8), .pgo(r3c10));
black ir2c9(.pg(r2c9), .pg0(r2c7), .pgo(r3c9));
black ir2c8(.pg(r2c8), .pg0(r2c6), .pgo(r3c8));
black ir2c7(.pg(r2c7), .pg0(r2c5), .pgo(r3c7));
black ir2c6(.pg(r2c6), .pg0(r2c4), .pgo(r3c6));
black ir2c5(.pg(r2c5), .pg0(r2c3), .pgo(r3c5));
black ir2c4(.pg(r2c4), .pg0(r2c2), .pgo(r3c4));
black ir2c3(.pg(r2c3), .pg0(r2c1), .pgo(r3c3));
black ir2c2(.pg(r2c2), .pg0(r1c0), .pgo(r3c2));
wire [1:0] r4c15, r4c14, r4c13, r4c12, r4c11, r4c10, r4c9, r4c8;
wire [1:0] r4c7, r4c6, r4c5, r4c4;
black ir3c15(.pg(r3c15), .pg0(r3c11), .pgo(r4c15));
black ir3c14(.pg(r3c14), .pg0(r3c10), .pgo(r4c14));
black ir3c13(.pg(r3c13), .pg0(r3c9), .pgo(r4c13));
black ir3c12(.pg(r3c12), .pg0(r3c8), .pgo(r4c12));
black ir3c11(.pg(r3c11), .pg0(r3c7), .pgo(r4c11));
black ir3c10(.pg(r3c10), .pg0(r3c6), .pgo(r4c10));
black ir3c9(.pg(r3c9), .pg0(r3c5), .pgo(r4c9));
black ir3c8(.pg(r3c8), .pg0(r3c4), .pgo(r4c8));
black ir3c7(.pg(r3c7), .pg0(r3c3), .pgo(r4c7));
black ir3c6(.pg(r3c6), .pg0(r3c2), .pgo(r4c6));
black ir3c5(.pg(r3c5), .pg0(r2c1), .pgo(r4c5));
black ir3c4(.pg(r3c4), .pg0(r1c0), .pgo(r4c4));
wire [1:0] r5c15, r5c14, r5c13, r5c12, r5c11, r5c10, r5c9, r5c8;
black ir4c15(.pg(r4c15), .pg0(r4c7), .pgo(r5c15));
black ir4c14(.pg(r4c14), .pg0(r4c6), .pgo(r5c14));
black ir4c13(.pg(r4c13), .pg0(r4c5), .pgo(r5c13));
black ir4c12(.pg(r4c12), .pg0(r4c4), .pgo(r5c12));
black ir4c7(.pg(r4c11), .pg0(r3c3), .pgo(r5c11));
black ir4c6(.pg(r4c10), .pg0(r3c2), .pgo(r5c10));
black ir4c5(.pg(r4c9), .pg0(r2c1), .pgo(r5c9));
black ir4c4(.pg(r4c8), .pg0(r1c0), .pgo(r5c8));
assign pg = r5c15;
wire [15:0] r6c0, r6c1;
Carry_Inc CIA0(.cin(1'b0),.c7(r4c7),.c6(r4c6),.c5(r4c5),.c4(r4c4),
.c3(r3c3),.c2(r3c2),.c1(r2c1),.c0(r1c0),.r1c(r6c0[7:0]));
Carry_Inc CIA1(.cin(1'b1),.c7(r4c7),.c6(r4c6),.c5(r4c5),.c4(r4c4),
.c3(r3c3),.c2(r3c2),.c1(r2c1),.c0(r1c0),.r1c(r6c1[7:0]));
Carry_Inc CIB0(.cin(1'b0),.c7(r5c15),.c6(r5c14),.c5(r5c13),.c4(r5c12),
.c3(r5c11),.c2(r5c10),.c1(r5c9),.c0(r5c8),.r1c(r6c0[15:8]));
Carry_Inc CIB1(.cin(1'b1),.c7(r5c15),.c6(r5c14),.c5(r5c13),.c4(r5c12),
.c3(r5c11),.c2(r5c10),.c1(r5c9),.c0(r5c8),.r1c(r6c1[15:8]));
wire [15:0] S1,S0;
assign S0= {r6c0[14:0],1'b0} ^
{r1c15[1],r1c14[1],r1c13[1],r1c12[1],r1c11[1],r1c10[1],r1c9[1],r1c8[1],
r1c7[1],r1c6[1],r1c5[1],r1c4[1],r1c3[1],r1c2[1],r1c1[1],r1c0[1]};
assign S1= {r6c1[14:0],1'b1} ^
{r1c15[1],r1c14[1],r1c13[1],r1c12[1],r1c11[1],r1c10[1],r1c9[1],r1c8[1],
r1c7[1],r1c6[1],r1c5[1],r1c4[1],r1c3[1],r1c2[1],r1c1[1],r1c0[1]};
assign S = cin ? S1 : S0 ;
endmodule
////////////////////////////////////////////////////////////////////////////////////////////////////////////
module pg16 (A, B, pg15, pg14, pg13, pg12, pg11, pg10, pg9, pg8, pg7, pg6, pg5, pg4, pg3, pg2, pg1, pg0);
input [15:0] A, B;
output [1:0] pg15, pg14, pg13, pg12, pg11, pg10, pg9, pg8, pg7, pg6, pg5, pg4, pg3, pg2, pg1, pg0;
assign pg15 = {(A[15] ^ B[15]), (A[15] & B[15])};
assign pg14 = {(A[14] ^ B[14]), (A[14] & B[14])};
assign pg13 = {(A[13] ^ B[13]), (A[13] & B[13])};
assign pg12 = {(A[12] ^ B[12]), (A[12] & B[12])};
assign pg11 = {(A[11] ^ B[11]), (A[11] & B[11])};
assign pg10 = {(A[10] ^ B[10]), (A[10] & B[10])};
assign pg9 = {(A[9] ^ B[9]), (A[9] & B[9])};
assign pg8 = {(A[8] ^ B[8]), (A[8] & B[8])};
assign pg7 = {(A[7] ^ B[7]), (A[7] & B[7])};
assign pg6 = {(A[6] ^ B[6]), (A[6] & B[6])};
assign pg5 = {(A[5] ^ B[5]), (A[5] & B[5])};
assign pg4 = {(A[4] ^ B[4]), (A[4] & B[4])};
assign pg3 = {(A[3] ^ B[3]), (A[3] & B[3])};
assign pg2 = {(A[2] ^ B[2]), (A[2] & B[2])};
assign pg1 = {(A[1] ^ B[1]), (A[1] & B[1])};
assign pg0 = {(A[0] ^ B[0]), (A[0] & B[0])};
endmodule
////////////////////////////////////////////////////////////////////////////////////////////////////////////
module black (pg, pg0, pgo);
input [1:0] pg, pg0;
output [1:0] pgo;
assign pgo[1] = pg[1] & pg0[1];
assign pgo[0] = pg[0] | (pg0[0] & pg[1]) ;
endmodule
////////////////////////////////////////////////////////////////////////////////////////////////////////////
module Carry_Inc(
input cin,
input [1:0] c7, c6, c5, c4, c3, c2, c1, c0,
output [7:0] r1c
);
// Carry Increment Stage
gray ic7(.pg(c7), .pg0(cin), .pgo(r1c[7]));
gray ic6(.pg(c6), .pg0(cin), .pgo(r1c[6]));
gray ic5(.pg(c5), .pg0(cin), .pgo(r1c[5]));
gray ic4(.pg(c4), .pg0(cin), .pgo(r1c[4]));
gray ic3(.pg(c3), .pg0(cin), .pgo(r1c[3]));
gray ic2(.pg(c2), .pg0(cin), .pgo(r1c[2]));
gray ic1(.pg(c1), .pg0(cin), .pgo(r1c[1]));
gray ic0(.pg(c0), .pg0(cin), .pgo(r1c[0]));
endmodule
////////////////////////////////////////////////////////////////////////////////////////////////////////////
module gray (pg, pg0, pgo);
input [1:0] pg;
input pg0;
output pgo;
assign pgo = (pg0 & pg[1]) | pg[0];
endmodule
////////////////////////////////////////////////////////////////////////////////////////////////////////////
module Mod2n_1_8b_wo_pg_KS(
input [7:0] p,
input [7:0] g,
input Os,
input [7:0] p1,
output [7:0] c
);
wire [1:0] r1c7, r1c6, r1c5, r1c4, r1c3, r1c2, r1c1, r1c0;
assign r1c7 = {p[7],g[7]};
assign r1c6 = {p[6],g[6]};
assign r1c5 = {p[5],g[5]};
assign r1c4 = {p[4],g[4]};
assign r1c3 = {p[3],g[3]};
assign r1c2 = {p[2],g[2]};
assign r1c1 = {p[1],g[1]};
assign r1c0 = {p[0],g[0]};
wire [1:0] r2c7, r2c6, r2c5, r2c4, r2c3, r2c2, r2c1;
black ir1c7(.pg(r1c7), .pg0(r1c6), .pgo(r2c7));
black ir1c6(.pg(r1c6), .pg0(r1c5), .pgo(r2c6));
black ir1c5(.pg(r1c5), .pg0(r1c4), .pgo(r2c5));
black ir1c4(.pg(r1c4), .pg0(r1c3), .pgo(r2c4));
black ir1c3(.pg(r1c3), .pg0(r1c2), .pgo(r2c3));
black ir1c2(.pg(r1c2), .pg0(r1c1), .pgo(r2c2));
black ir1c1(.pg(r1c1), .pg0(r1c0), .pgo(r2c1));
wire [1:0] r3c7, r3c6, r3c5, r3c4, r3c3, r3c2;
black ir2c7(.pg(r2c7), .pg0(r2c5), .pgo(r3c7));
black ir2c6(.pg(r2c6), .pg0(r2c4), .pgo(r3c6));
black ir2c5(.pg(r2c5), .pg0(r2c3), .pgo(r3c5));
black ir2c4(.pg(r2c4), .pg0(r2c2), .pgo(r3c4));
black ir2c3(.pg(r2c3), .pg0(r2c1), .pgo(r3c3));
black ir2c2(.pg(r2c2), .pg0(r1c0), .pgo(r3c2));
wire [1:0] r4c7, r4c6, r4c5, r4c4;
black ir3c7(.pg(r3c7), .pg0(r3c3), .pgo(r4c7));
black ir3c6(.pg(r3c6), .pg0(r3c2), .pgo(r4c6));
black ir3c5(.pg(r3c5), .pg0(r2c1), .pgo(r4c5));
black ir3c4(.pg(r3c4), .pg0(r1c0), .pgo(r4c4));
wire r5c7, r5c6, r5c5, r5c4, r5c3, r5c2, r5c1, r5c0;
// End Around Carry Stage
assign c0 = r4c7[0] &Os ;
gray ir4c6(.pg(r4c6), .pg0(c0), .pgo(r5c6));
gray ir4c5(.pg(r4c5), .pg0(c0), .pgo(r5c5));
gray ir4c4(.pg(r4c4), .pg0(c0), .pgo(r5c4));
gray ir4c3(.pg(r3c3), .pg0(c0), .pgo(r5c3));
gray ir4c2(.pg(r3c2), .pg0(c0), .pgo(r5c2));
gray ir4c1(.pg(r2c1), .pg0(c0), .pgo(r5c1));
gray ir4c0(.pg(r1c0), .pg0(c0), .pgo(r5c0));
assign c = {r4c7[0],r5c6,r5c5,r5c4,r5c3,r5c2,r5c1,r5c0};
endmodule
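A behavioural reference model can be useful when checking a listing such as the one above. The following Python sketch is not a translation of the Verilog. It models only the prefix operators (mirroring the `black` and `gray` modules) and a plain end-around-carry addition, and it omits the sign logic, the `Os` gating and the 16-bit conditional-sum blocks. The function names `kogge_stone_prefix` and `eac_add` are hypothetical. Pairs are stored as (p, g) tuples, whereas the Verilog packs them as {p, g} with p in bit 1.

```python
def black(hi, lo):
    """Merge two (p, g) pairs, as in the `black` module."""
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))


def gray(pg, carry_in):
    """Resolve a carry from a (p, g) pair, as in the `gray` module."""
    p, g = pg
    return g | (p & carry_in)


def kogge_stone_prefix(pg):
    """Return the group (P, G) over bits [i:0] for every bit position i."""
    n = len(pg)
    span = list(pg)
    dist = 1
    while dist < n:
        # Each level merges a pair with the one `dist` positions below it.
        span = [black(span[i], span[i - dist]) if i >= dist else span[i]
                for i in range(n)]
        dist *= 2
    return span


def eac_add(a, b, n):
    """n-bit addition in which the carry-out is fed back as carry-in."""
    pg = [(((a >> i) ^ (b >> i)) & 1, ((a >> i) & (b >> i)) & 1)
          for i in range(n)]
    prefix = kogge_stone_prefix(pg)
    cout = prefix[n - 1][1]  # group generate over all bits
    # Carry into bit 0 is the end-around carry; carry into bit i uses [i-1:0].
    carries = [cout] + [gray(prefix[i], cout) for i in range(n - 1)]
    result = 0
    for i in range(n):
        result |= (pg[i][0] ^ carries[i]) << i
    return result


if __name__ == "__main__":
    print(eac_add(0b1100, 0b0110, 4))
```

Feeding the carry-out back this way gives addition modulo 2^n − 1, with the all-ones pattern acting as a second representation of zero.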
Appendix B
HSPICE Simulation Files
B.1 Cells
Cell library for simulation.
* Cells
.subckt inv in out length=0.05u width=0.09u
m1 out in vdd vdd PMOS_VTL l=length w='2.5*width'
m2 out in gnd gnd NMOS_VTL l=length w=width
.ends inv
.subckt nand2 in1 in2 out length=0.05u width=0.09u
m1 out in2 vdd vdd PMOS_VTL L=length W='2.5*width'
m2 out in1 vdd vdd PMOS_VTL L=length W='2.5*width'
m3 out in1 1 1 NMOS_VTL L=length W='2*width'
m4 1 in2 gnd gnd NMOS_VTL L=length W='2*width'
.ends nand2
.subckt nor2 in1 in2 out length=0.05u width=0.09u
m1 out in2 1 1 PMOS_VTL L=length W='5*width'
m2 1 in1 vdd vdd PMOS_VTL L=length W='5*width'
m3 out in1 gnd gnd NMOS_VTL L=length W='1*width'
m4 out in2 gnd gnd NMOS_VTL L=length W='1*width'
.ends nor2
.subckt tg in pctrl nctrl out length=0.05u width=0.09u
m0 in pctrl out vdd PMOS_VTL L=length W='1*width'
m1 in nctrl out gnd NMOS_VTL L=length W='1*width'
.ends tg
.subckt xor2 in1 in2 out length=0.05u width=0.09u
x01 in1 out1 inv L=length W='1*width'
x02 in2 out2 inv L=length W='1*width'
x03 in1 in2 out2 out tg L=length W='1*width'
x04 out1 out2 in2 out tg L=length W='1*width'
.ends xor2
.subckt xnor2 in1 in2 out length=0.05u width=0.09u
x01 in2 out2 inv L=length W='2*width'
x02 in1 out2 in2 out tg L=length W='1*width'
m1 out in1 out2 vdd PMOS_VTL L=length W='2*2.5*1*width'
m2 out in1 in2 gnd NMOS_VTL L=length W='2*width'
.ends xnor2
.subckt AOI A B C F length=0.05u width=0.09u
m01 F A 2 2 PMOS_VTL L=length W='5*width'
m02 F B 2 2 PMOS_VTL L=length W='5*width'
m03 2 C vdd vdd PMOS_VTL L=length W='5*width'
m04 F A 1 1 NMOS_VTL L=length W='2*width'
m05 1 B gnd gnd NMOS_VTL L=length W='2*width'
m06 F C gnd gnd NMOS_VTL L=length W='1*width'
.ends AOI
.subckt mux2 in1 in2 sel out length=0.05u width=0.09u
x01 sel nsel inv L=length W='1*width'
x02 in1 sel nsel out tg L=length W='1*width'
x03 in2 nsel sel out tg L=length W='1*width'
.ends mux2
.subckt OAI A B C F length=0.05u width=0.09u
m01 F A 2 2 PMOS_VTL L=length W='5*width'
m02 2 B vdd vdd PMOS_VTL L=length W='5*width'
m03 F C vdd vdd PMOS_VTL L=length W='2.5*width'
m04 F A 1 1 NMOS_VTL L=length W='2*width'
m05 F B 1 1 NMOS_VTL L=length W='2*width'
m06 1 C gnd gnd NMOS_VTL L=length W='2*width'
.ends OAI
B.2 Simulation Code
* source CRITICAL PATH
* 16x8 EAC KS adder
.include 'NMOS_VTL_.inc'
.include 'PMOS_VTL_.inc'
.include 'cells.cir'
.global vdd gnd
.connect gnd 0
.PARAM W='0.05u' L=0.05u N=8 M=4
.PARAM vdd = 1.0V
vdd vdd 0 vdd
V_V1 in0 0
+PULSE 0 1 0 150p 150p 0.5n 3n
X_UI1 in0 in1 inv length=L width='2*W'
X_UI2 in1 in inv length=L width='2*W'
.connect in sA
X_U1_1 sA n1 inv length=L width='f1*W'
X_U1_2 gnd n2 inv length=L width='f1*W'
X_U1_3 sA gnd n2 Os tg length=L width='4*W'
X_U1_4 n1 n2 gnd Os tg length=L width='4*W'
X_U2_1 Os n3 inv length=L width='f2*W'
X_U2_2 vdd Os n3 1 tg length=L width='4*W'
X_U2_3 gnd n3 Os 1 tg length=L width='4*W'
X_U3 1 vdd 2 nand2 length=L width='f3*W'
X_U4 2 gnd vdd 3 OAI length=L width='f4*W'
X_U4_1 2 gnd vdd 51 OAI length=L width='f4*W'
X_U5 3 vdd gnd 4 AOI length=L width='f5*W'
X_U5_1 3 vdd gnd 52 AOI length=L width='f5*W'
X_U6 4 gnd vdd 5 OAI length=L width='f6*W'
X_U6_1 4 gnd vdd 53 OAI length=L width='f6*W'
X_U7 5 vdd gnd 6 AOI length=L width='f7*W'
X_U8 6 gnd vdd 7 OAI length=L width='f8*W'
X_U8_1 6 gnd vdd 54 OAI length=L width='f8*W'
X_U9 7 vdd gnd 8 AOI length=L width='f9*W'
X_U9_1 7 vdd gnd 55 AOI length=L width='f9*W'
X_U10 8 gnd vdd 9 OAI length=L width='f10*W'
X_U11 9 vdd 10 nand2 length=L width='f11*W'
X_U11_1 9 vdd 50 nand2 length=L width='f11*W'
X_U12 10 gnd vdd 11 OAI length=L width='f12*W'
X_U12_1 10 gnd vdd 35 OAI length=L width='f12*W'
X_U12_2 10 gnd vdd 36 OAI length=L width='f12*W'
X_U12_3 10 gnd vdd 37 OAI length=L width='f12*W'
X_U13_1 11 n4 inv length=L width='f13*W'
X_U13_2 gnd 11 n4 12 tg length=L width='f13*W'
X_U13_3 vdd n4 11 12 tg length=L width='f13*W'
X_U14_1 12 n6 inv length=L width='f14*W'
X_U14_4 vdd n5 inv length=L width='f14*W'
X_U14_2 12 vdd n5 s75 tg length=L width='f14*W'
X_U14_3 n6 n5 vdd s75 tg length=L width='f14*W'
X_UO s75 out inv length=L width='2*W'
.PARAM f1=2 f2=2 f3=2.5 f4=2 f5=1.6 f6=1.3 f7=2 f8=1.6 f9=1.3 f10=2 f11=2.5 f12=0.8 f13=2 f14=2
.tran 0.1p 100n
.option post=2 nomod LIST
.meas tran tplh_inr trig v(in) td=70n val='vdd/2' cross=1 targ v(s75) td=70n val='vdd/2' cross=1
.meas tran tplh_inf trig v(in) td=71n val='vdd/2' cross=1 targ v(s75) td=71n val='vdd/2' cross=1
************************************************************************************************
.meas tran tplh_inr_Os_in trig v(in) td=70n val='vdd/2' cross=1 targ v(Os) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_Os_in trig v(in) td=71n val='vdd/2' cross=1 targ v(Os) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_1_Os trig v(Os) td=70n val='vdd/2' cross=1 targ v(1) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_1_Os trig v(Os) td=71n val='vdd/2' cross=1 targ v(1) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_2_1 trig v(1) td=70n val='vdd/2' cross=1 targ v(2) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_2_1 trig v(1) td=71n val='vdd/2' cross=1 targ v(2) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_3_2 trig v(2) td=70n val='vdd/2' cross=1 targ v(3) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_3_2 trig v(2) td=71n val='vdd/2' cross=1 targ v(3) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_4_3 trig v(3) td=70n val='vdd/2' cross=1 targ v(4) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_4_3 trig v(3) td=71n val='vdd/2' cross=1 targ v(4) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_5_4 trig v(4) td=70n val='vdd/2' cross=1 targ v(5) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_5_4 trig v(4) td=71n val='vdd/2' cross=1 targ v(5) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_6_5 trig v(5) td=70n val='vdd/2' cross=1 targ v(6) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_6_5 trig v(5) td=71n val='vdd/2' cross=1 targ v(6) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_7_6 trig v(6) td=70n val='vdd/2' cross=1 targ v(7) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_7_6 trig v(6) td=71n val='vdd/2' cross=1 targ v(7) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_8_7 trig v(7) td=70n val='vdd/2' cross=1 targ v(8) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_8_7 trig v(7) td=71n val='vdd/2' cross=1 targ v(8) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_9_8 trig v(8) td=70n val='vdd/2' cross=1 targ v(9) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_9_8 trig v(8) td=71n val='vdd/2' cross=1 targ v(9) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_10_9 trig v(9) td=70n val='vdd/2' cross=1 targ v(10) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_10_9 trig v(9) td=71n val='vdd/2' cross=1 targ v(10) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_11_10 trig v(10) td=70n val='vdd/2' cross=1 targ v(11) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_11_10 trig v(10) td=71n val='vdd/2' cross=1 targ v(11) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_12_11 trig v(11) td=70n val='vdd/2' cross=1 targ v(12) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_12_11 trig v(11) td=71n val='vdd/2' cross=1 targ v(12) td=71n val='vdd/2' cross=1
.meas tran tplh_inr_s75_12 trig v(12) td=70n val='vdd/2' cross=1 targ v(s75) td=70n val='vdd/2' cross=1
.meas tran tplh_inf_s75_12 trig v(12) td=71n val='vdd/2' cross=1 targ v(s75) td=71n val='vdd/2' cross=1
*************************************************************************************************
.PRINT POWER
.MEASURE TRAN avg_power AVG POWER from 0ns to 100ns
******** alterations
.alter case 2:
.TEMP 100
.alter case 3:
.TEMP 25
.param vdd=1.2V
.alter case 4:
.TEMP 100
.param vdd=1.2V
.alter case 5:
.TEMP 25
.param vdd=0.8V
.alter case 6:
.TEMP 100
.param vdd=0.8V
.end
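The stage factors f1 to f14 in the deck above scale the base width W passed to each cell, and each cell applies its own per-transistor multiplier. The following minimal Python sketch, which is not part of the original simulation files, reproduces that arithmetic for the OAI cell. The 10 nm difference between drawn and effective width is an assumption inferred from the HSPICE element listing (for example, a 100 nm drawn NMOS is reported with an effective width of 90 nm); it is not taken from the model card. All names in the sketch are hypothetical.

```python
BASE_W_NM = 50.0          # .PARAM W='0.05u' in the simulation deck
WIDTH_OFFSET_NM = 10.0    # assumption: drawn minus effective width, inferred from the listing

# Per-transistor width multipliers of the OAI subcircuit in cells.cir
OAI_MULTIPLIERS = {
    "m01": 5.0, "m02": 5.0, "m03": 2.5,   # PMOS devices
    "m04": 2.0, "m05": 2.0, "m06": 2.0,   # NMOS devices
}


def effective_width_nm(stage_factor, device_multiplier,
                       base_w_nm=BASE_W_NM, offset_nm=WIDTH_OFFSET_NM):
    """Estimate the effective width in nm of one transistor.

    stage_factor is the f-parameter of the instance (width = f * W) and
    device_multiplier is the factor applied inside the subcircuit.
    """
    drawn_nm = stage_factor * base_w_nm * device_multiplier
    return drawn_nm - offset_nm


# Instance X_U4 is an OAI cell sized with f4 = 2.
oai_u4 = {name: effective_width_nm(2.0, mult)
          for name, mult in OAI_MULTIPLIERS.items()}

for name, width in oai_u4.items():
    print(f"X_U4 {name}: {width:.1f} nm")
```

Under these assumptions, the sketch gives 490 nm for m01 and m02, 240 nm for m03, and 190 nm for the NMOS devices of X_U4, which agrees with the effective widths reported for that instance in the element listing.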
B.3 Condition of Transistors
Transistor conditions at 25 °C with a 1.0 V supply:
element name 1:m1 1:m2 2:m1 2:m2
drain 0:in1 0:in1 0:in 0:in
gate 0:in0 0:in0 0:in1 0:in1
source 0:vdd 0:0 0:vdd 0:0
bulk 0:vdd 0:0 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 240.0000n 90.0000n 240.0000n 90.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 7.9079a 19.7699a 7.9079a
capbd 124.7500a 49.9000a 124.7500a 49.9000a
capbs 199.6000a 79.8400a 199.6000a 79.8400a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 3:m1 3:m2 4:m1 4:m2
drain 0:n1 0:n1 0:n2 0:n2
gate 0:in 0:in 0:0 0:0
source 0:vdd 0:0 0:vdd 0:0
bulk 0:vdd 0:0 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 240.0000n 90.0000n 240.0000n 90.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 7.9079a 19.7699a 7.9079a
capbd 124.7500a 49.9000a 124.7500a 49.9000a
capbs 199.6000a 79.8400a 199.6000a 79.8400a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 5:m0 5:m1 6:m0 6:m1
drain 0:in 0:in 0:n1 0:n1
gate 0:0 0:n2 0:n2 0:0
source 0:os 0:os 0:os 0:os
bulk 0:vdd 0:0 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 190.0000n 190.0000n 190.0000n 190.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 15.8159a 15.8159a 15.8159a 15.8159a
capbd 99.8000a 99.8000a 99.8000a 99.8000a
capbs 159.6800a 159.6800a 159.6800a 159.6800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 7:m1 7:m2 8:m0 8:m1
drain 0:n3 0:n3 0:vdd 0:vdd
gate 0:os 0:os 0:os 0:n3
source 0:vdd 0:0 0:1 0:1
bulk 0:vdd 0:0 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 240.0000n 90.0000n 190.0000n 190.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 7.9079a 15.8159a 15.8159a
capbd 124.7500a 49.9000a 99.8000a 99.8000a
capbs 199.6000a 79.8400a 159.6800a 159.6800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 9:m0 9:m1 10:m1 10:m2
drain 0:0 0:0 0:2 0:2
gate 0:n3 0:os 0:vdd 0:1
source 0:1 0:1 0:vdd 0:vdd
bulk 0:vdd 0:0 0:vdd 0:vdd
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 190.0000n 190.0000n 302.5000n 302.5000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 15.8159a 15.8159a 24.7123a 24.7123a
capbd 99.8000a 99.8000a 155.9375a 155.9375a
capbs 159.6800a 159.6800a 249.5000a 249.5000a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 10:m3 10:m4 11:m01 11:m02
drain 0:2 10:1 0:3 11:2
gate 0:1 0:vdd 0:2 0:0
source 10:1 0:0 11:2 0:vdd
bulk 10:1 0:0 11:2 0:vdd
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 240.0000n 240.0000n 490.0000n 490.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 19.7699a 39.5397a 39.5397a
capbd 124.7500a 124.7500a 249.5000a 249.5000a
capbs 199.6000a 199.6000a 399.2000a 399.2000a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 11:m03 11:m04 11:m05 11:m06
drain 0:3 0:3 11:f1 11:1
gate 0:vdd 0:2 0:0 0:vdd
source 0:vdd 11:1 11:1 0:0
bulk 0:vdd 11:1 11:1 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 240.0000n 190.0000n 190.0000n 190.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 15.8159a 15.8159a 15.8159a
capbd 124.7500a 99.8000a 99.8000a 99.8000a
capbs 199.6000a 159.6800a 159.6800a 159.6800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 12:m01 12:m02 12:m03 12:m04
drain 0:51 12:2 0:51 0:51
gate 0:2 0:0 0:vdd 0:2
source 12:2 0:vdd 0:vdd 12:1
bulk 12:2 0:vdd 0:vdd 12:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 490.0000n 490.0000n 240.0000n 190.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 39.5397a 39.5397a 19.7699a 15.8159a
capbd 249.5000a 249.5000a 124.7500a 99.8000a
capbs 399.2000a 399.2000a 199.6000a 159.6800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 12:m05 12:m06 13:m01 13:m02
drain 12:f1 12:1 0:4 0:4
gate 0:0 0:vdd 0:3 0:vdd
source 12:1 0:0 13:2 13:2
bulk 12:1 0:0 13:2 13:2
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 190.0000n 190.0000n 390.0000n 390.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 15.8159a 15.8159a 31.6318a 31.6318a
capbd 99.8000a 99.8000a 199.6000a 199.6000a
capbs 159.6800a 159.6800a 319.3600a 319.3600a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 13:m03 13:m04 13:m05 13:m06
drain 13:2 0:4 13:1 0:4
gate 0:0 0:3 0:vdd 0:0
source 0:vdd 13:1 0:0 0:0
bulk 0:vdd 13:1 0:0 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 390.0000n 150.0000n 150.0000n 70.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 31.6318a 12.6527a 12.6527a 6.3264a
capbd 199.6000a 79.8400a 79.8400a 39.9200a
capbs 319.3600a 127.7440a 127.7440a 63.8720a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 14:m01 14:m02 14:m03 14:m04
drain 0:52 0:52 14:2 0:52
gate 0:3 0:vdd 0:0 0:3
source 14:2 14:2 0:vdd 14:1
bulk 14:2 14:2 0:vdd 14:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 390.0000n 390.0000n 390.0000n 150.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 31.6318a 31.6318a 31.6318a 12.6527a
capbd 199.6000a 199.6000a 199.6000a 79.8400a
capbs 319.3600a 319.3600a 319.3600a 127.7440a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 14:m05 14:m06 15:m01 15:m02
drain 14:1 0:52 0:5 15:2
gate 0:vdd 0:0 0:4 0:0
source 0:0 0:0 15:2 0:vdd
bulk 0:0 0:0 15:2 0:vdd
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 150.0000n 70.0000n 315.0000n 315.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 12.6527a 6.3264a 25.7008a 25.7008a
capbd 79.8400a 39.9200a 162.1750a 162.1750a
capbs 127.7440a 63.8720a 259.4800a 259.4800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 15:m03 15:m04 15:m05 15:m06
drain 0:5 0:5 15:f1 15:1
gate 0:vdd 0:4 0:0 0:vdd
source 0:vdd 15:1 15:1 0:0
bulk 0:vdd 15:1 15:1 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 152.5000n 120.0000n 120.0000n 120.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 12.8504a 10.2803a 10.2803a 10.2803a
capbd 81.0875a 64.8700a 64.8700a 64.8700a
capbs 129.7400a 103.7920a 103.7920a 103.7920a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 16:m01 16:m02 16:m03 16:m04
drain 0:53 16:2 0:53 0:53
gate 0:4 0:0 0:vdd 0:4
source 16:2 0:vdd 0:vdd 16:1
bulk 16:2 0:vdd 0:vdd 16:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 315.0000n 315.0000n 152.5000n 120.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 25.7008a 25.7008a 12.8504a 10.2803a
capbd 162.1750a 162.1750a 81.0875a 64.8700a
capbs 259.4800a 259.4800a 129.7400a 103.7920a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 16:m05 16:m06 17:m01 17:m02
drain 16:f1 16:1 0:6 0:6
gate 0:0 0:vdd 0:5 0:vdd
source 16:1 0:0 17:2 17:2
bulk 16:1 0:0 17:2 17:2
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 120.0000n 120.0000n 490.0000n 490.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 10.2803a 10.2803a 39.5397a 39.5397a
capbd 64.8700a 64.8700a 249.5000a 249.5000a
capbs 103.7920a 103.7920a 399.2000a 399.2000a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 17:m03 17:m04 17:m05 17:m06
drain 17:2 0:6 17:1 0:6
gate 0:0 0:5 0:vdd 0:0
source 0:vdd 17:1 0:0 0:0
bulk 0:vdd 17:1 0:0 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 490.0000n 190.0000n 190.0000n 90.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 39.5397a 15.8159a 15.8159a 7.9079a
capbd 249.5000a 99.8000a 99.8000a 49.9000a
capbs 399.2000a 159.6800a 159.6800a 79.8400a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 18:m01 18:m02 18:m03 18:m04
drain 0:7 18:2 0:7 0:7
gate 0:6 0:0 0:vdd 0:6
source 18:2 0:vdd 0:vdd 18:1
bulk 18:2 0:vdd 0:vdd 18:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 390.0000n 390.0000n 190.0000n 150.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 31.6318a 31.6318a 15.8159a 12.6527a
capbd 199.6000a 199.6000a 99.8000a 79.8400a
capbs 319.3600a 319.3600a 159.6800a 127.7440a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 18:m05 18:m06 19:m01 19:m02
drain 18:f1 18:1 0:54 19:2
gate 0:0 0:vdd 0:6 0:0
source 18:1 0:0 19:2 0:vdd
bulk 18:1 0:0 19:2 0:vdd
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 150.0000n 150.0000n 390.0000n 390.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 12.6527a 12.6527a 31.6318a 31.6318a
capbd 79.8400a 79.8400a 199.6000a 199.6000a
capbs 127.7440a 127.7440a 319.3600a 319.3600a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 19:m03 19:m04 19:m05 19:m06
drain 0:54 0:54 19:f1 19:1
gate 0:vdd 0:6 0:0 0:vdd
source 0:vdd 19:1 19:1 0:0
bulk 0:vdd 19:1 19:1 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 190.0000n 150.0000n 150.0000n 150.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 15.8159a 12.6527a 12.6527a 12.6527a
capbd 99.8000a 79.8400a 79.8400a 79.8400a
capbs 159.6800a 127.7440a 127.7440a 127.7440a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 20:m01 20:m02 20:m03 20:m04
drain 0:8 0:8 20:2 0:8
gate 0:7 0:vdd 0:0 0:7
source 20:2 20:2 0:vdd 20:1
bulk 20:2 20:2 0:vdd 20:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 315.0000n 315.0000n 315.0000n 120.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 25.7008a 25.7008a 25.7008a 10.2803a
capbd 162.1750a 162.1750a 162.1750a 64.8700a
capbs 259.4800a 259.4800a 259.4800a 103.7920a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 20:m05 20:m06 21:m01 21:m02
drain 20:1 0:8 0:55 0:55
gate 0:vdd 0:0 0:7 0:vdd
source 0:0 0:0 21:2 21:2
bulk 0:0 0:0 21:2 21:2
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 120.0000n 55.0000n 315.0000n 315.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 10.2803a 5.1402a 25.7008a 25.7008a
capbd 64.8700a 32.4350a 162.1750a 162.1750a
capbs 103.7920a 51.8960a 259.4800a 259.4800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 21:m03 21:m04 21:m05 21:m06
drain 21:2 0:55 21:1 0:55
gate 0:0 0:7 0:vdd 0:0
source 0:vdd 21:1 0:0 0:0
bulk 0:vdd 21:1 0:0 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 315.0000n 120.0000n 120.0000n 55.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 25.7008a 10.2803a 10.2803a 5.1402a
capbd 162.1750a 64.8700a 64.8700a 32.4350a
capbs 259.4800a 103.7920a 103.7920a 51.8960a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 22:m01 22:m02 22:m03 22:m04
drain 0:9 22:2 0:9 0:9
gate 0:8 0:0 0:vdd 0:8
source 22:2 0:vdd 0:vdd 22:1
bulk 22:2 0:vdd 0:vdd 22:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 490.0000n 490.0000n 240.0000n 190.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 39.5397a 39.5397a 19.7699a 15.8159a
capbd 249.5000a 249.5000a 124.7500a 99.8000a
capbs 399.2000a 399.2000a 199.6000a 159.6800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 22:m05 22:m06 23:m1 23:m2
drain 22:f1 22:1 0:10 0:10
gate 0:0 0:vdd 0:vdd 0:9
source 22:1 0:0 0:vdd 0:vdd
bulk 22:1 0:0 0:vdd 0:vdd
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 190.0000n 190.0000n 302.5000n 302.5000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 15.8159a 15.8159a 24.7123a 24.7123a
capbd 99.8000a 99.8000a 155.9375a 155.9375a
capbs 159.6800a 159.6800a 249.5000a 249.5000a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 23:m3 23:m4 24:m1 24:m2
drain 0:10 23:1 0:50 0:50
gate 0:9 0:vdd 0:vdd 0:9
source 23:1 0:0 0:vdd 0:vdd
bulk 23:1 0:0 0:vdd 0:vdd
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 240.0000n 240.0000n 302.5000n 302.5000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 19.7699a 24.7123a 24.7123a
capbd 124.7500a 124.7500a 155.9375a 155.9375a
capbs 199.6000a 199.6000a 249.5000a 249.5000a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 24:m3 24:m4 25:m01 25:m02
drain 0:50 24:1 0:11 25:2
gate 0:9 0:vdd 0:10 0:0
source 24:1 0:0 25:2 0:vdd
bulk 24:1 0:0 25:2 0:vdd
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 240.0000n 240.0000n 190.0000n 190.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 19.7699a 15.8159a 15.8159a
capbd 124.7500a 124.7500a 99.8000a 99.8000a
capbs 199.6000a 199.6000a 159.6800a 159.6800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 25:m03 25:m04 25:m05 25:m06
drain 0:11 0:11 25:f1 25:1
gate 0:vdd 0:10 0:0 0:vdd
source 0:vdd 25:1 25:1 0:0
bulk 0:vdd 25:1 25:1 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 90.0000n 70.0000n 70.0000n 70.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 7.9079a 6.3264a 6.3264a 6.3264a
capbd 49.9000a 39.9200a 39.9200a 39.9200a
capbs 79.8400a 63.8720a 63.8720a 63.8720a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 26:m01 26:m02 26:m03 26:m04
drain 0:35 26:2 0:35 0:35
gate 0:10 0:0 0:vdd 0:10
source 26:2 0:vdd 0:vdd 26:1
bulk 26:2 0:vdd 0:vdd 26:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 190.0000n 190.0000n 90.0000n 70.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 15.8159a 15.8159a 7.9079a 6.3264a
capbd 99.8000a 99.8000a 49.9000a 39.9200a
capbs 159.6800a 159.6800a 79.8400a 63.8720a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 26:m05 26:m06 27:m01 27:m02
drain 26:f1 26:1 0:36 27:2
gate 0:0 0:vdd 0:10 0:0
source 26:1 0:0 27:2 0:vdd
bulk 26:1 0:0 27:2 0:vdd
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:pmos_vtl
w eff 70.0000n 70.0000n 190.0000n 190.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 6.3264a 6.3264a 15.8159a 15.8159a
capbd 39.9200a 39.9200a 99.8000a 99.8000a
capbs 63.8720a 63.8720a 159.6800a 159.6800a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 27:m03 27:m04 27:m05 27:m06
drain 0:36 0:36 27:f1 27:1
gate 0:vdd 0:10 0:0 0:vdd
source 0:vdd 27:1 27:1 0:0
bulk 0:vdd 27:1 27:1 0:0
model 0:pmos_vtl 0:nmos_vtl 0:nmos_vtl 0:nmos_vtl
w eff 90.0000n 70.0000n 70.0000n 70.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 7.9079a 6.3264a 6.3264a 6.3264a
capbd 49.9000a 39.9200a 39.9200a 39.9200a
capbs 79.8400a 63.8720a 63.8720a 63.8720a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 28:m01 28:m02 28:m03 28:m04
drain 0:37 28:2 0:37 0:37
gate 0:10 0:0 0:vdd 0:10
source 28:2 0:vdd 0:vdd 28:1
bulk 28:2 0:vdd 0:vdd 28:1
model 0:pmos_vtl 0:pmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 190.0000n 190.0000n 90.0000n 70.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 15.8159a 15.8159a 7.9079a 6.3264a
capbd 99.8000a 99.8000a 49.9000a 39.9200a
capbs 159.6800a 159.6800a 79.8400a 63.8720a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 28:m05 28:m06 29:m1 29:m2
drain 28:f1 28:1 0:n4 0:n4
gate 0:0 0:vdd 0:11 0:11
source 28:1 0:0 0:vdd 0:0
bulk 28:1 0:0 0:vdd 0:0
model 0:nmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 70.0000n 70.0000n 240.0000n 90.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 6.3264a 6.3264a 19.7699a 7.9079a
capbd 39.9200a 39.9200a 124.7500a 49.9000a
capbs 63.8720a 63.8720a 199.6000a 79.8400a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 30:m0 30:m1 31:m0 31:m1
drain 0:0 0:0 0:vdd 0:vdd
gate 0:11 0:n4 0:n4 0:11
source 0:12 0:12 0:12 0:12
bulk 0:vdd 0:0 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 90.0000n 90.0000n 90.0000n 90.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 7.9079a 7.9079a 7.9079a 7.9079a
capbd 49.9000a 49.9000a 49.9000a 49.9000a
capbs 79.8400a 79.8400a 79.8400a 79.8400a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 32:m1 32:m2 33:m1 33:m2
drain 0:n6 0:n6 0:n5 0:n5
gate 0:12 0:12 0:vdd 0:vdd
source 0:vdd 0:0 0:vdd 0:0
bulk 0:vdd 0:0 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 240.0000n 90.0000n 240.0000n 90.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 19.7699a 7.9079a 19.7699a 7.9079a
capbd 124.7500a 49.9000a 124.7500a 49.9000a
capbs 199.6000a 79.8400a 199.6000a 79.8400a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 34:m0 34:m1 35:m0 35:m1
drain 0:12 0:12 0:n6 0:n6
gate 0:vdd 0:n5 0:n5 0:vdd
source 0:s75 0:s75 0:s75 0:s75
bulk 0:vdd 0:0 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl 0:pmos_vtl 0:nmos_vtl
w eff 90.0000n 90.0000n 90.0000n 90.0000n
l eff 22.5000n 22.5000n 22.5000n 22.5000n
rd eff 0. 0. 0. 0.
rs eff 0. 0. 0. 0.
cdsat 10.0000f 10.0000f 10.0000f 10.0000f
cssat 7.9079a 7.9079a 7.9079a 7.9079a
capbd 49.9000a 49.9000a 49.9000a 49.9000a
capbs 79.8400a 79.8400a 79.8400a 79.8400a
temp 25.0000 25.0000 25.0000 25.0000
aic
nf 1.0000 1.0000 1.0000 1.0000
min 0. 0. 0. 0.
rbdb 15.0000 15.0000 15.0000 15.0000
rbsb 15.0000 15.0000 15.0000 15.0000
rbpb 5.0000 5.0000 5.0000 5.0000
rbps 15.0000 15.0000 15.0000 15.0000
rbpd 15.0000 15.0000 15.0000 15.0000
trnqsmod 0. 0. 0. 0.
acnqsmod 0. 0. 0. 0.
rbodymod 1.0000 1.0000 1.0000 1.0000
rgatemod 1.0000 1.0000 1.0000 1.0000
geomod 1.0000 1.0000 1.0000 1.0000
rgeomod 0. 0. 0. 0.
delvto 0. 0. 0. 0.
mulu0 1.0000 1.0000 1.0000 1.0000
delk1 0. 0. 0. 0.
delnfct 0. 0. 0. 0.
deltox 0. 0. 0. 0.
sa 0. 0. 0. 0.
sb 0. 0. 0. 0.
sd 0. 0. 0. 0.
saeff 0. 0. 0. 0.
sbeff 0. 0. 0. 0.
element name 36:m1 36:m2
drain 0:out 0:out
gate 0:s75 0:s75
source 0:vdd 0:0
bulk 0:vdd 0:0
model 0:pmos_vtl 0:nmos_vtl
w eff 240.0000n 90.0000n
l eff 22.5000n 22.5000n
rd eff 0. 0.
rs eff 0. 0.
cdsat 10.0000f 10.0000f
cssat 19.7699a 7.9079a
capbd 124.7500a 49.9000a
capbs 199.6000a 79.8400a
temp 25.0000 25.0000
aic
nf 1.0000 1.0000
min 0. 0.
rbdb 15.0000 15.0000
rbsb 15.0000 15.0000
rbpb 5.0000 5.0000
rbps 15.0000 15.0000
rbpd 15.0000 15.0000
trnqsmod 0. 0.
acnqsmod 0. 0.
rbodymod 1.0000 1.0000
rgatemod 1.0000 1.0000
geomod 1.0000 1.0000
rgeomod 0. 0.
delvto 0. 0.
mulu0 1.0000 1.0000
delk1 0. 0.
delnfct 0. 0.
deltox 0. 0.
sa 0. 0.
sb 0. 0.
sd 0. 0.
saeff 0. 0.
sbeff 0. 0.