
CHAPTER 3

NEW SLEEPY-PASS GATE

3.1 INTRODUCTION

A circuit-level design technique is presented in this chapter to reduce the overall leakage power in conventional CMOS cells. The new leakage power reduction technique, the Sleepy-pass gate, focuses on static power dissipation in the standby mode of operation, using sleep transistors in a pass gate (transmission gate) structure. However, unlike the sleep transistor technique, the Sleepy-pass gate technique retains the exact logic state; and, unlike the LECTOR technique, the Sleepy-pass gate technique can utilize high-Vth transistors, achieving leakage savings of 180X (or greater) compared with a conventional CMOS NAND gate at the 100nm process. Therefore, outperforming many prior approaches, the Sleepy-pass gate technique can achieve ultra-low leakage power consumption while saving state.

First, the structure of the Sleepy-pass gate technique is explained using a two input NAND gate. Then its operation in active mode and in sleep (standby) mode is described in detail.

3.2 SLEEPY-PASS GATE

3.2.1 Structure of Sleepy-pass Gate

Figure 3.1 shows a pair of complementary MOS transistors connected in parallel, known as the CMOS pass gate (transmission gate) configuration, which passes both logic 0 and logic 1 well. When the sleep signal on the gate terminal of the PMOS is at logic 0, its complement sleep_bar (logic 1) is applied to the gate terminal of the NMOS, allowing both transistors to conduct and pass the signal at IN to OUT. When the sleep signal on the PMOS gate is at logic 1, the complementary logic 0 is applied to the NMOS gate, turning both transistors off and forcing a high-impedance condition on both the IN and OUT nodes. This high-impedance condition represents a third state (high, low, or high-Z), in which the pass gate acts as an open circuit offering very high resistance. When it is on, the pass gate acts as a voltage-controlled resistor connecting input and output, providing true bidirectional connectivity without degradation of the input signal.

Figure 3.1 Pass gate (transmission gate) logic
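As a sketch only, the switching behavior just described can be captured in an idealized switch-level model. The function name `pass_gate` and the string `"Z"` for the high-impedance state are illustrative choices, not part of the thesis, and the model deliberately ignores threshold drops and channel resistance.

```python
HIGH_Z = "Z"  # third state: both transistors off, IN and OUT disconnected


def pass_gate(sig_in, sleep):
    """Idealized switch-level model of the pass gate of Figure 3.1.

    `sleep` drives the PMOS gate and its complement `sleep_bar` drives the
    NMOS gate. Threshold drops and channel resistance are ignored.
    """
    sleep_bar = 1 - sleep
    pmos_on = sleep == 0      # PMOS conducts when its gate is at logic 0
    nmos_on = sleep_bar == 1  # NMOS conducts when its gate is at logic 1
    if pmos_on and nmos_on:
        return sig_in         # both devices conduct: IN is passed to OUT
    return HIGH_Z             # both devices off: high-impedance condition


for sleep in (0, 1):
    for sig_in in (0, 1):
        print(f"sleep={sleep} IN={sig_in} -> OUT={pass_gate(sig_in, sleep)}")
```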

Figure 3.2 plots the ON resistance (RON) of the pass gate as the input voltage is swept from Gnd to VDD, assuming that the output voltage closely follows the input. The effective ON resistance is the parallel combination of the NMOS and PMOS resistances and is relatively constant across the full range of input voltages. The OFF resistance, however, is very high, in the range of several megaohms.

Figure 3.2 ON resistance of pass gate (Weste 2005)
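Why the parallel combination is nearly flat can be illustrated with a first-order calculation in which each conducting transistor is treated as a linear-region resistor R = 1/(k x overdrive) and the two resistances are combined in parallel. This is a rough sketch under stated assumptions, not the model behind Figure 3.2: the transconductance factors `KN` and `KP` are hypothetical (assumed equal, as if the PMOS were widened to match the NMOS), body effect is ignored, the output is assumed to track the input, and the thresholds are the 100nm standard-Vth values from Tables 3.4 and 3.5.

```python
import math

VDD = 1.0    # supply voltage (V), 100nm entry of Tables 3.4/3.5
VTN = 0.26   # standard-Vth NMOS threshold (V), Table 3.5
VTP = 0.30   # magnitude of standard-Vth PMOS threshold (V), Table 3.4
KN = 1.0e-3  # hypothetical NMOS transconductance factor (A/V^2)
KP = 1.0e-3  # hypothetical PMOS factor, assumed equal to KN


def r_linear(k, overdrive):
    """Deep-linear-region channel resistance 1/(k * overdrive); infinite when off."""
    if overdrive <= 0.0:
        return math.inf
    return 1.0 / (k * overdrive)


def r_on(vin):
    """Return (Rn, Rp, Ron) with NMOS gate at VDD, PMOS gate at 0 V and Vout ~= Vin."""
    rn = r_linear(KN, VDD - vin - VTN)  # NMOS: Vgs = VDD - Vin
    rp = r_linear(KP, vin - VTP)        # PMOS: Vsg = Vin - 0
    # Parallel combination: add conductances (an off device contributes none).
    g = sum(1.0 / r for r in (rn, rp) if math.isfinite(r))
    return rn, rp, (1.0 / g if g > 0.0 else math.inf)


for step in range(11):
    vin = round(step * VDD / 10, 10)
    rn, rp, ron = r_on(vin)
    print(f"Vin={vin:.1f} V  Rn={rn:9.1f}  Rp={rp:9.1f}  Ron={ron:7.1f} ohm")
```

With equal k factors the two conductances sum to KN x (VDD - VTN - |VTP|) wherever both devices conduct, so Ron is flat across the middle of the range (about 2.3 kilohms with these made-up k values) and drops near the rails, where a single device conducts with a larger overdrive.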

The sleep transistor concept used for dynamic circuits by Kursun (2004) was adapted and modified to work for leakage reduction in static CMOS complementary circuits. A combination of high-Vth and standard-Vth sleep transistors is used in the implementation (Lakshmikanthan 2006) to provide a well-balanced trade-off between high speed and leakage loss. The proposed technique facilitates the creation of an ultra-low-power standard cell library using components with embedded sleep circuitry.

Figure 3.3 illustrates the topology of a generic CMOS complementary circuit with the Sleepy-pass gate embedded. It has n inputs, input 1, input 2, ..., input n, feeding the Pull-Up Network (PUN) as well as the Pull-Down Network (PDN). The transistors in both the PUN and the PDN are standard-Vth devices. The sleep circuitry consists of two transistors, one PMOS device S1 and one NMOS device S2, both of which are high-Vth devices. The sleep transistors S1 and S2 are connected in parallel to form a pass gate (transmission gate) configuration between the PUN and the PDN, as shown in Figure 3.3. The sleep and sleep_bar signals feed the gates of S1 and S2, respectively. The output of the CMOS circuit can be taken either between the PUN and the sleep circuit or between the sleep circuit and the PDN.

Figure 3.3 Block diagram of generic Sleepy-pass gate CMOS circuit

3.2.2 Sleepy-pass Gate Operation

The working of the Sleepy-pass gate CMOS circuit is discussed in this section. The sleep transistors S1 and S2 shown in Figure 3.3 are turned ON during active mode and turned OFF during sleep mode. During the normal (active) mode of operation, sleep is held at logic 0 and sleep_bar at logic 1, causing the sleep transistors S1 and S2 to turn ON and act as a pass gate. The circuit then behaves as a normal CMOS circuit without any hindrance from the sleep circuit, as can be seen from the DC characteristics obtained from HSPICE simulations.


Figure 3.4 shows a two input NAND gate with the Sleepy-pass gate embedded. Figure 3.5 shows the DC characteristics of this NAND gate with the proposed method (input A is fixed at 1 V while input B is varied from 0 to 1 V).

Figure 3.4 Two input NAND gate with Sleepy-pass gate structure

Figure 3.5 DC characteristic of a two input NAND with Sleepy-pass gate


In active mode, the ON resistance of the pass gate is nearly constant and much lower than its OFF resistance, allowing conduction between the PUN and the PDN. Even though the ON resistance is far lower than the OFF resistance, it still adds to the resistance of the VDD-to-ground path, restricting the flow of leakage currents and reducing leakage power in active mode. In standby mode, sleep is held at logic 1 and sleep_bar at logic 0, causing S1 and S2 to turn OFF and forcing a high-impedance condition between the PUN and PDN nodes.

Thus, the introduction of the Sleepy-pass gate increases the resistance of the path from VDD to ground during the standby mode of operation, reducing the leakage current. The leakage reduction of the Sleepy-pass gate structure occurs in two ways. First, leakage power is suppressed by the high-Vth sleep transistors S1 and S2, which are connected in parallel with each other. Second, the OFF pass gate increases the resistance of the path from VDD to ground during standby mode, which further suppresses leakage power consumption.
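The first effect can be quantified roughly with the textbook subthreshold model, in which the off-state current scales as exp(-Vth/(n x vT)), with vT = kT/q. The sketch below is an assumption-laden estimate, not an HSPICE result: the slope factor n = 1.5 is assumed, DIBL and body effect are ignored, and the thresholds are the 100nm NMOS values of Table 3.5.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def thermal_voltage(temp_c):
    """kT/q in volts at a temperature given in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q_E


def leakage_reduction(vth_std, vth_high, n=1.5, temp_c=25.0):
    """Factor by which off-state (Vgs = 0) subthreshold current falls when the
    threshold rises from vth_std to vth_high, using I_sub ~ exp(-Vth / (n * vT)).

    n = 1.5 is an assumed subthreshold slope factor; DIBL and body effect are ignored.
    """
    return math.exp((vth_high - vth_std) / (n * thermal_voltage(temp_c)))


# 100nm NMOS thresholds from Table 3.5: 0.26 V (standard) and 0.34 V (high Vth)
factor = leakage_reduction(0.26, 0.34)
print(f"high-Vth device leaks about {factor:.1f}x less (first-order estimate)")
```

Under these assumptions the threshold step alone gives roughly an 8x reduction per device; the much larger savings reported later for complete cells are attributed in the text to this effect combined with the second one, the high-resistance OFF pass gate in the VDD-to-ground path.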

By combining these two effects, the Sleepy-pass gate structure achieves ultra-low leakage power consumption during sleep mode while retaining the exact logic state. Figure 3.6 shows the input-output waveforms of the NAND gate with the proposed method, simulated for 100nm technology at a 1V supply voltage. It can be observed that the proposed NAND gate produces exact output logic levels. For any given process technology, the standard-Vth transistors are unit-sized devices (the smallest width-to-length (W/L) ratio defined by the technology). However, the high-Vth transistors S1 and S2 need to be sized appropriately for the Sleepy-pass gate embedded CMOS cells to have a propagation delay comparable to that of the standard cells.

Figure 3.6 Input - output waveforms of a two input NAND with proposed method

There is a nominal increase in both area and propagation delay of the Sleepy-pass gate embedded circuit compared with standard CMOS circuits. This overhead is traded off against the enormous leakage power savings compared with standard CMOS cells. In addition, the output logic state is not lost when the circuit goes from active mode to sleep mode and vice versa. This compares favorably with existing approaches that maintain the logic state by using far lower VDD values and additional transistors; for example, Flautner (2002) proposed reducing VDD to a value just sufficient to maintain the logic state.

As an alternative option, the Sleepy-pass gate could be placed between the supply voltage VDD and the PUN and/or between the PDN and ground; during the active mode of operation, this creates virtual supply and/or ground rails. During standby mode, all sleep transistors are OFF, so the actual power and ground paths are broken and the circuit sees a reduced effective supply voltage. A very high-resistance path, formed by the parallel combination of the OFF resistances of the sleep transistors, is established between VDD and ground, so the leakage current flowing through the circuit, and hence the power dissipation, is reduced significantly.

3.3 EVALUATION OF SLEEPY-PASS GATE TECHNIQUE

The Sleepy-pass gate technique is evaluated by applying it to logic circuits and benchmark circuits, and by comparing it with other well-known existing techniques, using the HSPICE simulation/experimental setup shown in Figure 3.7.

Figure 3.7 Experimental setup

3.3.1 Simulation Setup

In order to compare the results of the Sleepy-pass gate method with the base case, an experiment was carried out with a set of combinational logic gates. The schematics are designed for all the mentioned techniques, and the netlists extracted from the schematics are modified with respect to the Berkeley Predictive Technology Models (BPTM). The modified netlists are simulated using Synopsys HSPICE for power and delay measurements.

3.3.2 Applying Sleepy-pass Gate for Logic Circuits

Various circuit applications of the Sleepy-pass gate technique are explored. Generic logic circuits, including the inverter, NAND2, NOR2, AND2, OR2, multiplexer and full adder, are implemented using state-saving as well as state-destructive low leakage techniques for evaluation. The detailed experimental methodology is explained below.

All circuits were simulated at a temperature of 25°C. Standard combinational CMOS library cells, such as NOR2, NAND2, OR2, AND2, XOR2, XNOR2 and MUX2x1, were implemented (Sahni 2006) and modified accordingly for the respective process technologies. Transistor sizes in all these circuits were fixed, with PMOS width WP = 2, NMOS width WN = 1 and L = 100nm. A supply voltage (VDD) of 1V was used, and transient analysis was performed on all 7 cells listed above using HSPICE. The output load for each of the 7 cells was a 1pF capacitor.

3.3.2.1 Simulation results for logic gates

The total area of each standard cell for the CMOS base case and the proposed method is listed in Table 3.1. There is a slight increase in area for the proposed technique compared with the base case, due to the additional transistors. Figures 3.8 and 3.9 show, for example, the layouts of the basic NAND and NOR gates. The propagation delay of each cell was measured in order to compare the base case with the Sleepy-pass gate embedded cells. Next, the circuits were simulated at a temperature of 25°C and their leakage power measured. All possible input combinations were applied and the leakage power loss was measured in every case. Column 2 of Table 3.2 lists the average leakage power loss for each standard CMOS cell.
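The averaging step (apply every input combination, measure the leakage for each, then average) can be sketched as below. As a check on the procedure, the per-vector base-case NAND2 leakage values reported in Table 3.8 are averaged; the result, about 5.1704E-08 W, is close to the 5.1700E-08 W listed for the NAND gate in Table 3.2. The dictionary layout and the function name are illustrative.

```python
from itertools import product


def average_leakage(leakage_by_vector, n_inputs):
    """Average leakage over all 2**n_inputs input vectors (exhaustive enumeration).

    leakage_by_vector maps an input vector written as a bit string (e.g. "01")
    to the leakage power measured for that vector.
    """
    vectors = ["".join(bits) for bits in product("01", repeat=n_inputs)]
    missing = [v for v in vectors if v not in leakage_by_vector]
    if missing:
        raise KeyError(f"no measurement for input vectors {missing}")
    return sum(leakage_by_vector[v] for v in vectors) / len(vectors)


# Base-case two input NAND leakage per input vector (W), from Table 3.8
nand2_base = {"00": 4.7335e-09, "01": 4.1158e-08, "10": 4.2515e-08, "11": 1.1841e-07}
print(f"average leakage: {average_leakage(nand2_base, 2):.4e} W")
```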


Figure 3.8 Two input CMOS NAND gate layout

Next, the Sleepy-pass gate circuitry was introduced into all 7 standard CMOS cells. The sleep transistors S1 and S2 are unit sized, with WS1 = WS2, for the 100nm process technology. For each cell, transient analysis was performed in the normal mode of operation, and the propagation delays were calculated and compared with the standard circuit values, as shown in Figure 3.10.

Figure 3.9 Two input NOR gate layout


Figure 3.10 Propagation delay comparison @100nm with VDD =1 V

The high-Vth sleep transistors were sized such that the propagation delay of the Sleepy-pass gate cell was comparable to that of the standard cell. Figure 3.10 shows that the proposed technique has an increased delay compared with the base case, due to the additional transistors. Finally, each Sleepy-pass gate embedded cell was simulated in the sleep (standby) mode of operation and its leakage power measured. Column 3 of Table 3.2 lists the leakage power loss for all the Sleepy-pass gate embedded standard cells, and Column 4 gives the leakage power savings (as a factor) obtained by using the Sleepy-pass gate in the combinational cells.

Table 3.1 Area measurements for combinational cells @ 100nm process

Area (µm²)

CMOS Gate Base Case Sleepy-pass Gate

2 input NAND 36.290 40.188

2 input NOR 42.835 46.456

2 input XOR 213.430 217.460

2 input AND 50.246 64.843

2 input OR 61.278 78.129

2 input MUX 202.109 218.213

1-bit Full Adder 623.765 651.340


Table 3.2 Leakage power for combinational cells @ 100nm process

Average Leakage Power (W) with VDD = 1 Volt

CMOS Gate Base Case Sleepy-pass Gate Leakage Savings

2 input NAND 5.1700 E-08 2.8710 E-10 180 X

2 input NOR 5.4830 E-08 3.1959 E-10 170 X

2 input XOR 12.3677 E-08 8.2780 E-10 148 X

2 input AND 6.3950 E-08 4.8477 E-10 140 X

2 input OR 6.9464 E-08 5.2567 E-10 131 X

2 input MUX 6.3748 E-07 7.3758 E-09 85 X

1-bit Full Adder 8.9000 E-07 3.5600 E-09 249 X

3.3.2.2 Increase in dynamic power dissipation

So far, the emphasis has been on the standby (sleep) mode leakage power loss of the Sleepy-pass gate embedded cells; their dynamic power loss has not yet been explored. Dynamic power dissipation depends mainly on the transient switching activity and the frequency of operation, as well as on the square of the supply voltage.
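These dependencies are those of the usual first-order switching-power expression P = alpha x C x VDD^2 x f, where alpha is the switching activity and C the switched capacitance. A minimal sketch follows; the activity factor and clock frequency in the example are hypothetical, while the 1pF load and 1V supply are the values used for the cell simulations in Section 3.3.2.

```python
def dynamic_power(alpha, c_switched, vdd, freq):
    """First-order switching power: P = alpha * C * VDD**2 * f (watts)."""
    return alpha * c_switched * vdd ** 2 * freq


# Hypothetical operating point: alpha = 0.1, 1 pF load, 100 MHz clock
p_nominal = dynamic_power(0.1, 1e-12, 1.0, 100e6)
p_half_vdd = dynamic_power(0.1, 1e-12, 0.5, 100e6)
print(f"P at 1.0 V: {p_nominal:.3e} W")
print(f"P at 0.5 V: {p_half_vdd:.3e} W (ratio {p_half_vdd / p_nominal:.2f})")
```

Because P is proportional to C, any extra switched capacitance added by the sleep transistors raises dynamic power in proportion when alpha, VDD and f are held fixed.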

In this section, the effect of the additional sleep circuitry on the dynamic power dissipation of standard cells is studied. The combinational standard library cells were used and their dynamic power measured. Table 3.3 compares the dynamic power dissipation of the standard CMOS cells (base case) and the Sleepy-pass gate embedded cells: Column 2 gives the dynamic power loss of the standard CMOS cells, and Column 3 lists that of the Sleepy-pass gate embedded cells. Analysis of the results in Table 3.3 shows that the dynamic power penalty (increase) of the Sleepy-pass gate embedded cells, compared with the standard cells, is due to the additional transistors introduced and the consequent increase in capacitance.


Table 3.3 Dynamic power for combinational cells @100nm process

Dynamic Power (E-06 W) @ VDD = 1V

Combinational Cells Base Case Sleepy-pass Gate

2 input NAND 7.7644 8.1902

2 input NOR 7.8190 8.2811

2 input XOR 12.0038 13.7445

2 input AND 9.0923 11.7016

2 input OR 8.9202 11.7391

2 input MUX 11.4912 12.0230

1-bit Full Adder 73.0032 90.1091
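The relative dynamic power penalty implied by Table 3.3 can be computed directly from the tabulated values; the short script below does this arithmetic (the names are illustrative). It gives, for example, roughly a 5.5% increase for the NAND gate and roughly 23% for the 1-bit full adder.

```python
# Dynamic power (E-06 W) at VDD = 1 V from Table 3.3: (base case, Sleepy-pass gate)
TABLE_3_3 = {
    "2 input NAND": (7.7644, 8.1902),
    "2 input NOR": (7.8190, 8.2811),
    "2 input XOR": (12.0038, 13.7445),
    "2 input AND": (9.0923, 11.7016),
    "2 input OR": (8.9202, 11.7391),
    "2 input MUX": (11.4912, 12.0230),
    "1-bit Full Adder": (73.0032, 90.1091),
}


def penalty_percent(base, proposed):
    """Relative increase of the proposed value over the base value, in percent."""
    return 100.0 * (proposed - base) / base


for cell, (base, sleepy) in TABLE_3_3.items():
    print(f"{cell:18s} +{penalty_percent(base, sleepy):5.1f} %")
```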

The literature detailing various methods to reduce dynamic power

has been analyzed and can be summarized as follows:

Clock and Signal Gating

This is the simplest and most straightforward method to reduce the transient switching activity of highly active nodes in a circuit. Control signal gating techniques, like those presented by Kapadia (1999), target reduction in switching power.

Operand Isolation Techniques

The input sharing problem is typically the cause of unnecessary switching activity in modules where there should be none. Consider a simple Arithmetic and Logic Unit (ALU) designed for 4 operations (add, subtract, multiply and shift), all sharing 2 input signals. When a subtraction is performed, the adder, multiplier and shifter units are simultaneously active along with the subtractor, thereby wasting power. Operand isolation techniques, such as using multiplexers or multiple registers to drive different modules, solve the input-sharing problem. However, they increase area and delay, and add other overheads.
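To make the input-sharing problem concrete, the toy model below counts the operand bit toggles seen by each unit of the four-operation ALU, with and without operand isolation. It is a hypothetical illustration: input toggles serve as a crude proxy for switching activity, and isolation is modeled as each unused unit simply holding its previous operands (as a gated register would).

```python
UNITS = ("add", "sub", "mul", "shift")


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def operand_toggles(ops, isolate):
    """Count operand bit toggles per unit for a sequence of (unit, a, b) operations.

    Without isolation, every unit sees every new operand pair. With isolation,
    only the selected unit's inputs change; the others hold their old operands.
    """
    held = {u: (0, 0) for u in UNITS}  # operands currently applied to each unit
    toggles = {u: 0 for u in UNITS}
    for unit, a, b in ops:
        for u in UNITS:
            if isolate and u != unit:
                continue  # isolated unit: inputs frozen, no switching
            prev_a, prev_b = held[u]
            toggles[u] += hamming(prev_a, a) + hamming(prev_b, b)
            held[u] = (a, b)
    return toggles


program = [("sub", 5, 3), ("sub", 6, 1), ("add", 9, 4)]
print("shared inputs:", operand_toggles(program, isolate=False))
print("isolated:     ", operand_toggles(program, isolate=True))
```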

Transistor Re-ordering Techniques

Hossain (1996) used a probability based transistor re-ordering

technique to reduce dynamic power dissipation in CMOS circuits.

Circuits Comprised of Independent Voltage Islands

Lackey (2002) presented a comprehensive background on methods

used to design voltage islands. They present various voltage island scenarios,

a system architecture and chip implementation methodology, which are used

to reduce active and static power consumption in SOC designs. The design

implications of voltage islands are also evaluated.

Carballo (2003) proposed a semi-custom voltage island approach to

build high-speed serial links. Their approach is a mixture of selective custom

design and the transparent use of multiple supplies to reduce power. The

digital circuitry on the chip runs at a low supply voltage, while the analog

circuitry runs at a higher voltage level. An on-chip regulator converts low to

high voltage, and vice-versa. MTCMOS transistors are used in the custom

design process.

Hillman (2005) focused on minimizing the operating voltage to

reduce dynamic power. The library of components created was characterized

for different voltages. Next, the whole SOC design was built with various

components from this library, using voltage level-shifting circuits and voltage

isolation cells.


Hung (2005) presented a voltage island partitioning and floor

planning algorithm for architecting SOC designs. Their work explores the

thermal impact of voltage islands. A hybrid optimization approach consisting

of a genetic algorithm based (GA-based) voltage island partitioning algorithm

and a simulated annealing based (SA-based) floor-planning algorithm, is

presented.

3.3.3 Applying Sleepy-pass Gate for Benchmark Circuits

The ISCAS'85 benchmark circuits are ten combinational networks

provided to authors at the 1985 International Symposium on Circuits and

Systems. They subsequently have been used by many researchers as a basis

for comparing results in several areas of digital design, including test

generation, timing analysis, and technology mapping.

Experiments were conducted on a variety of combinational multi-level benchmark circuits implemented using various deep submicron process technologies. The HSPICE simulator, in conjunction with the BPTM deep submicron technology models, was used to simulate the circuits and to estimate leakage power dissipation.

All circuits (unless specified otherwise) were simulated at a temperature of 25°C. The Berkeley Predictive Technology Models (BPTM) contain process parameters and values only for standard-Vth PMOS and NMOS transistors; no models are available for high-Vth transistors. Except for the Sleepy-pass gate transistors, the widths of all other transistors are taken as Wp = 3µm and Wn = 1µm for PMOS and NMOS, respectively.

Experiments using some proprietary technology models obtained directly from foundries showed an interesting trend in the threshold voltage of high-Vth transistors. For a variety of deep-submicron technologies, we observed that the threshold voltage of a high-Vth PMOS or NMOS transistor was 25%-35% greater than that of a standard-Vth transistor. Hence, models for high-Vth PMOS and NMOS transistors were incorporated into BPTM with threshold voltages 25% greater than those of standard-Vth transistors. DC simulations were run using HSPICE to verify that the threshold voltages of these high-Vth transistors were only 25% greater than those of standard-Vth transistors.

Tables 3.4 and 3.5 list the supply and threshold voltage values of the various BPTM models for PMOS and NMOS transistors, respectively. The first column of each table lists the technology feature size, and Column 2 lists the supply voltage used for each feature size. Column 3 of Table 3.4 gives the threshold voltage of a standard PMOS transistor, while Column 3 of Table 3.5 gives that of a standard NMOS transistor. The threshold voltage of a high-Vth PMOS transistor is listed in Column 4 of Table 3.4, and that of a high-Vth NMOS transistor in Column 4 of Table 3.5.

Table 3.4 PMOS threshold voltage for BPTM models

BPTM Process VDD PMOS Standard Vth PMOS High Vth

180nm 1.8V -0.42V -0.35V

130nm 1.3V -0.35V -0.32V

100nm 1.0V -0.30V -0.28V

70nm 0.85V -0.22V -0.18V


Table 3.5 NMOS threshold voltage for BPTM models

BPTM Process VDD NMOS Standard Vth NMOS High Vth

180nm 1.8V 0.41V 0.55V

130nm 1.3V 0.33V 0.38V

100nm 1.0V 0.26V 0.34V

70nm 0.85V 0.21V 0.39V

3.3.3.1 ISCAS'85 leakage values

Ten ISCAS'85 benchmark circuits, whose characteristics are given in Table 3.6, were implemented with the Sleepy-pass gate embedded. They were sized appropriately for 4 different deep-submicron technologies: 180nm, 130nm, 100nm and 70nm. Except for the Sleepy-pass gates, the PMOS and NMOS transistors are sized with width-to-length ratios of W/L = 6 and W/L = 3, respectively.

Table 3.6 ISCAS '85 benchmark circuit characteristics

Circuit Name Circuit Function Total Gates Input Lines Output Lines

C432 Priority Decoder 160 (18 XOR) 36 7

C499 32-Bit Single-Error-Correcting Circuit 202 (104 XOR) 41 32

C880 ALU and Control 383 60 26

C1355 32-Bit Single-Error-Correcting Circuit 546 41 32

C1908 16-bit Error Detector/Corrector 880 33 25

C2670 ALU and Control 1193 233 140

C3540 ALU and Control 1669 50 22

C5315 ALU and Selector 2307 178 123

C6288 16-bit Multiplier 2406 32 32

C7552 ALU and Control 3512 207 108


The circuit C7552, containing approximately 3512 gates, is the largest design among the chosen benchmarks, while C432, with 160 gates, is the smallest. The supply voltages for the respective technologies are given in Column 2 of Tables 3.4 and 3.5 for PMOS and NMOS, respectively. Simulations were carried out using HSPICE in the standby mode of operation, and the leakage loss was measured. Since exhaustive testing was infeasible for many of the benchmarks, a representative sample of randomly generated input vectors was applied to each circuit and the leakage loss measured in every case.
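The sampling step can be sketched as follows. For C432, with 36 input lines, exhaustive testing would need 2^36 (about 6.9 x 10^10) vectors, so a fixed-size random sample is drawn instead. The sample size, the seed and the `measure` callback (which would return the leakage read back from a simulation for one vector) are hypothetical; no simulator interface is implied.

```python
import random


def random_vectors(n_inputs, n_samples, seed=0):
    """Draw n_samples random input vectors (tuples of 0/1) for a circuit with
    n_inputs primary inputs; a fixed seed keeps the sample reproducible so the
    same vectors can be reused at every technology node."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(n_inputs)) for _ in range(n_samples)]


def average_over(vectors, measure):
    """Average a per-vector measurement; `measure` is a hypothetical callback that
    would return the simulated leakage for one input vector."""
    return sum(measure(v) for v in vectors) / len(vectors)


N_INPUTS_C432 = 36  # input lines of C432 (Table 3.6)
print(f"exhaustive vectors for C432: {2 ** N_INPUTS_C432:,}")
sample = random_vectors(N_INPUTS_C432, n_samples=1000)
print(f"sampled vectors: {len(sample)}, first vector: {''.join(map(str, sample[0]))}")
```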

The average leakage power dissipation values are listed in Table 3.7. Column 2 of Table 3.7 gives the leakage values of the various benchmarks implemented using the 180nm BPTM. Similarly, Column 3 gives the leakage values for the 130nm BPTM, Column 4 for the 100nm BPTM, and Column 5 for the 70nm BPTM.

Table 3.7 Results of benchmark circuits with proposed method

Leakage Power (nW)

Circuits 180-nm 130-nm 100-nm 70-nm

C432 8.385 3.816 1.034 0.374

C499 11.285 4.230 1.322 0.578

C880 22.127 8.503 1.673 1.376

C1355 35.634 13.214 3.428 1.245

C1908 52.460 19.409 5.160 2.736

C2670 73.519 28.016 5.916 3.178

C3540 102.835 38.285 10.271 4.119

C5315 147.277 56.291 13.157 7.432

C6288 157.268 60.172 14.049 7.163

C7552 213.497 78.642 21.580 11.039


3.3.4 Prior Low Leakage Techniques Considered for Comparison Purposes

The Sleepy-pass gate technique is compared with a conventional CMOS approach (the base case) and with four other well-known previous approaches: the forced stack, sleep, zigzag and LECTOR techniques. A four bit adder circuit is chosen for implementation and comparison.

3.3.4.1 Four bit adder

Using the one bit full adder shown in Figure 3.11, the four bit adder shown in Figure 3.12 is implemented. A full adder is an example of a typical complex CMOS gate. In Figure 3.11, a and b are the two inputs and c is the carry input; Carry and Sum are the outputs. The transistor sizing of the full adder is noted in Figure 3.11.
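Functionally, the four bit adder chains four one bit full adders, each producing Sum = a XOR b XOR c and Carry = ab + c(a + b). The sketch below assumes a ripple-carry connection, with each stage's carry-out driving the next stage's carry input; it checks the logic only and says nothing about the transistor-level implementation in Figure 3.11.

```python
def full_adder(a, b, c):
    """One bit full adder: Sum = a XOR b XOR c, Carry = ab + c(a + b)."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a | b))
    return s, carry


def four_bit_adder(a, b, cin=0):
    """Ripple-carry chain of four full adders (bit 0 is the LSB).

    Returns (4-bit sum as an int, carry-out).
    """
    total, carry = 0, cin
    for i in range(4):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(four_bit_adder(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 16 + 1
```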

Figure 3.11 One bit full adder


Figure 3.12 Inputs of four bit adder

3.3.4.2 Base case

The base case is the conventional CMOS technique shown in Figure 3.13 and described in the classic textbook by Weste (2005). Figure 3.13 shows a pull-up network and a pull-down network that use as few transistors as possible to implement the desired Boolean logic function. The base case of the four bit adder is sized as explained in Section 3.3.4.1.

Figure 3.13 Base case


3.3.4.3 Forced stack technique

Figure 3.14 shows the forced stack technique, which forces a stack structure by breaking down an existing transistor into two half-size transistors. When the forced stack technique is applied, each existing transistor is replaced with two such half-size transistors, as shown in Figure 3.14.

Figure 3.14 Forced stack

3.3.4.4 Sleep transistor technique

The sleep transistor technique shown in Figure 3.15 uses sleep transistors both between VDD and the pull-up network and between Gnd and the pull-down network. Generally, the width/length (W/L) ratio is chosen based on a trade-off between area, leakage reduction and delay. For simplicity, each sleep transistor is sized to match the largest transistor in the network (pull-up or pull-down) to which it is connected. The PMOS and NMOS sleep transistors have W/L = 6 and W/L = 3, respectively.


Figure 3.15 Sleep technique

3.3.4.5 Zigzag

The zigzag technique in Figure 3.16 uses one sleep transistor in each logic stage, placed either in the pull-up or in the pull-down network according to a particular input pattern. In this thesis, an input vector that achieves the lowest possible leakage power consumption is used. A sleep transistor is then assigned to the pull-down network if the stage output is '1', or to the pull-up network if the stage output is '0'. For Figure 3.16, it is assumed that the minimum leakage inputs are asserted.

Therefore, a pull-down sleep transistor is assigned to the first stage and a pull-up sleep transistor to the second stage. As in the sleep transistor technique, the sleep transistors are sized to match the largest transistor in the network (pull-up or pull-down) to which they are connected. The PMOS and NMOS sleep transistors have W/L = 6 and W/L = 3, respectively.
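The placement decision of the zigzag technique can be expressed in a few lines. The sketch assumes the usual zigzag rule: a stage whose output is 1 under the minimum-leakage vector has its pull-down network off and receives an NMOS sleep transistor there, while a stage whose output is 0 receives a PMOS sleep transistor in its pull-up network. The two-stage inverter chain driven by logic 0 and the helper names are hypothetical.

```python
def zigzag_placement(stage_outputs):
    """Sleep-transistor location for each stage, given that stage's output under
    the minimum-leakage input vector (assumed rule):
      output 1 -> pull-down network is off -> NMOS sleep transistor in the pull-down path
      output 0 -> pull-up network is off   -> PMOS sleep transistor in the pull-up path
    """
    return ["pull-down" if out == 1 else "pull-up" for out in stage_outputs]


def inverter_chain_outputs(vin, n_stages):
    """Logic value at the output of each stage of a hypothetical inverter chain."""
    outputs, v = [], vin
    for _ in range(n_stages):
        v = 1 - v
        outputs.append(v)
    return outputs


# Two-stage chain driven by logic 0: stage outputs are 1, then 0
print(zigzag_placement(inverter_chain_outputs(0, 2)))  # ['pull-down', 'pull-up']
```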


Figure 3.16 Zigzag

3.3.4.6 LECTOR

LECTOR is an adaptation of the effective stacking of transistors to reduce leakage power. Figure 3.17 shows the generic block diagram of a LECTOR CMOS circuit. Two Leakage Control Transistors (LTs), LT1 and LT2, are introduced between the PUN and the PDN. These LTs act as self-controlled stacked transistors. In the LECTOR structure of Figure 3.17, the LTs are unit sized with the ratio W/L = 6.

Figure 3.17 Generic block diagram of LECTOR


3.3.5 Experiments on Sleepy-pass Gate

3.3.5.1 Delay

The worst-case propagation delay of each benchmark is measured. Input vectors and input and output triggers are chosen so that delay is measured from the trigger input edge reaching 50% of the supply voltage to the circuit output edge reaching 50% of the supply voltage. Input waveforms have a 4ns period (i.e., a 250 MHz rate) and rise and fall times of 100ps.
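The 50%-to-50% delay measurement can be sketched on sampled waveforms as below. This is an illustrative stand-in for the HSPICE measurement, not its implementation: crossings are located by linear interpolation between samples, and the ramp waveforms in the example (a 100ps input edge and a hypothetical 200ps inverting output edge) are synthetic.

```python
def crossing_time(times, values, level, rising):
    """First time a sampled waveform crosses `level` in the given direction,
    located by linear interpolation between adjacent samples."""
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd, in_rising=True, out_rising=False):
    """Delay from the input edge reaching 50% of VDD to the output edge reaching 50% of VDD."""
    half = 0.5 * vdd
    t_in = crossing_time(times, v_in, half, in_rising)
    t_out = crossing_time(times, v_out, half, out_rising)
    return t_out - t_in


def ramp(t, t_start, t_edge, v_from, v_to):
    """Piecewise-linear edge used to build synthetic test waveforms."""
    if t <= t_start:
        return v_from
    if t >= t_start + t_edge:
        return v_to
    return v_from + (v_to - v_from) * (t - t_start) / t_edge


# Synthetic example at VDD = 1 V, sampled every 1 ps over 3 ns:
# input rises in 100 ps from 1.0 ns; the inverting output falls in 200 ps from 1.1 ns.
ts = [i * 1e-12 for i in range(3001)]
vin = [ramp(t, 1.0e-9, 100e-12, 0.0, 1.0) for t in ts]
vout = [ramp(t, 1.1e-9, 200e-12, 1.0, 0.0) for t in ts]
print(f"t_pd = {propagation_delay(ts, vin, vout, 1.0) * 1e12:.1f} ps")  # t_pd = 150.0 ps
```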

3.3.5.2 Active power

Active power is measured by asserting semi-random input vectors and calculating the average power dissipation during this time. Input vectors are chosen so that a large number of possible input combinations are included in the set. The average power dissipation reported by HSPICE is taken as the estimate of active power consumption.

This active power includes dynamic power as well as static power during the measured interval; static power consumption is therefore subtracted to obtain the pure dynamic power consumption. All sleep transistors are turned ON when active power is measured for the sleep, zigzag and Sleepy-pass gate techniques.
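The two steps, time-averaging the simulated supply power and then subtracting the static part, can be sketched as follows. HSPICE computes its own average; this trapezoidal version and the synthetic numbers in the example are assumptions for illustration only.

```python
def average_power(times, power):
    """Time-average of a sampled power waveform using the trapezoidal rule."""
    if len(times) < 2 or len(times) != len(power):
        raise ValueError("need at least two matching samples")
    energy = 0.0
    for i in range(1, len(times)):
        energy += 0.5 * (power[i] + power[i - 1]) * (times[i] - times[i - 1])
    return energy / (times[-1] - times[0])


def dynamic_from_active(p_active_avg, p_static_avg):
    """Pure dynamic power: average active power minus static (leakage) power."""
    return p_active_avg - p_static_avg


# Synthetic 4 ns window: a 20 uW switching pulse on top of 1 nW leakage
t = [0.0, 0.5e-9, 1.0e-9, 4.0e-9]
p = [1e-9, 20e-6, 1e-9, 1e-9]
p_active = average_power(t, p)
print(f"active {p_active:.3e} W, dynamic {dynamic_from_active(p_active, 1e-9):.3e} W")
```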

For the four bit adder, input vectors covering every possible input combination are asserted. The waveform in Figure 3.18 shows the input vectors asserted for each one bit adder, where the input vector changes every 4ns. The same signal timing is used while scaling the technology from 0.18µm down to 70nm (i.e., the signal timing is not customized for each technology), because in this way the effect of technology scaling on a fixed clock can be observed. However, reducing the cycle time along with the technology feature size is known to be possible and may reveal additional insights and trade-offs.

3.3.5.3 Static power

HSPICE is also used to measure static power consumption. Since static power varies with the input state, either the full set of input vectors or a subset of the possible input combinations is considered. To measure static power, an input vector is first asserted and the power consumption is measured after the signals become stable (e.g., after 30ns). The static power values measured in this way are averaged to derive the static power consumption of each circuit.
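A sketch of this procedure: for each applied vector, only the samples taken after the settling time are averaged, and the per-vector values are then averaged again. The 30ns settling time comes from the text; the uniform sampling assumption, the function names and the synthetic waveforms are hypothetical.

```python
def settled_average(times, power, t_settle):
    """Mean of the samples taken at or after t_settle (assumes uniform sampling)."""
    kept = [p for t, p in zip(times, power) if t >= t_settle]
    if not kept:
        raise ValueError("no samples after the settling time")
    return sum(kept) / len(kept)


def static_power(per_vector_runs, t_settle=30e-9):
    """Average, over all applied input vectors, of each vector's settled power.

    per_vector_runs is a list of (times, power) pairs, one per input vector.
    """
    values = [settled_average(t, p, t_settle) for t, p in per_vector_runs]
    return sum(values) / len(values)


# Two synthetic vectors sampled every 10 ns: a switching transient, then leakage only
times = [0.0, 10e-9, 20e-9, 30e-9, 40e-9, 50e-9]
run_a = (times, [5e-6, 1e-7, 2e-9, 1e-9, 1e-9, 1e-9])
run_b = (times, [4e-6, 3e-8, 4e-9, 3e-9, 3e-9, 3e-9])
print(f"static power estimate: {static_power([run_a, run_b]):.2e} W")
```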

Figure 3.18 One bit adder input-output signals for dynamic power measurement

For the four bit adder, all possible input vectors of a full adder are considered for leakage power measurement. The sleep transistors of the sleep, zigzag and Sleepy-pass gate techniques are turned OFF during sleep mode, in which the leakage power dissipation is measured.

3.3.5.4 Area

The area of each particular design style (e.g., base case) is measured using layout. For the four bit adder, the area of an actual full layout of the adder is measured directly, and the area for the other process technologies is estimated by scaling the layout area of each particular design style. Around 10% area overhead is added in order to account for non-linear scaling; for example, for a 0.13µm process, the area is estimated as (measured 0.18µm area) x (0.130/0.180)² x 1.1. In estimating the area for the different technologies, the extra area needed to wire the gates is not taken into account, but the absence of a wiring penalty affects all the techniques considered (i.e., base case, sleep, forced stack, zigzag, LECTOR and Sleepy-pass gate) equally. Figure 3.19 shows the layout of a full adder.
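The area-scaling estimate with its roughly 10% allowance for non-linear scaling can be sketched as follows, assuming that area shrinks with the square of the feature size relative to a directly measured 0.18µm reference layout. The function name and the 1000 µm² example area are hypothetical.

```python
REFERENCE_NM = 180.0  # feature size of the directly measured layout (assumed)


def estimate_area(measured_area, target_nm, overhead=1.10):
    """Scale a layout area measured at REFERENCE_NM to target_nm.

    Area is assumed to shrink with the square of the feature size, and a ~10%
    allowance (overhead = 1.10) is added for non-linear scaling.
    """
    if target_nm == REFERENCE_NM:
        return measured_area  # measured directly, no scaling needed
    return measured_area * (target_nm / REFERENCE_NM) ** 2 * overhead


# Hypothetical 1000 um^2 layout at 180nm scaled to the other nodes
for node in (180, 130, 100, 70):
    print(f"{node:>3} nm: {estimate_area(1000.0, node):8.1f} um^2")
```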


Figure 3.19 Layout of full adder


3.3.6 Comparative Results

First, the Sleepy-pass gate is compared with the base case and the LECTOR technique (self-triggered sleep circuitry) in terms of leakage power and delay, using a two input NAND gate at a temperature of 25°C. Second, it is compared with the well-known sleep, zigzag and forced stack techniques in terms of active power, leakage power, area and delay, using the four bit full adder circuit explained in Section 3.3.4.1.

3.3.6.1 Results of two input NAND gate

Tables 3.8, 3.9 and 3.10 give the leakage power, leakage savings and delay penalty comparison for the base case, LECTOR and Sleepy-pass gate embedded NAND gates, respectively. For a fair comparison, the supply voltage was set to 1V for the 100nm process. In Table 3.8, row 3 lists the leakage power values for a base case NAND gate, row 4 lists the leakage values for the LECTOR NAND gate, and row 5 gives the standby-mode leakage values for the Sleepy-pass gate embedded NAND gate, all using BPTM 100nm. Columns 2 to 5 of Table 3.8 correspond to the input vectors applied to the gates. Analysis of the results in Table 3.8 shows that leakage power depends on the input vector applied to the circuit (Abdollahi 2004).

Columns 2 and 3 of Table 3.9 present the average leakage power and the leakage savings obtained for the base case, LECTOR and Sleepy-pass gate NAND gates. They show that the Sleepy-pass gate technique has the lowest leakage power dissipation and the largest leakage savings, 180X compared with the conventional CMOS NAND gate. Columns 2 and 3 of Table 3.10 give the delay obtained by introducing the additional transistors and the corresponding delay penalty, respectively. They show that the conventional NAND gate has the smallest propagation delay compared with the LECTOR and Sleepy-pass gate techniques. Figures 3.20 and 3.21 also show that, at 100nm, the Sleepy-pass gate technique has lower leakage power and a smaller delay penalty than LECTOR.

Table 3.8 Leakage power comparison for two input NAND gate

100nm Process Technology with VDD = 1 Volt

Method Leakage Power(W) for Input Vectors

00 01 10 11

Base CMOS 4.7335E-09 4.1158E-08 4.2515E-08 1.1841E-07

Lector 6.6976E-09 4.6137E-09 4.1881E-09 3.3269E-09

Sleepy-pass gate 3.8344E-10 3.7341E-10 3.5013E-09 3.8653E-10

Table 3.9 Leakage power savings for two input NAND gate


Method Average Leakage (W) Average Leakage Savings

Base CMOS 5.170E-08 -

Lector 4.7065E-09 10.98 X

Sleepy-pass gate 2.8710E-10 180 X

Table 3.10 Delay penalty for two input NAND gate


Method Delay (s) Delay Penalty (%)

Base CMOS 1.3E-10 -

Lector 1.8E-10 38.46

Sleepy-pass gate 1.5E-10 15.38
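The savings and penalty columns of Tables 3.9 and 3.10 follow from simple ratios of the tabulated base-case and technique values; the sketch below reproduces them (the function names are illustrative). With the tabulated averages it gives about 10.98X and 180X leakage savings and 38.46% and 15.38% delay penalties, matching the tables.

```python
def savings_factor(base, proposed):
    """How many times smaller the proposed value is than the base value."""
    return base / proposed


def penalty_percent(base, proposed):
    """Relative increase of the proposed value over the base value, in percent."""
    return 100.0 * (proposed - base) / base


BASE_LEAKAGE, BASE_DELAY = 5.170e-08, 1.3e-10  # base CMOS NAND2 (Tables 3.9, 3.10)
for name, leakage, delay in [("LECTOR", 4.7065e-09, 1.8e-10),
                             ("Sleepy-pass gate", 2.8710e-10, 1.5e-10)]:
    print(f"{name:16s} savings {savings_factor(BASE_LEAKAGE, leakage):7.2f}X  "
          f"delay penalty {penalty_percent(BASE_DELAY, delay):5.2f}%")
```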


Figure 3.20 Average leakage power for two input NAND gate

Figure 3.21 Propagation delay for two input NAND gate

3.3.6.2 Results of four bit full adder

The impact of technology scaling is explored using the results for a four bit full adder in terms of static power, propagation delay, dynamic power and area, as tabulated in Tables 3.11 to 3.14. Table 3.11 gives the static power dissipation for the 180nm, 130nm, 100nm and 70nm process technologies, and shows that the Sleepy-pass gate achieves a large leakage reduction over the base case and the other leakage reduction techniques compared. From Figures 3.22, 3.23, 3.24 and 3.25, it is observed that static power increases as the technology feature size shrinks.

Table 3.12 gives the propagation delay of a four bit full adder implemented with the base case as well as with the other leakage reduction techniques, including the Sleepy-pass gate, at the 180nm, 130nm, 100nm and 70nm process technologies. From Figure 3.26, it is observed that propagation delay decreases as the technology feature size shrinks. The Sleepy-pass gate has a larger delay than the base case, but a smaller delay than the LECTOR technique.

Table 3.11 Static power dissipation for various process technologies

Static Power (W) of Four Bit Full Adder

4-bit Adder       180nm      130nm      100nm      70nm
Base case         9.39E-10   9.34E-09   9.57E-08   9.31E-07
Forced stack      9.47E-11   8.36E-10   8.29E-09   7.28E-08
Sleep             7.28E-10   5.20E-09   6.40E-08   6.27E-07
Zigzag            4.28E-10   1.82E-09   4.09E-08   4.89E-08
LECTOR            8.11E-11   9.31E-11   9.87E-09   1.62E-08
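The trend that static power increases as feature size shrinks can be quantified from the rows of Table 3.11 by taking the ratio of each node's value to that of the previous, larger node. The Python sketch below does this for every technique listed in the table; the dictionary and function names are illustrative. For the base case, each step from 180nm to 70nm raises static power by roughly an order of magnitude.

```python
# Static power (W) of the four bit full adder, copied from Table 3.11.
# Columns are the 180nm, 130nm, 100nm and 70nm process technologies.
NODES = ["180nm", "130nm", "100nm", "70nm"]
STATIC_POWER = {
    "Base case": [9.39e-10, 9.34e-09, 9.57e-08, 9.31e-07],
    "Forced stack": [9.47e-11, 8.36e-10, 8.29e-09, 7.28e-08],
    "Sleep": [7.28e-10, 5.20e-09, 6.40e-08, 6.27e-07],
    "Zigzag": [4.28e-10, 1.82e-09, 4.09e-08, 4.89e-08],
    "LECTOR": [8.11e-11, 9.31e-11, 9.87e-09, 1.62e-08],
}


def scaling_factors(values):
    """Ratio of static power at each node to that at the previous, larger node."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


for name, values in STATIC_POWER.items():
    steps = ", ".join(
        f"{NODES[i]}->{NODES[i + 1]}: {f:.1f}x"
        for i, f in enumerate(scaling_factors(values))
    )
    print(f"{name:<13} {steps}")
```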



Table 3.12 Propagation delay for various process technologies

Propagation Delay (s) of Four Bit Full Adder

4-bit Adder       180nm      130nm      100nm      70nm
Base case         7.21E-10   4.45E-10   3.71E-10   3.42E-10
Sleep             1.12E-09   6.10E-10   5.35E-10   4.98E-10
Zigzag            1.12E-09   6.10E-10   5.35E-10   4.98E-10
LECTOR            1.39E-09   1.15E-09   7.85E-10   7.51E-10
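Applying the same delay penalty definition used for the NAND gate (increase over the base case as a percentage of the base delay) at every process node gives a per-node view of Table 3.12. The Python sketch below computes it for the rows present in the table; names are illustrative. It also makes it easy to confirm that every row decreases as feature size shrinks.

```python
# Propagation delay (s) of the four bit full adder, copied from Table 3.12.
# Columns are the 180nm, 130nm, 100nm and 70nm process technologies.
NODES = ["180nm", "130nm", "100nm", "70nm"]
DELAY = {
    "Base case": [7.21e-10, 4.45e-10, 3.71e-10, 3.42e-10],
    "Sleep": [1.12e-09, 6.10e-10, 5.35e-10, 4.98e-10],
    "Zigzag": [1.12e-09, 6.10e-10, 5.35e-10, 4.98e-10],
    "LECTOR": [1.39e-09, 1.15e-09, 7.85e-10, 7.51e-10],
}


def delay_penalties(table, base="Base case"):
    """Delay penalty (%) of every technique relative to the base case, per node."""
    return {
        name: [(t - b) / b * 100.0 for b, t in zip(table[base], values)]
        for name, values in table.items()
        if name != base
    }


for name, penalties in delay_penalties(DELAY).items():
    cells = ", ".join(f"{n}: {p:.1f}%" for n, p in zip(NODES, penalties))
    print(f"{name:<7} {cells}")
```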


Table 3.13 gives the dynamic power dissipation of a four bit full adder at 180nm, 130nm, 100nm and 70nm process technologies. Figure 3.27 shows that dynamic power decreases as technology feature size shrinks. The Sleepy-pass gate has higher dynamic power dissipation than the base case because of the additional sleep transistors in its sleep circuitry, but lower dynamic power dissipation than the zigzag and LECTOR techniques.

Table 3.14 and Figure 3.28 show that the Sleepy-pass gate technique occupies more area than the base case, and that its area decreases as technology feature size shrinks. Compared with the other techniques, this overhead in delay and area is judged to be worthwhile. Therefore, our Sleepy-pass gate approach can be used where state preservation and ultra low leakage power consumption are needed and justify the area overhead.


Table 3.13 Dynamic power dissipation for various process technologies

Dynamic Power Dissipation (W) of Four Bit Full Adder

4-bit Adder       180nm      130nm      100nm      70nm
Base case         4.81E-04   1.20E-04   3.82E-05   1.86E-05
Sleep             5.53E-04   1.37E-04   4.25E-05   2.21E-05
Zigzag            6.54E-04   1.67E-04   5.18E-05   2.83E-05
LECTOR            5.74E-04   1.43E-04   4.42E-05   2.93E-05


Table 3.14 Area measured for various process technologies

Area (µm²) of Four Bit Full Adder

4-bit Adder         180nm    130nm    100nm    70nm
Base case           59.54    43.00    33.07    23.15
Forced stack        77.35    55.86    42.97    30.08
Sleep               74.15    53.55    41.18    28.8
Zigzag              69.89    50.47    38.83    27.18
LECTOR              75.13    54.26    41.73    29.2
Sleepy-pass gate    74.25    53.62    41.25    28.8
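The dynamic power and area overheads discussed above can be expressed as a percentage increase over the base case at each node, using the rows available in Tables 3.13 and 3.14. The Python sketch below is one way to tabulate them; the names are illustrative. From Table 3.14, the Sleepy-pass gate area overhead stays close to 25% at every node (about 24.7% at 180nm, 130nm and 100nm, and 24.4% at 70nm).

```python
# Dynamic power (W) and area (µm²) of the four bit full adder, copied from
# Tables 3.13 and 3.14. Columns are 180nm, 130nm, 100nm and 70nm.
NODES = ["180nm", "130nm", "100nm", "70nm"]
DYNAMIC_POWER = {
    "Base case": [4.81e-04, 1.20e-04, 3.82e-05, 1.86e-05],
    "Sleep": [5.53e-04, 1.37e-04, 4.25e-05, 2.21e-05],
    "Zigzag": [6.54e-04, 1.67e-04, 5.18e-05, 2.83e-05],
    "LECTOR": [5.74e-04, 1.43e-04, 4.42e-05, 2.93e-05],
}
AREA = {
    "Base case": [59.54, 43.00, 33.07, 23.15],
    "Forced stack": [77.35, 55.86, 42.97, 30.08],
    "Sleep": [74.15, 53.55, 41.18, 28.8],
    "Zigzag": [69.89, 50.47, 38.83, 27.18],
    "LECTOR": [75.13, 54.26, 41.73, 29.2],
    "Sleepy-pass gate": [74.25, 53.62, 41.25, 28.8],
}


def overhead_percent(table, technique, base="Base case"):
    """Per-node overhead (%) of a technique relative to the base case."""
    return [(t - b) / b * 100.0 for b, t in zip(table[base], table[technique])]


for label, table in [("Dynamic power", DYNAMIC_POWER), ("Area", AREA)]:
    print(label)
    for technique in table:
        if technique == "Base case":
            continue
        cells = ", ".join(
            f"{n}: {o:+.1f}%" for n, o in zip(NODES, overhead_percent(table, technique))
        )
        print(f"  {technique:<16} {cells}")
```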


Figure 3.22 Static power for four bit full adder at 180nm



Figure 3.26 Propagation delay for four bit full adder at various process technologies

Figure 3.27 Dynamic power dissipation for four bit full adder at various process technologies


Figure 3.28 Area measured for four bit full adder at various process technologies

3.4 SUMMARY

In this chapter, the Sleepy-pass gate structure was introduced and its operation for leakage power reduction was explained. The Sleepy-pass gate technique achieves smaller delay and larger leakage power savings than the other existing leakage reduction techniques compared. Scaling down the CMOS technology feature size and threshold voltage to achieve high performance has increased leakage power dissipation.

This chapter presented an efficient methodology for reducing leakage power in CMOS VLSI design. The proposed method can be applied during logic design to reduce the static power of CMOS circuits. Some implications of implementing this technique are as follows:

Although minimal, additional circuitry must be added to the original logic design to force the circuit into a low leakage state during standby mode of operation; this modification can be a major implication in implementing the technique.


This technique requires a controller or power management system to generate the sleep signals automatically during standby mode and to reactivate the circuit when necessary.

Using this technique for power savings involves a tradeoff among area, delay and power. DT-LECTOR, proposed in the next chapter, can be used in applications that demand high speed and where sleep signal control circuitry is not available.
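The chapter does not specify the controller that generates the sleep signals. As a purely hypothetical sketch, an idle-timeout controller could assert sleep (Logic 1) and its complement sleep_bar (Logic 0) after a fixed number of idle clock cycles, and release them as soon as the circuit becomes busy again. The class name, the idle_cycles_to_sleep parameter and the timeout policy are all assumptions; only the signal polarity follows Section 3.2.1.

```python
from dataclasses import dataclass


@dataclass
class SleepController:
    """Hypothetical idle-timeout controller for the sleep / sleep_bar signals.

    Polarity follows Section 3.2.1: sleep = 1 (sleep_bar = 0) turns the pass
    gates off for standby mode; sleep = 0 (sleep_bar = 1) keeps them on.
    The idle-timeout policy itself is an assumption, not part of the chapter.
    """

    idle_cycles_to_sleep: int  # assumed parameter: idle cycles before standby
    idle_count: int = 0
    sleep: int = 0  # start in active mode

    @property
    def sleep_bar(self) -> int:
        # Complementary signal applied to the NMOS gates of the pass gates.
        return 1 - self.sleep

    def tick(self, circuit_busy: bool) -> None:
        """Advance the controller by one clock cycle."""
        if circuit_busy:
            # Any activity reactivates the circuit immediately.
            self.idle_count = 0
            self.sleep = 0
        else:
            self.idle_count += 1
            if self.idle_count >= self.idle_cycles_to_sleep:
                self.sleep = 1  # enter standby (sleep) mode


if __name__ == "__main__":
    controller = SleepController(idle_cycles_to_sleep=3)
    for busy in [True, False, False, False, False, True]:
        controller.tick(busy)
        print(f"busy={busy!s:<5} sleep={controller.sleep} sleep_bar={controller.sleep_bar}")
```

A real power management unit would also have to account for wake-up latency before the circuit is used again; this sketch only models when the two complementary signals change.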
