[ieee 2009 10th international symposium on quality of electronic design (isqed) - san jose, ca, usa...



Page 1: [IEEE 2009 10th International Symposium on Quality of Electronic Design (ISQED) - San Jose, CA, USA (2009.03.16-2009.03.18)] 2009 10th International Symposium on Quality of Electronic

978-1-4244-2953-0/09/$25.00 ©2009 IEEE 62 10th Int’l Symposium on Quality Electronic Design

Leakage Optimization Using Transistor-Level Dual Threshold Voltage Cell Library

Chandra S. Nagarajan¹, Lin Yuan², Gang Qu³, Barbara G. Stamps⁴

¹Cisco Systems Inc., San Jose, CA
²Synopsys, Inc., Mountain View, CA
³University of Maryland, College Park, MD
⁴Atmel Corp., Columbia, MD

²E-mail: [email protected]

Abstract

Recently, a transistor-level dual-Vth technique has been proposed, where transistors within the same cell are allowed to have different Vth to form the so-called mixed Vth (MVT) cell. However, it is impractical to build a full MVT cell library and include it in the standard dual-Vth design flow. To make this practical, the current approach adds another design phase after technology mapping to replace high-leakage cells with their low-leakage MVT variants. We propose a method to seamlessly and effectively integrate transistor-level dual-Vth technology into the existing low power design flow.

This paper reports our successful experience in applying this method to optimize leakage under timing constraints in an industrial design environment. For demonstration purposes, we build an MVT library based on only 15 cells in a standard library that contains 590 cells. On 11 ISCAS benchmarks and three industrial designs, this MVT library optimizes 27% of the design. Yet it gives an average of 9% and up to 25% leakage saving over the state-of-the-art gate-level dual-Vth design with a full-size high-Vth library.

Keywords: low power, cell library, leakage, transistor-level dual-Vth

1. Introduction

Power and energy efficiency has become one of the most critical design constraints. For the increasingly popular battery-operated wireless and portable devices, lower power consumption means extended battery lifetime; for high performance computers and servers, it means reduced cooling cost and increased system reliability. Chip power is mainly dissipated as dynamic power and leakage power. As the technology continues scaling down, leakage power has increased dramatically, taking up to 54% of the total chip power at the 65nm technology node [8]. Therefore, many leakage reduction techniques have been proposed recently at all design levels.

Unlike dynamic power that occurs only when the system is active, leakage power is dissipated in the circuit during both standby and sleep modes. Multi-threshold (Vth) CMOS (MTCMOS) techniques use sleep transistors to shut down power supply to the modules that are not active [7]. However, when MTCMOS transistors are turned on, a large amount of in-rush current will flow through the sleep transistors, causing unacceptable instant voltage drop and/or long wakeup delay [9]. Body biasing and variable threshold CMOS techniques dynamically adjust the threshold voltage of a transistor by biasing the body terminals [6]. Transistor stacking [16] and input vector control [1] methods take advantage of the transistor stack effects in CMOS gates and reduce leakage when the circuit is in sleep mode.

For runtime leakage reduction, multiple Vth assignment, particularly dual-Vth for practical concerns, is one of the most effective methods due to the exponential dependency between leakage and Vth. In this method, logic cells on timing critical paths are assigned low-Vth values to ensure the performance, while cells on the non-critical paths are assigned high-Vth to save leakage [3,5,12,13,14]. Such gate level dual-Vth assignment is effective and has been successfully adopted by industrial low power design flows [9]. Now it has become a common practice for EDA tools to take two cell libraries, one standard low-Vth library (LVT) and another slower but less leaky high-Vth library (HVT).
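To make the exponential dependency concrete, the short sketch below uses the standard first-order subthreshold model, in which leakage current falls by one decade for every S volts of Vth increase. The swing S = 0.09 V/decade is an assumed, illustrative value and is not taken from this paper; the two Vth values are the NMOS thresholds of the LVT and HVT libraries used in our experiments. The function name `leakage_ratio` is hypothetical.

```python
def leakage_ratio(vth_low, vth_high, swing=0.09):
    """Ratio I_leak(vth_low) / I_leak(vth_high) under I ~ 10 ** (-Vth / swing).

    swing is the subthreshold swing in volts per decade (assumed value).
    """
    return 10 ** ((vth_high - vth_low) / swing)


# NMOS thresholds of the LVT and HVT libraries (0.40 V and 0.51 V).
ratio = leakage_ratio(0.40, 0.51)
print(f"A low-Vth transistor leaks about {ratio:.1f}x more than a high-Vth one")
```

Under this simplified model a 0.11 V threshold increase already changes subthreshold leakage by more than an order of magnitude, which is why even a partial high-Vth assignment can pay off. Real cell leakage depends on more effects than this model captures, so the figure is only indicative.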

Note that in the above dual-Vth assignment approach, all the transistors in the same cell have the same Vth. It is also possible to apply this principle at transistor level, where transistors within the same cell are allowed to have different Vth [2,4,10,11,15]. Gate delay of a cell, similar to the circuit delay, is determined by the longest pin-to-pin delay. Therefore, transistors that are not on any critical path can be assigned high Vth to create low leakage mixed Vth (MVT) cell variants without increasing the gate delay.

Similar to the gate-level dual-Vth technique, we need to create an MVT cell library and add it to the design flow. However, this is challenging due to the large number of MVT cell variants. For a cell with k transistors, we only need to characterize the cell with high Vth to build the HVT library. But there are up to 2^k MVT cell variants if we allow each transistor to be assigned either high Vth or low Vth. The latest effort to make this approach practical is to add another design phase after technology mapping to replace high-leakage LVT cells by their MVT variants [2,10]. However, there are several shortcomings. First, it loses the opportunity for leakage reduction during technology mapping. Second, its effectiveness is heavily restricted by the algorithm that needs to be developed to insert MVT cell variants into the circuit. Such algorithms require frequent circuit timing analysis, which is a very expensive procedure. Finally, they only consider a small subset of MVT variants for each cell due to the high cell characterization cost.

In this paper, we report our successful experience in overcoming these barriers when optimizing leakage under timing constraints. Our approach seamlessly and effectively integrates transistor-level dual-Vth technology into the existing low power design flow. Compared with the existing approaches, our method has the following advantages:

We add MVT cell variants to cell library and use them during the technology mapping. So leakage optimization with transistor level dual Vth will be performed by the proven technology mapping tools instead of ad hoc new algorithms. This also makes building MVT library the only task we need to do to enjoy the leakage saving by transistor level dual Vth.

We can allow MVT cell variants to have up to 20% timing increase on critical arcs with respect to the standard LVT cells. This generates more leakage-efficient cell variants. Again, we leave it to the proven EDA tools to map the cells to meet the timing requirements and optimize leakage simultaneously.¹

The only reported practical method [2] needs to characterize up to 2^(2n) MVT variants for each n-input cell. The SPICE characterization is costly and increases design time. We propose a divide-and-conquer method to effectively choose cell variants for the MVT library.

We use an industrial design environment where the library consists of 590 LVT and 590 HVT cells. For demonstration purposes, we select the 15 most frequently used cells and build an MVT library of 40 cell variants after cell characterization using HSPICE with a 130nm model. We then use the expanded library and Synopsys Design Compiler to implement eleven standard ISCAS benchmarks and three industrial designs. Results show that we are able to achieve an average of 9% and up to 25% leakage saving over the state-of-the-art dual-Vth design. This is significant considering that the latter uses a full 590-cell HVT library and our added MVT library is based on only 15 cells.

2. Related Work

The key challenge of dual-Vth design is how to assign Vth in the circuit such that the total leakage is minimized while the circuit meets its timing requirement. In this section, we focus our literature review on gate-level and transistor-level multiple Vth assignment algorithms.

Wei et al. proposed one of the first gate-level dual-Vth assignment algorithms [14]. They start with a netlist of all low-Vth cells and traverse the netlist from primary inputs to primary outputs in a breadth-first order, assigning high Vth to cells with the largest timing slack. This algorithm needs to perform timing analysis each time a cell is assigned high Vth, and therefore is not computationally efficient. Wang et al. proposed a more efficient algorithm in [13] by formulating the dual-Vth assignment problem as solving a Max-Cut in a weighted directed acyclic graph. This allows them to assign high Vth to multiple feasible gates at a time and reduces the run time significantly. Srivastava et al. defined the concept of leakage sensitivity for each cell and proposed an algorithm to simultaneously perform gate sizing, supply voltage scaling and dual-Vth assignment [12]. Khandelwal et al. proposed a linear programming approach to solve the Vth selection and assignment problem in the continuous domain [3]. More recently, a dual-Vth assignment technique for both combinational components and sequential components (flip-flops) was proposed by Kim et al. in [5].

¹ As a comparison, [2] requires the cell variants to have a delay penalty of under only 1% on critical timing arcs for their post-technology mapping dual-Vth assignment algorithm to be effective and meet the design's timing requirement.

Due to its excellent results in leakage saving with a high degree of automation, the gate-level dual-Vth technique has been adopted by the EDA industry as a standard methodology in the recommended low power design flow [17]. The designer only needs to set the maximum leakage power constraints, and the synthesis tools can automatically select gates from an HVT gate library and an LVT gate library to optimize leakage power along non-critical timing paths.

In addition to gate-level dual-Vth assignment, Wei et al. proposed several algorithms to assign different Vth to transistors within the same gate [15]. Instead of characterizing the power and delay of the cell variants, they used simplified models. Such models are sufficient to show the concept of transistor-level dual Vth, but are not acceptable for real-life design. To integrate their approach into the low power design flow, each cell variant needs to be characterized, and because they considered all the possible cell variants, this becomes impractical given the already large size of today's cell libraries. Sill et al. [10] proposed to add a phase after the standard dual-Vth design to insert MVT cell variants into the circuit. It requires timing analysis before each MVT cell can be inserted.

The best reported effort to make MVT design practical is by Gupta et al. [2]. Their method consists of three steps. First, cell variants are created in order to minimize leakage without violating the critical timing arc by more than 1%. To limit the number of cell variants, they only consider a subset of all the possible cell variants for each cell. Second, each cell variant is characterized (for delay and power) to generate a library. For practical concerns, only the 25 most frequently used cells are considered. Finally, similar to [10], a greedy heuristic algorithm is developed to replace the standard cells by their low-leakage cell variants. Static timing analysis is performed frequently to ensure no timing violation.

Finally, there are approaches that consider transistor level dual Vth assignment and sizing simultaneously for leakage reduction [4,11], which can also benefit from our approach.

3. Design Using Mixed Vth Cell Library

In this section, we first use an example to illustrate the concept of transistor-level mixed Vth cell variants. We then propose a set of requirements to facilitate the integration of the MVT cell library into commercial EDA tools before we elaborate how to build the MVT library and integrate it into the low power design flow.

3.1 Low-leakage Mixed Vth Cell Variants

In a CMOS gate, when an input changes its value, the time it takes for the output to be stable is called a timing arc. For an n-input logic gate, there are 2n timing arcs capturing the timings between the rise/fall of each input and the resulting output transition. Figure 1 depicts a two-input NAND gate implemented by four transistors. Table 1 gives the four timing arcs from input pins a1 and a2 to output pin F when all four transistors have low Vth. For example, when input a1 changes from 0 to 1 and a2 remains at 1, output F will change from 1 to 0 and it takes 78.41ps for F to be stable (see the first row in the table).

Table 1: Timing arcs in a standard cell nd02d1 with 1x drive strength from an industrial 130nm cell library.

a1     a2     F      LVT Delay    MVT Delay
0→1    1      1→0    78.41 ps     94.40 ps
1      0→1    1→0    98.14 ps     112.18 ps
1→0    1      0→1    108.95 ps    109.40 ps
1      1→0    0→1    119.44 ps    119.44 ps

In static timing analysis, the propagation delay of a gate is determined by the longest (or critical) timing arc in the gate. For the above standard low-Vth two-input NAND gate, the propagation delay is 119.44ps, defined by the timing arc between the fall of a2 and the rise of F. This is about 52% longer than the shortest timing arc, 78.41ps, between the rise of a1 and the fall of F. Using the MVT design method [2,7,13], we can assign the nMOS transistor T1 a high Vth = 0.51V instead of the low Vth = 0.40V (see Figure 1). The last column in Table 1 shows the timing arcs in the new MVT gate. We see that the 119.44ps gate delay is preserved. Meanwhile, we observe that the gate leakage current is reduced from 1.60nA to 0.86nA.
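The arithmetic behind this example can be checked with a few lines of Python. The sketch below is illustrative only: the delays and leakage currents are the nd02d1 values quoted in Table 1 and in the text, while the dictionary keys and function names are hypothetical labels introduced here.

```python
# Timing arcs of nd02d1 in ps, keyed by the switching input (values from Table 1).
LVT_ARCS = {"a1 rise": 78.41, "a2 rise": 98.14, "a1 fall": 108.95, "a2 fall": 119.44}
MVT_ARCS = {"a1 rise": 94.40, "a2 rise": 112.18, "a1 fall": 109.40, "a2 fall": 119.44}


def gate_delay(arcs):
    """Gate delay as used in static timing analysis: the longest timing arc."""
    return max(arcs.values())


def critical_arc(arcs):
    """Name of the arc that defines the gate delay."""
    return max(arcs, key=arcs.get)


lvt_delay = gate_delay(LVT_ARCS)
mvt_delay = gate_delay(MVT_ARCS)

# How much longer the critical arc is than the shortest arc of the LVT cell.
spread = lvt_delay / min(LVT_ARCS.values()) - 1

# Relative leakage saving of the MVT variant (1.60 nA -> 0.86 nA).
leakage_saving = 1 - 0.86 / 1.60

print(critical_arc(LVT_ARCS), lvt_delay, mvt_delay)
print(f"spread {spread:.0%}, leakage saving {leakage_saving:.0%}")
```

The two non-critical arcs that pass through T1 slow down, but the maximum over all arcs is unchanged, so the variant is timing-equivalent to the LVT cell from the point of view of static timing analysis while leaking roughly 46% less.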

An MVT cell variant is a cell whose transistors are assigned different Vth values. Due to the different timing arcs in a cell, transistors that are not on the critical timing arcs can be assigned high Vth without increasing the gate delay. Compared with a standard cell where all the transistors have low Vth, an MVT cell variant is less leaky, and hence including an MVT library in the design flow can lead to reduced leakage.

3.2 Requirements for Building the MVT Cell Library

There is a great deal of similarity between gate-level dual-Vth design and transistor-level dual-Vth design. However, the difficulty of integrating them into the current low power design flow is quite different. At gate level, we only need to build two libraries: an LVT library that contains cells where all the transistors are built with low Vth, and an HVT library that contains all the cells built with high Vth. Dual-Vth assignment algorithms are implemented in logic synthesis tools to map gates to either LVT cells or HVT cells under the timing constraint. HVT cells are preferred as they have lower leakage. Compared with the standard LVT library, this only doubles the library size.

At transistor level, it becomes infeasible to build an MVT library that contains all the cell variants. Because each transistor can be assigned either high Vth or low Vth, for a cell with k transistors, there will be 2^k cell variants. To handle this problem, early research [10,15] uses formulas to model the delay and power of each cell variant. These models are not accurate for real industrial designs. Furthermore, the size of the MVT library increases the complexity of dual-Vth assignment algorithms.

The authors in [2] take a different approach to make transistor-level dual-Vth design practical. They characterize cell variants for only the 25 most popular cells, instead of all the cells. They use a heuristic that assigns high Vth to a group of transistors simultaneously, rather than to individual transistors, to limit the possible variants. They allow a 1% delay penalty on the critical timing arc when creating cell variants. Most importantly, they do technology mapping with the standard LVT library before using a greedy heuristic to select and replace LVT cells by the low-leakage MVT cell variants.

This approach makes transistor-level dual-Vth design practical by means of 1) limiting the number of MVT cell variants and 2) separating the MVT variants from the logic synthesis tools. However, these two means prevent us from reaching the MVT cell's full potential in leakage saving. First, without knowing the MVT cell variants, logic synthesis tools cannot find the technology mapping solution that minimizes leakage. Second, the heuristic that inserts MVT cell variants in the circuit further reduces the effectiveness of transistor-level dual-Vth design. It also requires time-consuming static timing analysis of the circuit. Third, only a very limited number of MVT cell variants can be created with the 1% delay penalty. In fact, when we create an MVT library following the guidelines in [2] and include it with the industrial HVT and LVT libraries, Synopsys Design Compiler does not pick up any of these MVT cell variants in any of the designs.

Our goal is to find an effective way to build an MVT cell library that can be seamlessly integrated into the current low power design flow with LVT and HVT libraries. To achieve this, it is necessary for the MVT cell library to possess the following features:

Transparency to logic synthesis tools. Ideally, we want to provide LVT, HVT, and MVT libraries to the logic synthesis tools and let the tool do the leakage minimization during technology mapping.

Cell selection and library size control. A full MVT library is not only impractical to build, but will also increase the complexity of synthesis tools. However, we want to consider as many cells as possible and choose the most leakage efficient variants.

Allowing high delay penalty on critical timing arcs. This makes it possible to create more MVT cell variants. We leave it to the proven logic synthesis tools to select the proper cells to meet the timing constraint and minimize leakage.

No new Vth value for MVT cell variants. Introducing new Vth values would give more leakage saving, but it requires extra masks and reduces yield, both of which increase the design cost.

Figure 1: Schematic view of a mixed Vth NAND2 cell (T1: high Vth; T2, T3, T4: low Vth).

3.3 Building MVT Cell Library

Given a standard LVT and HVT library, we aim to build an MVT cell library that can be readily used by the logic synthesis tools to optimize leakage under timing constraints.

1) Vth, Delay Penalty, and Cell Selection

As we have shown above, from a manufacturing point of view, we consider dual Vth and use the Vth values in the given LVT and HVT libraries to create MVT cell variants.

In the gate-level dual-Vth design method, the high Vth value is normally 0.1V~0.2V higher than the standard low Vth value [18]. This results in a slower but much less leaky HVT library. For example, in the industrial dual-Vth library we use for our experimentation, the HVT cells are on average 33% slower than the standard LVT cells. When we use this same high Vth value at transistor level, very few cell variants can be created without violating the gate delay. Therefore, it is common to allow a certain degree of delay penalty in the cell. Post-technology mapping methods, such as [2] and [10], cannot tolerate a very high delay penalty (e.g. 1% is used in [2]). This severely limits the creation of MVT cell variants and the leakage reduction. Since our approach considers the MVT cell variants during the technology mapping phase, where the synthesis tools will make the cell selection to meet the timing requirement, we are able to tolerate a much higher delay penalty. In our experimentation, we allow up to 20% delay penalty.

We give an efficient algorithm which allows us to consider all the standard library cells for MVT variant creation. It also allows us to consider multiple variants for each cell, particularly when no variant dominates the others in both leakage and delay. The algorithm is elaborated in the next section. However, due to the increasing cell library size, we recommend adopting the same philosophy as in [2] and creating variants only for the most frequently used LVT cells.

2) MVT Cell Variants Creation

After we select the cells, the Vth values, and the delay penalty, we need to create and characterize MVT cells. [15] assigns Vth to individual transistors in a cell separately and hence may result in up to 2^k variants for a cell consisting of k transistors. [2] assigns Vth based on the timing arcs of the cell, which produces 2^(2n) variant candidates for an n-input cell. Neither will be practical for large libraries if all the cells need to be considered, because of the time-consuming SPICE cell characterization.
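As a quick illustration of how the two candidate counts grow, the sketch below evaluates both expressions for the two cells discussed in this section: nd02d1 (four transistors, two inputs) and and2d1 (six transistors, two inputs). The function names are hypothetical and introduced only for this example.

```python
def per_transistor_candidates(k):
    """Candidates when each of k transistors is independently high or low Vth, as in [15]."""
    return 2 ** k


def per_arc_candidates(n):
    """Candidates when assignment follows the 2n timing arcs of an n-input cell, as in [2]."""
    return 2 ** (2 * n)


# (cell name, number of transistors k, number of inputs n)
CELLS = [("nd02d1", 4, 2), ("and2d1", 6, 2)]

for name, k, n in CELLS:
    print(name, per_transistor_candidates(k), per_arc_candidates(n))
```

Every one of these candidates would need its own SPICE characterization run, which is what makes exhaustive enumeration costly once it is repeated across a library of several hundred cells.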

We propose a method based on cell decomposition. It estimates good variant candidates and selects a small number of the better ones for SPICE characterization. It consists of the following three steps:

Step 1. Variants creation for base cells. Simple cells such as 2-input NAND and INVERTER are characterized. Due to their small size, we can afford to characterize all the possible variant candidates without any timing constraint. The ones that are dominated by others in both delay and leakage will be eliminated. For practical concerns, we can also cap the number of variants a cell can have (we use 5 in our experiment).

Step 2. Variants creation for decomposable cells. For complex cells that can be decomposed into base cells, we estimate their variant candidates from the {delay, leakage} pairs of the base cells' variants. The delay penalty and the maximal number of variants for a cell can both be enforced here to eliminate certain options. Characterization is performed for the selected variants, and simple cells, such as the 2-input AND in Figure 2, will be included as base cells.

Step 3. Variants creation for other cells. For cells that cannot be decomposed into base cells, we use an approach similar to [2] to generate MVT cell variants based on the timing arcs. In addition, we enforce some heuristic rules. For example, transistors in a stack are always assigned the same Vth value because it is difficult to achieve different channel doping for transistors in a stack [15].
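The candidate estimation in Step 2 can be sketched in a few lines of Python, using the and2d1 numbers from the worked example in this section. This is a minimal sketch under stated assumptions, not our characterization tool: it assumes that the delay and the leakage of a decomposed cell are estimated by simply adding the {delay, leakage} pairs of its base-cell variants, and the names `dominates`, `compose` and `select_candidates` are hypothetical.

```python
from itertools import product


def dominates(a, b):
    """True if pair a is no worse than pair b in both delay and leakage and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def compose(*base_variant_lists):
    """Estimate (delay, leakage, choice) for every combination of base-cell variants.

    Assumption: delay and leakage of the decomposed cell are sums over its base cells.
    """
    combos = []
    for choice in product(*base_variant_lists):
        delay = sum(d for d, _ in choice)
        leak = sum(l for _, l in choice)
        combos.append((delay, leak, choice))
    return combos


def select_candidates(combos, max_delay=None, max_variants=None):
    """Drop dominated combinations, then apply the optional delay and count limits."""
    keep = [c for c in combos
            if not any(dominates(o[:2], c[:2]) for o in combos)]
    if max_delay is not None:
        keep = [c for c in keep if c[0] <= max_delay]
    keep.sort(key=lambda c: c[1])  # lowest estimated leakage first
    return keep[:max_variants] if max_variants else keep


NAND2D1 = [(6, 1), (5, 2), (4, 3)]  # (delay, leakage) of the base-cell variants
INVD1 = [(3, 1), (1, 2)]

combos = compose(NAND2D1, INVD1)
pareto = select_candidates(combos)
constrained = select_candidates(combos, max_delay=8)

for delay, leak, choice in pareto:
    print(delay, leak, choice)
```

Only the surviving combinations are passed on to SPICE characterization, so the estimate does not need to be accurate in absolute terms; it only needs to rank the candidates well enough to discard the clearly inferior ones.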

As an example, we show how we create variants for cell and2d1 in Step 2. Figure 2 shows the decomposition of and2d1 into a nand2d1 cell and an invd1 cell. We assume that nand2d1 has three variants with {delay, leakage} pairs of {6,1}, {5,2} and {4,3}, and that invd1 has two variants with {3,1} and {1,2}. Out of the six different combinations, two of them, {{5,2},{3,1}} and {{4,3},{3,1}}, are dominated by {{6,1},{1,2}}. So we only need to consider four variants. If we add a delay constraint of 8, {{6,1},{3,1}} will also be eliminated. According to [15] and [2], 2^6 = 64 and 2^(2×2) = 16 variants need to be considered, respectively.

3.4 Dual-Vth Design Flow Using Mixed Vth Library

Dual-Vth implementation of an RTL design involves mapping the design to HVT and LVT libraries without violating timing and power constraints. Based on the engineering practices for timing and power saving [17], we use the dual Vth design flow with MVT library as follows:

First, the RTL description of the design is read in and the target library is set to the slow but most leakage-efficient HVT library. If this leakage-driven logic synthesis meets the timing requirement, we have found the most leakage-optimized design. Most likely, however, timing will not be met, and we then set the target library to the combination of the HVT and MVT libraries. An incremental synthesis is performed, which only re-maps the gates that have negative timing slack to library cells with smaller delay. If timing is still not met, we add the fastest library, LVT, to the target library. The incremental synthesis will then find a design that satisfies timing, provided that the timing constraints are achievable with the LVT library.

Figure 2: Schematic view of the and2d1 cell in the library, decomposed into nand2d1 and invd1.
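The control structure of the flow in Section 3.4 can be summarized as the following sketch. It is not a Design Compiler script: `synthesize` and `meets_timing` are hypothetical callables standing in for the synthesis tool and its timing report, and the toy stand-ins below exist only so that the sketch can be run. The HVT value of 1.33 reflects the 33% average slowdown of our HVT cells; the MVT value of 1.10 is an arbitrary illustrative number.

```python
def dual_vth_flow(synthesize, meets_timing, rtl, libs):
    """Escalate the target library (HVT, then +MVT, then +LVT) until timing is met.

    synthesize(design, target_libs, incremental) and meets_timing(netlist)
    are hypothetical hooks for the synthesis tool.
    """
    stages = [["HVT"], ["HVT", "MVT"], ["HVT", "MVT", "LVT"]]
    netlist = rtl
    for i, names in enumerate(stages):
        # The first pass is a full synthesis; later passes are incremental.
        netlist = synthesize(netlist, [libs[n] for n in names], incremental=(i > 0))
        if meets_timing(netlist):
            return netlist, names
    return netlist, None  # the constraint is not achievable even with LVT


# Toy stand-ins: the achievable delay is that of the fastest library offered.
def toy_synthesize(design, target_libs, incremental):
    best = min(lib["delay"] for lib in target_libs)
    return {"name": design["name"], "delay": best, "constraint": design["constraint"]}


def toy_meets_timing(netlist):
    return netlist["delay"] <= netlist["constraint"]


LIBS = {"HVT": {"delay": 1.33}, "MVT": {"delay": 1.10}, "LVT": {"delay": 1.00}}
rtl = {"name": "demo", "delay": None, "constraint": 1.15}

netlist, used = dual_vth_flow(toy_synthesize, toy_meets_timing, rtl, LIBS)
print(used, netlist["delay"])
```

The ordering of the stages is the design choice that matters here: starting from the least leaky library and only adding faster libraries when timing fails keeps as many cells as possible on the low-leakage side.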

4. Experimental Results

The goal of this demonstrative experiment is to show (1) that the designed MVT library can be integrated into the above dual-Vth design flow with existing EDA tools, and (2) the potential of leakage reduction by the MVT library. We have shown that existing approaches [2,10] fail for (1). We are unable to compare our results with them due to different parameter settings (such as Vth values, technology, etc.).

4.1 Experimental Setup

Benchmarks. We use 11 ISCAS benchmarks and three real industry designs from an EDA company [18]. The size of these designs ranges from 1,269 cells to 70,966 cells. The delay for each circuit is the circuit delay when only the LVT library is used. This delay is used as the timing constraint for designs using the HVT library and MVT cell variants. Therefore, the designs we consider here all have tight timing constraints.

MVT library. The company's original LVT and HVT libraries each contain 590 cells implemented using 130nm technology. The Vth values for the LVT and HVT libraries are 0.40V and 0.51V for NMOS and -0.41V and -0.52V for PMOS, respectively. We consider the 15 most frequently used cells from the company's synthesis reports.² The candidate cell variants are characterized by Synopsys HSPICE and a customized tool [18]. More specifically, we run HSPICE simulation to characterize the timing on each possible pin-to-pin transition. Leakage of each cell variant is calculated as the average of the leakage current, which is measured by company [18]'s tool, over all different cell input combinations. 40 cell variants are created and added to the MVT library. The library is in .db format that can be readily used in Synopsys Design Compiler.

Design tool. We integrate the design method and the MVT library into Synopsys Design Compiler [17]. We follow the Synopsys low power design flow in logic synthesis, with auto wire load mode turned ON so that Design Compiler estimates the wire loads. We use the command ‘compile -incr’ in the incremental synthesis stage to optimize power and meet timing. Power Compiler is used to obtain the total leakage power for each design.

Experimentation. We synthesize each of the above 14 benchmark designs three times. First, we map the circuit to the LVT library only to obtain the reference delay and leakage (columns 3 and 4 in Table 2). We then repeat the design using the {LVT, HVT} and {LVT, HVT, MVT} libraries, respectively. In both cases, the timing constraint for each design (the reference delay in column 3) is met. The run time with the {LVT, HVT, MVT} library is similar to that with the {LVT, HVT} library.

2 These cells are selected based on the synthesis reports from company [18]. They are: inv0d1, inv0d2, inv0d4, nd02d1, nd02d2, nd02d4, buffd1, buffd2, an02d1, an02d2, an02d4, or02d1, or02d2, or02d4, nr0d1.

Table 2: Leakage reduction. Leak: leakage of the design using the LVT library only. Column 5: leakage reduction of the design using {LVT, HVT}; column 6: leakage reduction of the design using {LVT, HVT, MVT}; column 7: improvement from adding MVT to the {LVT, HVT} design.

4.2 Results and Discussion

Leakage reduction. Columns 5 and 6 in Table 2 give the leakage saving over designs with the LVT library only when the HVT and MVT libraries are included in the design, respectively. The last column shows the improvement in leakage saving from adding the MVT library to the dual-Vth design flow. On average, using the MVT library gives an additional 9% leakage saving. Design3 has the smallest improvement (less than 2%). This is because that design has many cells with positive timing slack, and Design Compiler already implements many of them with HVT cells, leaving little room for MVT cells.
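The paper does not spell out how the improvement column is derived from the two reduction columns. One reading that reproduces the reported values to within rounding is the relative drop in remaining leakage; the sketch below checks that assumption against three rows of Table 2. The function name is ours, and the formula is an inference, not a definition given by the authors.

```python
def improvement(red_hvt_pct, red_mvt_pct):
    """Relative leakage saving of the {LVT,HVT,MVT} design over the {LVT,HVT} design.

    Both arguments are leakage reductions (%) relative to the LVT-only design.
    Assumed formula: compare the leakage that remains in each design.
    """
    leak_hvt = 100.0 - red_hvt_pct   # remaining leakage, % of LVT-only
    leak_mvt = 100.0 - red_mvt_pct
    return 100.0 * (leak_hvt - leak_mvt) / leak_hvt


# (reduction with {LVT,HVT}, reduction with {LVT,HVT,MVT}, reported improvement)
rows = {
    "C2670": (16.67, 33.28, 19.94),
    "s15850": (46.98, 60.68, 25.84),
    "Design1": (19.45, 24.3, 6.02),
}

for name, (r_hvt, r_mvt, reported) in rows.items():
    computed = improvement(r_hvt, r_mvt)
    print(f"{name}: computed {computed:.2f}%, reported {reported}%")
```

Under this reading, the improvement is measured against the leakage that the {LVT, HVT} design still has, which is why it is not simply the difference between the two reduction columns.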

This 9% saving is significant because it is achieved by adding a 40-cell MVT library to the standard LVT and HVT libraries, which together contain 1180 cells. Further leakage reduction is expected if we increase the size of the MVT library. We return to this point below when we analyze the usage of MVT cell variants.

MVT cell usage. As we use a very small MVT library, we analyze its usage in order to justify the potential of using larger MVT libraries for leakage reduction.

Column I in Table 3 shows the number of cells after we map each design to the {LVT,HVT,MVT} library. Column II shows the percentage of these cells that have MVT variants in the library. On average, 27% of the design can be optimized by our MVT library. Among the cells that have MVT variants, on average 19% actually appear as MVT variants in the circuit (column III). This shows that Design Compiler successfully selects cells from the MVT library in the synthesis stage.
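To make the two percentages concrete, the sketch below converts one row of Table 3 into approximate cell counts. It assumes that column III is a fraction of the cells counted by column II, as the text states; the function name is hypothetical, and the counts are approximate because the table reports rounded percentages.

```python
def mvt_usage(total_cells, pct_with_variants, pct_used_as_mvt):
    """Estimate MVT coverage for one design from columns I-III of Table 3.

    Returns (cells that have an MVT variant, cells mapped to an MVT variant,
    MVT cells as a percentage of the whole design).
    """
    with_variants = total_cells * pct_with_variants / 100.0
    used_as_mvt = with_variants * pct_used_as_mvt / 100.0
    share_of_design = 100.0 * used_as_mvt / total_cells
    return with_variants, used_as_mvt, share_of_design


# C2670 row of Table 3: 855 cells, 39% have MVT variants, 15% of those are used.
with_variants, used_as_mvt, share = mvt_usage(855, 39, 15)
print(f"about {with_variants:.0f} cells have an MVT variant")
print(f"about {used_as_mvt:.0f} cells are mapped to an MVT variant")
print(f"MVT cells make up about {share:.2f}% of the design")
```

Applying the same arithmetic to the averages (27% and 19%) suggests that only about 5% of the cells in a typical design end up as MVT variants.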

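| Design | # of Cells | Delay (ns) | Leak (uW) | Reduction, {LVT,HVT} (%) | Reduction, {LVT,HVT,MVT} (%) | Improvement (%) |
|---|---|---|---|---|---|---|
| C2670 | 1269 | 1.6 | 1.03 | 16.67 | 33.28 | 19.94 |
| C3540 | 1669 | 3.2 | 1.42 | 22.84 | 26.06 | 4.18 |
| C5315 | 2307 | 2.5 | 1.61 | 26.25 | 39.85 | 18.44 |
| C6288 | 2416 | 8.5 | 2.57 | 13.88 | 16.61 | 3.18 |
| C7552 | 3513 | 2.25 | 2.08 | 26.45 | 33.70 | 9.86 |
| s9234 | 5808 | 2.3 | 0.98 | 18.32 | 22.93 | 5.64 |
| s35932 | 12204 | 2 | 7.12 | 41.89 | 44.23 | 4.02 |
| s15850 | 10306 | 3.2 | 3.26 | 46.98 | 60.68 | 25.84 |
| s38417 | 23815 | 3.2 | 10.54 | 45.82 | 49.56 | 6.89 |
| s13207 | 8589 | 2.75 | 2.84 | 64.45 | 65.38 | 2.61 |
| s38584 | 20679 | 3.2 | 8.12 | 61.02 | 64.37 | 8.58 |
| Design1 | 16838 | 4 | 30.2 | 19.45 | 24.3 | 6.02 |
| Design2 | 30644 | 4.2 | 43.08 | 14.62 | 21.01 | 7.48 |
| Design3 | 70966 | 13 | 85.57 | 63.58 | 64.2 | 1.94 |
| Average | | | | 34.4 | 40.4 | 8.9 |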


Column IV is the number of cells when we map to the {LVT,HVT} target library, and column V lists the number of LVT cells in that design. Compared with this, the design mapped to the {LVT,HVT,MVT} library uses 17% fewer LVT cells on average (column VI). This reduction correlates well with the leakage improvement from adding MVT to {LVT,HVT} (the last column in Table 2). For example, s15850 has the largest reduction in LVT cell usage (34%) and also the largest leakage improvement (25.84%).
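The correlation noted above can be quantified for the 11 ISCAS benchmarks that appear in both tables. The sketch below computes a Pearson coefficient from column VI of Table 3 and the last column of Table 2. The choice of Pearson correlation is ours (the paper reports no coefficient); by our calculation it comes out at roughly 0.8.

```python
from math import sqrt

# (LVT usage reduction %, leakage improvement %) per ISCAS benchmark,
# copied from column VI of Table 3 and the last column of Table 2.
data = {
    "C2670": (30, 19.94), "C3540": (11, 4.18), "C5315": (30, 18.44),
    "C6288": (17, 3.18), "C7552": (19, 9.86), "s9234": (13, 5.64),
    "s35932": (10, 4.02), "s15850": (34, 25.84), "s38417": (15, 6.89),
    "s13207": (-8, 2.61), "s38584": (24, 8.58),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


lvt_reduction, leakage_improvement = zip(*data.values())
r = pearson(lvt_reduction, leakage_improvement)
print(f"Pearson r = {r:.2f}")
```

With only 11 data points and rounded percentages, the coefficient is indicative rather than conclusive. The negative entry for s13207 means that design uses more LVT cells after MVT is added, yet it still shows a small leakage improvement.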

Table 3: Breakdown of cell usage in the netlist.

Finally, we argue that the 9% leakage reduction improvement reported in Table 2 is significant, because we achieve it by adding a 40-cell MVT library to the standard {LVT,HVT} library with 1180 cells. The {LVT,HVT} design already takes full advantage of gate-level dual Vth design: each LVT cell has an HVT counterpart, so 100% of the design can be optimized. In the {LVT,HVT,MVT} design, by contrast, cells that have MVT variants account for only 27% of the design (column II in Table 3), which means that 73% of the design cannot benefit from MVT cells because of the limited size of the MVT library.

5. Conclusions

We present a methodology to build an MVT library and integrate it into the dual Vth design flow with existing EDA tools. Experiments in an industrial design environment demonstrate the effectiveness of this approach. By adding a 40-cell MVT library to the 1180-cell LVT and HVT libraries, we achieve an average of 9% more leakage saving while optimizing only 27% of the design.

6. Acknowledgment

This work is partially supported by grant CNS0615222 from the National Science Foundation.

7. References

[1] A. Abdollahi, F. Fallah, and M. Pedram, “Leakage Current Reduction in CMOS VLSI Circuits by Input Vector Control”, IEEE Trans. on VLSI, vol. 12, pp. 140-154, Feb. 2004.

[2] P. Gupta, A. B. Kahng, and P. Sharma, “A Practical Transistor-Level Dual Threshold Voltage Assignment Methodology”, in Proc. of ISQED 2005, pp. 421-426.

[3] V. Khandelwal, A. Davoodi, A. Srivastava, “Simultaneous Vt Selection and Assignment for Leakage Optimization”, IEEE Trans. on VLSI, pp. 762-765, Vol 13, June 2005.

[4] M. Ketkar and S. Sapatnekar, “Standby Power Optimization via Transistor Sizing and Dual Threshold Voltage Assignment,” in Proc. of ICCAD 2002, pp. 375-378.

[5] J. Kim and Y. Shin, “Minimizing Leakage Power in Sequential Circuits by using Mixed Vt Flip-Flop”, in Proc. of ICCAD 2007, pp. 797-802.

[6] T. Kuroda, et al., “A 0.9V 150MHz 10mW 4mm² 2-D Discrete Cosine Transform Core Processor with Variable Threshold-Voltage (VT) Scheme”, IEEE Journal of Solid-State Circuits, pp. 1770-1779, Nov. 1996.

[7] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, “1-V Power Supply High-Speed Digital Circuit Technology with Multi-threshold Voltage CMOS”, IEEE Journal of Solid-State Circuits, vol. 30, no. 8, pp. 847-854, Aug. 1995.

[8] S. Narendra, D. Blaauw, A. Devgan and F. Najm, “Leakage Issues in IC Design: Trends, Estimation and Avoidance,” in Proc. of ICCAD 2003, tutorial.

[9] K. Shi, and D. Howard, “Challenges in Sleep Transistor Design and Implementation in Low-Power Designs,” in Proc. of DAC 2006, pp. 113-116.

[10] F. Sill, F. Grassert, and D. Timmermann, “Low Power Gate-Level Design with Mixed-Vth (MVT) Techniques,” in Proc. of ACM SBCCI, pp. 278-282, 2004.

[11] S. Sirichotiyakul, T. Edwards, C. Oh, J. Zuo, A. Dharchoudhury, R. Panda, and D. Blaauw, “Stand-by Power Minimization through Simultaneous Threshold Voltage Selection and Circuit Sizing,” in Proc. of DAC 1999, pp. 436-441.

[12] A. Srivastava, D. Sylvester, and D. Blaauw, “Power Minimization using Simultaneous Gate Sizing, Dual-Vdd and Dual-Vth Assignment”, in Proc. of DAC 2004, pp. 783-787.

[13] Q. Wang and S.B.K. Vrudhula, “Algorithms for Minimizing Standby Power in Deep Submicrometer Dual-Vt CMOS Circuits”, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, pp. 306-318, vol. 21, no. 3, Mar. 2002.

[14] L. Wei, Z. Chen, M. Johnson, and K. Roy, “Design and Optimization of Low Voltage High Performance Dual Threshold CMOS Circuits”, Proc. of DAC 1998, pp. 489-494.

[15] L. Wei, Z. Chen, K. Roy, Y. Ye and V. De, “Mixed-Vth CMOS Circuit Design Methodology for Low Power Applications,” in Proc. of DAC 1999, pp. 430–435.

[16] Y. Ye, S. Borkar and V. De, “A New Technique for Standby Leakage Reduction in High-Performance Circuits,” in Proc. of ISVLSI 1998, pp. 40–41.

[17] Synopsys Power Compiler User Guide, Version Z-2007.03, March 2007.

[18] Atmel Corporation.

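| Design | #cell (I) | %MVTa (II) | %MVTu (III) | #cells (IV) | #LVT (V) | %LVT red. (VI) |
|---|---|---|---|---|---|---|
| C2670 | 855 | 39% | 15% | 827 | 364 | 30% |
| C3540 | 1019 | 21% | 25% | 998 | 510 | 11% |
| C5315 | 1378 | 25% | 24% | 1271 | 554 | 30% |
| C6288 | 2188 | 33% | 37% | 2105 | 997 | 17% |
| C7552 | 1511 | 28% | 19% | 1507 | 760 | 19% |
| s9234 | 874 | 25% | 27% | 864 | 429 | 13% |
| s35932 | 7305 | 16% | 10% | 7369 | 1797 | 10% |
| s15850 | 2661 | 30% | 13% | 2619 | 850 | 34% |
| s38417 | 6835 | 26% | 20% | 6956 | 2598 | 15% |
| s13207 | 2290 | 26% | 10% | 2279 | 453 | -8% |
| s38584 | 8569 | 29% | 9% | 8860 | 1622 | 24% |
| Average | | 27% | 19% | | | 17% |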