on gate level power optimization using dual-supply ...€¦ · chunhong chen, member, ieee, ankur...

14
616 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001 On Gate Level Power Optimization Using Dual-Supply Voltages Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—In this paper, we present an approach for applying two supply voltages to optimize power in CMOS digital circuits under the timing constraints. Given a technology-mapped net- work, we first analyze the power/delay model and the timing slack distribution in the network. Then a new strategy is developed for timing-constrained optimization issues by making full use of slacks. Based on this strategy, the power reduction is translated into the polynomial-time-solvable maximal-weighted-indepen- dent-set problem on transitive graphs. Since different supply voltages used in the circuit lead to totally different power con- sumption, we propose a fast heuristic approach to predict the optimum dual-supply voltages by looking at the lower bound of power consumption in the given circuit. To deal with the possible power penalty due to the level converters at the interface of different supply voltages, we use a “constrained F-M” algorithm to minimize the number of level converters. We have implemented our approach under SIS environment. Experiment shows that the resulting lower bound of power is tight for most circuits and that the predicted “optimum” supply voltages are exactly or very close to the best choice of actual ones. The total power saving of up to 26% (average of about 20%) is achieved without degrading the circuit performance, compared to the average power improve- ment of about 7% by gate sizing technique based on a standard cell library. Our technique provides the power-delay tradeoff by specifying different timing constraints in circuits for power optimization. Index Terms—Dual-supply voltages, gate level, maximal- weighted-independent-set, power optimization, timing con- straints. I. INTRODUCTION W ITH the increasing demand for low-power applications, power optimization has been a major goal in designing digital circuits. Since the dynamic power dissipation of CMOS circuits is proportional to the square of supply voltage, reducing the supply voltage promises to be one of the most effective ways to achieve low power. Unfortunately, the reduced supply voltage leads to the speed loss of the logic modules. This sit- uation deteriorates especially as the supply voltage approaches the threshold voltage of the device [1]. To compensate for the reduced speed, one can use parallel and pipelined architectures with the expensive hardware overhead [8]. Alternatively, the speed degradation problem can be overcome by using the low- threshold voltage which, however, results in a rapid increase in Manuscript received August 4, 1999; revised February 13, 2001. This work was supported in part by the National Science Foundation (NSF) under Grant MIP-9527389, by DARPA, and by a Grant from Motorola. C. Chen is with the Department of Electrical and Computer Engineering, Uni- versity of Windsor, ON, Canada. A. Srivastava and M. Sarrafzadeh are with the Computer Science Department, University of California at Los Angeles, Los Angeles, CA 90095-1596 USA. Publisher Item Identifier S 1063-8210(01)03643-5. the subthreshold current [9], [20]. In either case, an implicit as- sumption is that the reduced supply voltages are uniformly ap- plied to all logic modules. Instead, the entire circuit performance does not necessarily get worse if the reduced supply voltages are applied only to the logic modules on the noncritical paths. The reason is that these modules typically have high timing slack which may allow the increase of their delay without violating the timing constraints. Experience shows that, for most circuits, logic modules (or gates, at logic level) on critical paths only ac- count for a small portion of all modules. In Fig. 1, for example, we plot the average distribution of gates with different slack for 16 MCNC91 benchmarks after technology mapping (note that the slack value has been normalized to the longest path delay). It can be seen from this figure that, the number of gates on critical paths (i.e., with zero or close-to-zero slack) accounts for only about 14% of total gates, while more than 60% of gates have their slack larger than 0.2. This potentially provides much room for power reduction using the reduced supply voltage. In general, power reduction technique with the reduced voltage (or variable voltage) is called voltage scaling. De- pending on how many supply voltages of different value are available, voltage scaling may be classified as multiple voltage approach or dual-voltage approach. Prior work on multiple voltage approach includes [4], [12], [13], and [25]. Most of them basically focus on the scheduling problem for low power at the behavioral level. For example, [12] proposes an optimal scheduling algorithm to reduce The systems power while meeting the timing constraint. In [13], a dynamic programming technique is used to minimize the average energy consumption. However, there are some practical hurdles that need to be over- come before the use of multiple supply voltages. Among them are constrained physical design problems and the resulting area/delay/power penalty due to level converters (LCs), which are required at the interface of different voltages. In contrast, these issues can be eased if only two supply voltages are used [2], [3], [5], [15], [26]. In [3], for example, a layout scheme using two voltages is discussed together with its application to a media processor chip design. In [15], dual-supply voltages were used successfully to design a chip of MPEG4 codec core at Toshiba Corporation. These show the feasibility of implementing dual-voltage circuits. In [2], Usami and Horowitz first proposed a dual-voltage ap- proach based on the so-called cluster-voltage scaling (CVS). The idea is to use the depth-first search from primary outputs to find gates which may operate at low supply voltage without violating the timing constraints. As a result, two clusters are created. One is the cluster consisting of gates with low supply voltage ( ) on the side of primary outputs, and the other is the 1063–8210/01$10.00 © 2001 IEEE

Upload: others

Post on 29-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

616 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

On Gate Level Power Optimization UsingDual-Supply Voltages

Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE

Abstract—In this paper, we present an approach for applyingtwo supply voltages to optimize power in CMOS digital circuitsunder the timing constraints. Given a technology-mapped net-work, we first analyze the power/delay model and the timing slackdistribution in the network. Then a new strategy is developedfor timing-constrained optimization issues by making full use ofslacks. Based on this strategy, the power reduction is translatedinto the polynomial-time-solvable maximal-weighted-indepen-dent-set problem on transitive graphs. Since different supplyvoltages used in the circuit lead to totally different power con-sumption, we propose a fast heuristic approach to predict theoptimum dual-supply voltages by looking at the lower bound ofpower consumption in the given circuit. To deal with the possiblepower penalty due to the level converters at the interface ofdifferent supply voltages, we use a “constrained F-M” algorithmto minimize the number of level converters. We have implementedour approach under SIS environment. Experiment shows that theresulting lower bound of power is tight for most circuits and thatthe predicted “optimum” supply voltages are exactly or very closeto the best choice of actual ones. The total power saving of up to26% (average of about 20%) is achieved without degrading thecircuit performance, compared to the average power improve-ment of about 7% by gate sizing technique based on a standardcell library. Our technique provides the power-delay tradeoffby specifying different timing constraints in circuits for poweroptimization.

Index Terms—Dual-supply voltages, gate level, maximal-weighted-independent-set, power optimization, timing con-straints.

I. INTRODUCTION

W ITH the increasing demand for low-power applications,power optimization has been a major goal in designing

digital circuits. Since the dynamic power dissipation of CMOScircuits is proportional to the square of supply voltage, reducingthe supply voltage promises to be one of the most effectiveways to achieve low power. Unfortunately, the reduced supplyvoltage leads to the speed loss of the logic modules. This sit-uation deteriorates especially as the supply voltage approachesthe threshold voltage of the device [1]. To compensate for thereduced speed, one can use parallel and pipelined architectureswith the expensive hardware overhead [8]. Alternatively, thespeed degradation problem can be overcome by using the low-threshold voltage which, however, results in a rapid increase in

Manuscript received August 4, 1999; revised February 13, 2001. This workwas supported in part by the National Science Foundation (NSF) under GrantMIP-9527389, by DARPA, and by a Grant from Motorola.

C. Chen is with the Department of Electrical and Computer Engineering, Uni-versity of Windsor, ON, Canada.

A. Srivastava and M. Sarrafzadeh are with the Computer Science Department,University of California at Los Angeles, Los Angeles, CA 90095-1596 USA.

Publisher Item Identifier S 1063-8210(01)03643-5.

the subthreshold current [9], [20]. In either case, an implicit as-sumption is that the reduced supply voltages are uniformly ap-plied to all logic modules. Instead, the entire circuit performancedoes not necessarily get worse if the reduced supply voltages areapplied only to the logic modules on the noncritical paths. Thereason is that these modules typically have high timing slackwhich may allow the increase of their delay without violatingthe timing constraints. Experience shows that, for most circuits,logic modules (or gates, at logic level) on critical paths only ac-count for a small portion of all modules. In Fig. 1, for example,we plot the average distribution of gates with different slack for16 MCNC91 benchmarks after technology mapping (note thatthe slack value has been normalized to the longest path delay). Itcan be seen from this figure that, the number of gates on criticalpaths (i.e., with zero or close-to-zero slack) accounts for onlyabout 14% of total gates, while more than 60% of gates havetheir slack larger than 0.2. This potentially provides much roomfor power reduction using the reduced supply voltage.

In general, power reduction technique with the reducedvoltage (or variable voltage) is calledvoltage scaling. De-pending on how many supply voltages of different value areavailable, voltage scaling may be classified asmultiple voltageapproach ordual-voltageapproach. Prior work on multiplevoltage approach includes [4], [12], [13], and [25]. Most ofthem basically focus on the scheduling problem for low powerat the behavioral level. For example, [12] proposes an optimalscheduling algorithm to reduce The systems power whilemeeting the timing constraint. In [13], a dynamic programmingtechnique is used to minimize the average energy consumption.However, there are some practical hurdles that need to be over-come before the use of multiple supply voltages. Among themare constrained physical design problems and the resultingarea/delay/power penalty due to level converters (LCs), whichare required at the interface of different voltages. In contrast,these issues can be eased if only two supply voltages are used[2], [3], [5], [15], [26]. In [3], for example, a layout schemeusing two voltages is discussed together with its application toa media processor chip design. In [15], dual-supply voltageswere used successfully to design a chip of MPEG4 codeccore at Toshiba Corporation. These show the feasibility ofimplementing dual-voltage circuits.

In [2], Usami and Horowitz first proposed a dual-voltage ap-proach based on the so-called cluster-voltage scaling (CVS).The idea is to use the depth-first search from primary outputsto find gates which may operate at low supply voltage withoutviolating the timing constraints. As a result, two clusters arecreated. One is the cluster consisting of gates with low supplyvoltage ( ) on the side of primary outputs, and the other is the

1063–8210/01$10.00 © 2001 IEEE

Page 2: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 617

Fig. 1. The average distribution of gates with different slack for 16 benchmarkcircuits.

cluster of gates with high-supply voltage ( ) on the side ofprimary inputs. Thus, no LCs are required. In this cluster struc-ture, however, any gate may be selected to operate atonlyafter all its transitive fanouts have been selected to do so. Thus,some part of the circuit with high slack may be left to operateunnecessarily at , limiting the potential of further power re-duction. To avoid this problem, an extended-CVS structure wasfirst proposed in [5] using the so-calledlevel sort techniqueandwas improved recently in [18]. In this structure, gates maybe scattered among the gates. However, because of lackof a global view, the method may not be effective especiallywhen the given timing constraints are tight. In addition, thesetechniques did not consider the switching activity informationwithin the circuit. Clearly, as long as the switching activity ofall nodes in the circuit is available, the more power saving canbe reached by allowing logic gates with high switching activityto operate at . In this sense, the CVS-based structure is notpreferable for power reduction, since the nodes near primaryoutputs typically have low switching activity (on average) [23].More recently, a linear programming approach is presented in[19] to address dual-voltage problem. However, it is based onthe so-calleddelay balanced graphwhose number increases ex-ponentially in terms of problem size.

No matter what specific algorithm is to be used, differentvalue of two supply voltages can lead to totally different powerconsumption. This gives rise to another important problem withdual-voltage approach: How to select the value of two supplyvoltages for maximum power reduction? Existing techniques donot deal with this problem. Since the optimum supply voltagedepends on specific circuits, it is straightforward to try a varietyof possible voltages for each given circuit and select the bestone of them. However, this exhaustive strategy is computation-ally expensive. A fast prediction algorithm is thus desirable toprovide a good starting point toward power optimization withdual voltages.

In this paper, we address the problem of reducing the totalpower consumption by using two supply voltages ( and

) at gate level. Given a technology-mapped network, thepower consumption can be modeled more accurately. We an-alyze the timing slack distribution and explore how to take fulladvantage of slacks in a given circuit. In order to get the max-imum power reduction while maintaining the timing perfor-

mance of the circuit, we relate the power optimization to themaximal-weighted-independent set (MWIS) problem and pro-pose a fast heuristic algorithm to predict the optimum supplyvoltage [21]. Then, based on predicted supply voltages, we de-velop an effective algorithm to allow as many gates as pos-sible working at . Considering the possible power penaltyof the LCs at the interface of and , we target min-imizing the number of LCs by using what we call the “con-strained Fiduccia–Mattheyses” [6] algorithm. By specifying dif-ferent timing constraints, the proposed technique is also ableto provide the power-delay tradeoff with two supply voltages[22]. The design flow of power optimization with dual voltagesis shown in Fig. 2 (other layout structures can be found in [16]).

II. BACKGROUND

A technology-mapped network can be represented as adi-rected acyclic graph . A node correspondsto a gate in the network (The terms “gate” and “node” will beused interchangeably throughout the paper). The existence ofa directed edge implies that node is an imme-diate faninof node (or, node is animmediate fanoutof node

). The set of all immediate fanins (fanouts) ofis denoted by( ). If there is a directed path from nodeto node

in , ( ) is said to be atransitive fanin (fanout)of ( ).The set of all transitive fanins (fanouts) of nodeis denoted by

( ). Each node is associated with a delay, where is theintrinsic delay, is a con-

stant dependent on the driving ability of the node, and isthe loading capacitance at the output of node. Thearrival time

andrequired time of node are recursively given by

(1)

respectively, and theslack timeof node is defined to be.

With good accuracy [1], the node delay atsupply voltageis proportional to , where is thethresholdvoltage, and is a constant (note that the slew effect is not con-sidered here). The delay of nodeat any supply voltage canbe expressed as

(2)

where is the delay of node at . The delay of nodeat is expressed as . The delay

increase due to the voltage reduction from to is thus. We call thedelay gapof node .

In our two-voltage approach, each node works at eitheror . If node works at , in (1) is replaced

with for the calculation of arrival time, required time andslack time.

In CMOS circuits, the average dynamic power consumptionis given by

(3)

Page 3: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

618 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

Fig. 2. Design flow with dual-voltage technique.

whereclock frequency;again the loading capacitance;switching activity at the output of gate.

In general, switching activity inside a circuit not only dependson the topologic structure and input patterns of the circuit, butmay vary with gate delay which introducesglitching transi-tions. Here, however, does not account for glitch power.The reason is twofold. First, measuring the glitches requires anevent-driven simulation and an explicit knowledge of the inputswaveforms. The later hypothesis is unrealistic, and updating theamount of glitching with simulation iteratively is computation-ally too expensive. Secondly, as we will see in the next sec-tion, the proposed technique tends to reachpath balancingbyreducing the slack without changing the circuit topology. Thishelps eliminate glitching to some extent. In any case, glitchpower can be taken into account during validation by power sim-ulation.

Reducing supply voltage results in power saving of the gate.For gate , changing its supply voltage from to any supplyvoltage generates the power reduction

(4)

When , the power reduction is. We call the power gainof gate .

For the purpose of further discussion in the following, we givesome definitions first.

Definition 2.1: A circuit (or graph ) is safeif foreach node . Otherwise, the circuit isunsafe, meaning thatit violates the timing constraints.

Definition 2.2: A node is called afeasible nodeif, where is the delay gap of node. Otherwise, the

node is called anonfeasiblenode.It is assumed that initially each node works at

and its slack (i.e., the circuit is safe). Under the given

timing constraints, any node may be selected to work atforpower reduction only if it is a feasible node. In general, however,not all feasible nodes in can work at . The reason is that,once a node is selected to work at , the slack of others (suchas the transitive fanins/fanouts of the node) may be reduced.As a result, some feasible nodes may no longer be feasible (seeExample 2.1).

Example 2.1:Fig. 3 shows an example with six nodes, whereand ( ) are the slack and delay gap of node

, respectively. Initially, all nodes are feasible. If nodeisselected to work at , the new slack values for all nodes willbe , and , making nodeand nodes nonfeasible. Instead, if we first select node

to work at , the new slack for all nodes turns out to be:, , and , leaving

, and still feasible. These feasible nodes can be furtherconsidered for the possibility of working at . In this sense,

is a better choice than .This example shows that the order of selecting nodes should

be considered carefully so that their slacks can be fully exploitedfor more power saving. In the next section, we will discuss thisproblem in more details. For any given circuit, our goal is toselect a subset of gates working at such that the timingconstraints are satisfied and the total power gain is maximized.

III. T IMING SLACK ANALYSIS

In this section, we explore how to take full advantage of slacksin a circuit by examining the interaction between the node delayand slack changes. First, we provide some definitions.

Definition 3.1: Given a graph : 1) an edgeis calledsensitiveif either

or . Intuitively, a sensitive edge im-plies that the slack of and is sensitive to each others delaychange; 2) a directed path is calledsensitiveif: a) the path con-sists of only sensitive edges and b) the slack of all nodes on the

Page 4: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 619

Fig. 3. Part of a safe circuit with six feasible nodes.

path is monotonously distributed1 in the direction of the path;3) two nodes are calledslack sensitiveif there exists asensitive path from to or from to in . Otherwise, theyare calledslack insensitive.

Definition 3.2: The sensitive transitive closure graphof is a directed graph such that there is an edge

if and only if there is a directed sensitive pathfrom to in .

We clarify the above definitions using the following example.Example 3.1:Fig. 4 shows an example, where the delay of

node is 2, while the delay of the rest is assumed to be 1. The ar-rival, required and slack time for each node are represented by atriplet in the figure. From Definition 3.1, all edges ex-cept the directed edge are sensitive. The sensitive pathsinclude , , and . Nodes , , and

are pair-wise slack sensitive, while, are slack insensi-tive. The sensitive transitive closure graph of Fig. 4 is shown inFig. 5. Note that there is no edge from to in Fig. 5 since

in Fig. 4 is not a sensitive path. Instead, there is an ad-ditional directed edge in Fig. 5 since there is a directedsensitive path, , from to in Fig. 4.

Suppose the delay of nodeincreases by a small positiveamount , then its slack is reduced by exactly. Ob-viously, the slack of any node does notchange. Instead, the slack for any node

may or may not change, depending on and theslack sensitivity of and . Consider three cases as shown inLemma 3.1.

Lemma 3.1:Let be a transitive fanin/fanout node of anynode , and be a small positive constant. We have thefollowing.

Case a) If , then will decrease by as longas and are slack sensitive.

Case b) If , then will decreaseby as long as and are slack sensitive.

Case c) If either or , are slack insensi-tive, then remains unchanged.

Proof: For simplicity, we give proof of Case a) only.Without loss of generality, assuming in Casea). Since and are slack sensitive and , wehave , where is a node lying at the sensitivepath from to . Thus, we only need to prove that Case

1Assuming a directed path consists offv ; v ; . . . ; v g. The monotonicslack distribution on the path implies that ifs(v ) � s(v ), thens(v ) �s(v ); if s(v ) � s(v ), thens(v ) � s(v ), wherei = 1; . . . ; m� 1.

Fig. 4. Example graph with the arrival, required and slack time shown as atripletfa; r; sg, assumingd(v) = 2,d(u ) = 1, i = 1; 2; 3, andd(w ) = 1,j = 1; 2.

Fig. 5. The sensitive transitive closure graph,GGG of Fig. 4.

a) is true when is an immediate fanout of. In this case,we argue . [Otherwise, we have both

and . This implies. This is a contradiction.] Therefore, increasing the

delay of by results in the increase of both andby , while both and remain unchanged. This meansboth and will be reduced by . Similarly, one canprove that Case a) still holds true if .

Let us verify the above lemma using the following example.Example 3.2:Consider Fig. 4. Assume that the delay of node

, , increases by . Since , bothand are reduced to 0.2, as indicated in Case a). From Caseb), decreases to 0.2. Since, are slack insensitive,

remains unchanged, as can be expected from Case c). Incontrast, by using , we have

. In this case, increasing by does not affect the valueof , even though nodes, are slack sensitive.

To make full use of timing slacks, it is desirable to keep theminimum number of nodes whose slacks are reduced by the in-creased delay of a node. In this sense, Case c) is the best case.This motivates us to first select the set of nodes with max-imum slack (denoted by ) in and then construct anin-duced subgraph, , of (see Definition 3.2)on such that there is an edge if is in .We use to denote the set ofneighborsof node in .In Fig. 4, for example, we have and .

of Fig. 5 is shown in Fig. 6, where . Thereare two important properties of , as given inLemma 3.2.

Lemma 3.2:

Property 1) If the delay of any node increases by, then the slackof any node decreasesby.

Page 5: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

620 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

Fig. 6. The induced graphGGG of Fig. 4.

Property 2) The delay increase of any node bydoes not affect the slack of any node aslong as is small enough.

Proof: 1) Since , it follows thatand nodes , are slack-sensitive. From Case a), is reducedby as is. 2) Since , we have eitheror . If , then

. Let . If , thennodes and are slack-insensitive. In either case, remainsunchanged according to Case c).

One can demonstrate Lemma 3.2 using in Figs. 4–6.It can be seen that is independent of, and Lemma 3.2 holdstrue if , where is the secondlargest slack of all nodes in. Therefore, a reasonable value for

is . From Lemma 3.1, in a given graph, in-creasing the delay of any node may reduce the slacks of its tran-sitive fanin or fanout nodes, depending on their slack-sensitivity.

As will be seen in Section IV, the power consumption of anode can be represented as a monotonic function of its delay.Therefore, it is generally desirable to maximize the sum of pos-sible increased delay of all nodes in without violating thetiming constraints. To this end, one can select nodes in themax-imum independent node set2 (MINS) of for the delay in-crease by . Each time the delay of all nodes in MINS increasesby , we update , and , and find MINS again. Thisprocess continues until . Assuming that the circuitcontains groups of nodes with different slacks, the wholeMINS-based process can be completed by passes. Also,choosing guarantees that keeps safe ateach pass. This is because , ,and from Case a)–c) of Lemma 3.1, ,

. Example 3.3 illustrates the process.Example 3.3:Consider Fig. 4 again. Initially, we have

and . The MINS of (see Fig. 6)is (or ). Let MINS , and increase the delayof node by . We then update

, and as: , and. is also updated, as shown in Fig. 7

where MINS is or . Let MINSand increase the delay of both node and node by

. The process terminates withthe final maximum slack . As a result, the sum ofincreased delay of all nodes is thus , which isthe maximum achievable under the given timing constraints.

The above process of increasing delay is done by many itera-tions, depending on the number of nodes with different slacks inthe graph. From the optimization point of view, this constitutes a

2A maximum independent node set of a graph is a set of independent nodes(i.e., no two nodes are connected by an edge) such that the number of nodes inthis set is maximum.

Fig. 7. The updatedGGG .

first-order gradient approach, where the sum of incremental de-lays is the objective function to be optimized, the MINS corre-sponds to a fastest way that the objective is increased, andisthe step size used for searching the best solution. To speed up theprocess, one may wish to solve the problem in one shot by in-creasing the delay for all nodes in the graph. This can howeverprevent us from analyzing/formulating the slack-sensitivity be-tween nodes, as described previously in this section. Since slacksensitivity explains how the nodes interact in terms of their delayand slack changes, it is generally not realistic to find an exact so-lution in one shot where the slack sensitivity information is un-available.

IV. L OWER BOUND OF POWER CONSUMPTION

In the previous section, we obtained the basic scheme of se-lecting nodes for delay increase by analyzing the relation be-tween the slack and delay of nodes. The goal is to maximizethe sum of all possible delay increase for all nodes in the cir-cuit while maintaining the timing performance. However, max-imizing the sum of all possible delay increase does not neces-sarily lead to maximum power reduction. The reason is two-fold. First, the voltage reduction and, hence, power reductionunder a unit penalty of node delay can be different for differentnodes. Second, under dual-voltage environment, only two dis-crete supply voltages are available for all nodes. In this sec-tion, we deal with the former issue by introducing the notion ofweight functionand transforming the delay increase into powerreduction. The latter will be discussed in Section V.

A. Lower-Bound Algorithm

To tackle low-power design using the slack and delay ofnodes, let us examine how the node delay affects the powerconsumption. First, assume (for the purpose of establishing alower bound) that initially the given circuit is safe and that thesupply voltage can take any continuous value. From (2) and (4)at any supply voltage , the power reduction of nodeunderunit delay penalty is given by

(5)

We call the weight function of node. Note thatnot only depends on and , but also varies with the

Page 6: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 621

supply voltage and, hence, the delay of node. This leadsto the following considerations.

1) Since the weight function denotes the power reduction byincreasing a unit delay on a node, we need to select nodesin MWIS3 instead of MINS for the delay increase ofsothat the maximum power reduction can be obtained.

2) Since the weight function is related to the node delay, asmall value of is generally preferable. Theoretically,isrequired to approach zero for the maximum power saving.In the real world, however, smalleris not always better.First, too small value of can lead to prohibitively ex-pensive computation cost. Second, since MWIS dependson the relative weight of nodes in , the small changeof weight for specific nodes will not necessarily affectMWIS. This is true especially when the weights of nodesin are distributed in a wider range. In our experimentpart, we will show that for most circuits, the results arereasonably good when using .

In the above discussion, we assumed the supply voltage isa continuous variable. Instead, when only dual-discrete supplyvoltages are available, the actual power reduction is higher thanit is with continuous voltage. From this point of view, our ap-proach provides a lower bound of power consumption with dualvoltages. The procedure of our algorithm is outlined below:

Lower-Bound-Algorithm {input:output: supply voltages for all nodes and

lower bound of the total powerCalculate node delay, slack and weight

function for each node in underusing (1), (2) and (5);

Identify and in and let

while {Find and construct ;Find MWIS of ;Increase node delay by for each

node in MWIS;Update the node slack, voltage,

weight and ;}

Calculate the final supply voltage,, for each node in using (2);

Obtain the lower bound of power con-sumption using (4);}

It should be noted that the MWIS problem isNP-completeongeneral graphs [11]. It is, however, polynomial-time solvable fortransitive graphs [11]. A fast heuristic for finding MWIS is asfollows. Initially MWIS is set to be . Then choose a nodewith maximum among all nodes in . Addnode to MWIS and delete and all its neighbors (along with

3A maximal weighted independent set of a graph is a set of independent nodes(i.e., no two nodes are connected by an edge) such that the sum of their weightsis maximum.

all edges incident with at least one of the nodes) from. Thisprocess repeats until . An exact and polynomial-timealgorithm can be found in [11] and [14], respectively.

B. Prediction of “Optimum” Supply Voltages

It is interesting to look at the effect of low supply voltageon the delay and power reduction. For each gate, the reduced

promises more power saving at the cost of increased delayof the gate. Under the same timing constraints, using the lower

means that the fewer gates are permitted to work with.Therefore, one can expect there is a specific “optimal” valueof supply voltage for the total power reduction, depending onthe given circuits. Unfortunately, the existing dual-voltage al-gorithms work with the fixed supply voltages. These algorithmscannot help if the given supply voltages are far from their op-timum. Also, as mentioned in Section I, an exhaustive approachfor selecting optimum supply voltages is computationally ex-pensive. This motivates us to find a fast prediction of “optimal”dual-supply voltages.

When the lower-bound algorithm terminates, the supplyvoltage is obtained for each node in . In order tomeet the given timing constraints, the supply voltage, ,selected for node has to be kept greater than . Underdual-voltage environment, either or can be used foreach node. It is reasonable to select .The “optimal” value of can be estimated by finding

whereif

if

(6)

This can be done by calculating the specific summation in(6) for different values of ranging fromto and selecting the maximum sum. Equation(6) cannot take its maximum value unless is equal to

for a specific node . We prove this below by way ofcontradiction. Without loss of generality, suppose we have

and (6) takes its maximum value

when , where . Notethat for , and for . On the otherhand, when , the value of (6) becomes

Since , we have . This contradicts theprevious hypothesis that is the maximum value.

The estimate given by (6) isoptimalin the sense that any devi-ation from it will either increase power consumption or violatethe timing constraints. When the optimum and areavailable, we modify lower-bound algorithm for dual-voltagepower optimization as follows. Each time the delay of nodes in

Page 7: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

622 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

MWIS increases by, we check if some of these nodes are ableto work at . If yes, select them to work at , and updatetheir weight to zero. The reason is that, once a node works at

, no further increase of its delay is needed. By doing so,more delay budget can be provided for other nodes. The formaldescription of power optimization algorithm based on predictedoptimum supply voltages will be given in the next section.

V. POWER OPTIMIZATION WITH DUAL-SUPPLY VOLTAGES

In this section, we describe an effective algorithm for poweroptimization using dual-supply voltages. Since a level converteris required at the interface of two voltages, we also discuss thereduction of the level converters’ power penalty.

A. Power Optimization Algorithm Using Dual-Supply Voltages

Before presenting our algorithm, we take a look at the perfor-mance of existing techniques for power optimization with twosupply voltages. Example 5.1 shows how these techniques workon Fig. 3.

Example 5.1:One of existing techniques is CVS scheme asmentioned in Section I. Assuming, in Fig. 3, thatis a primaryoutput and the power gain, , of is as fol-lows: , , ,and . The CVS algorithm [2] startswith selecting the primary output . Once works at ,the slack of its immediate predecessor,, changes from threeto zero. To meet the timing constraint, cannot work atand the algorithm terminates. The power reduction is two. An-other technique available iszero-slack-algorithm(ZSA) [10].The basic idea behind ZSA is that, at each step, it first findsthe nodes with minimum positive slack and then selects someof them to work at such that their slacks become zero (orsmall enough in this application). At the first step, since nodehas maximum power gain of three,is selected to work atwith the power gain of three. At the second step, nodes,and have the minimum positive slack of two. Because threeof them are nonfeasible, the algorithm stops with the final powerreduction of three.

As a matter of fact, however, the optimal solution of Fig. 3 isto select , , , and to work at . That will result in themaximum power reduction of six while keeping the circuit safe.Both CVS and ZSA break down because they select the nodeslocally (i.e., in a very greedy fashion) and ignore the interactionbetween the delay and slack changes of nodes. Thus, a moreeffective algorithm, with a global view, is required to tackle thisproblem.

Based on the discussions in Section IV, we next outline ouralgorithm for power optimization.

Dual-Voltage-Power-Optimization (DVPO)Algorithm {input:output: The set of gates,

Step 1 : Predict the “optimum” supplyvoltages by calling lower-bound-algo-rithm ( ), unless they have been givenby the users;

Step 2 : Let and ;Step 3 : while ( ) {

Find of the induced sub-graph, , of on a set of nodes withmaximum slack;

Increase node delay by for allnodes in MWIS;

if the delay of node is so largethat it is able to work at ,

, then;

Update the delay, slack , and theweight function of node , ;

endif}

}

We now apply the dual-voltage power optimization (DVPO)algorithm to Fig. 3, as shown in Example 5.2. For comparison,it is assumed that the supply voltages are the same as in CVSand ZSA algorithms and, hence, the prediction of andis not included in this example.

Example 5.2:Consider Fig. 3 again. The weight functionsfor all nodes are approximated as ,

1 6. We have , , ,, and . In the first iteration

of DVPO algorithm, , , ,, , MWIS , and

, . No nodes are put into since, . The slack values of all nodes

are updated as: , and .In the second iteration, , , ,

, MWIS , and, . The delay of nodes , and

are large enough to be put into, i.e., .The new slacks are: , ,and , and are updated to be zero.In the third iteration, , , ,

, MWIS , and, . Thus, node is added into .

Finally, the algorithm terminates with . As a result, we getthe optimal solution of with maximumpower reduction of six.

The time complexity of DVPO algorithm is analyzed as fol-lows.The timecomplexityof lower-bound-algorithmisbasicallythe same as that in Step 3 of DVPO. Finding the maximum valueof requires time, where is the number ofnodes in . Estimation of the “optimal” value of takestime, since at most values need to be tried for finding the max-imum sum in equation (6). Therefore, the prediction of optimumsupplyvoltagesjustneeds time.Calculatingthedelay,slack and weight function for all nodes takes time. Finding

also needs time. The MWIS of can be obtained intime because . Since MWIS , and

linear time is required to update the slacks and weight functionsof all nodes in and for each node MWIS,each iteration in Step 3 can be completed in time. There-fore, assuming the number of iterations is, the time complexity

Page 8: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 623

Fig. 8. The comparison ofK andn for 16 benchmarks. (K—the number ofgroups of nodes with different slack;n—the number of nodes.)

of the whole DVPO algorithm is . When selectingat each iteration, we have , where

is the number of groups of nodes with different slack. Our expe-rience shows for most circuits, making the computationefficient. Fig. 8 is the comparison of and for 16 MCNC’91benchmark circuits, where , on average.

B. Level Converter Power Minimization Algorithm

When using two supply voltages, the circuit requires levelconverters (LCs) at the interface of gates and gatesto block the static current which occurs if a gate drives a

gate [2]. A conventional LC circuit is shown in Fig. 9. Inthe above discussion, we did not account for the effect of the LC.Since an LC generates the additional area, delay, and dynamicpower to the circuit, the number of LCs should be minimized.From Fig. 9, the LC consists of 6 transistors and, hence, its areacan be seen as the area of a three-input NAND gate. In the DVPOalgorithm, the delay of LC (denoted by ) can easily be takeninto consideration by replacing with asthe delay gap of node if an LC is needed at the output of.In general, can be seen to be a constant dependent on thespecific technology (in our experiment, the of 0.5 ns is usedas in [2]). After the DVPO, we have two partitions of gates: onewith gates (denoted by ) and the other with gates(denoted by ). Assuming that the input capacitance of an LCis and that the switching activity at nodewhere the LC isinserted is , the additional power consumption due to theLC can be approximated by .

Now we examine the problem of reducing the total powerconsumption by moving a gate from to or from to .Initially, because of DVPO, it is less likely to make any move ofa gate from to which may violate the timing constraintsdue to the increased delay of the gate. Let us first look intothe possibility of moving a gate from to . Without loss ofgenerality, consider an-type gate with -type fanins and

-type fanouts. The possible power reduction resulting fromthe move of node from to is given by

(7)

where the first term is the reduced power due to LCs and thesecond term accounts for the increased power of the node it-

Fig. 9. The conventional level converter.

self moving from to . Thus, the problem of maximizingpower reduction can be solved by constrained CF–M algorithm.Theconstrainedmeans that a move is accepted only if it doesnot violate the given timing constraints. Although more effec-tive cluster-based F–M (likehMetis[24]) algorithms are avail-able for the general partitioning purposes, they cannot be usedin this application where nodes are required to move individu-ally without violating the timing constraints. Our algorithm forminimizing LCs power penalty is described in the following.

Level-Converter-Power-Minimization (LCPM)Algorithm {

input: Circuit with the initial parti-tion of and

output : Power optimized circuit withfinal partition

repeat; gain_sum ;

for each gate docompute gain ;choose such that

gain (gain );if the tentative move of from

to does not violate the timing con-straints then

;gain_sum gain_sum

gain ;endif

endforFind such that

gain_sum (gain_sum );Move the first gates from to

ifgain_sum ;

until gain_sum ;}

After the process of gate move fromto is complete, asimilar algorithm is applied to the possible move of gatefrom

to where the cost function is modified as

(8)

Page 9: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

624 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

Fig. 10. Examples of gate moves. (a) From L to H. (b) From H to L.

Fig. 10 shows typical examples of the possible gate moves. InFig. 10(a), gateL is to be moved from to so that the numberof LCs is minimized [see (7)]. In Fig. 10(b), in order to reducethe number of LCs, gateH is to be moved from to if itdoes not violate the timing constraints [see (8)]. Obviously, thetime complexity of LCPM algorithm is the same as that of thetraditional F–M algorithm [6] which needs time for eachpass, where is the number of terminals in the circuit.

Since the power overhead of LCs is not considered in lower-boundalgorithminSectionIV,our lowerboundestimateofpowerdissipation may not be tight, depending upon specific circuits.Ideally, one has to try every possible set of dual voltages exhaus-tively for power optimization before deriving the real optimumvoltages. Obviously, this is prohibitively costly. Also, it is toocomplicated, if not impossible, to deal with LCs and best supplyvoltages in an integrated problem formulation. Hence, we havedivided this difficult problem into two subproblems which canbe solved easier. We first estimate “optimum” voltages by lower-bound algorithm assuming no power overhead for LCs, and thenminimize the number of LCs by LCPM algorithm. As we will seein the next section, the lower bound of power consumption de-viates the actual minimum power by about 6% on average (thereaders are referred to Table I in Section VI).

VI. EXPERIMENTAL RESULTS AND DISCUSSIONS

We implemented our dual-voltage approach in the environ-ment ofSISpackage [7], as shown in Fig. 2, and tested it ona set of MCNC91 benchmark circuits. The technology-mappednetwork was obtained using SIS under the delay mode. We usedthe longest path delay of the minimum delay implementation asthe timing constraints in our algorithm. The original power con-sumption was estimated at supply voltage of 5 V and clock fre-quency of 20 MHz under 1m technology with the thresholdvoltage of 0.6 V. Our experiments are divided into five parts. Inthe first part, we estimated the lower bound of power consump-tion and optimum supply voltages, and compare the predictedoptimum voltage with the actual one. In the second part, wecompare our dual-voltage approach with gate sizing technique(using a standard cell library). The performance of our approach

TABLE ITHE LOWER BOUND OF POWER CONSUMPTION

and CVS is presented in the third part, and the power-delaytradeoff is provided in the fourth part. Finally, we show ourpower optimization results with different input statistics in thefifth part.

A. Lower Bound of Power and “Optimum” Supply Voltages

Table I shows the lower bound of power consumption usingand its comparison with actual minimum

power. For individual circuits, the lower bound ranges from54.5% to 77.9% of actual power consumption. On average, thelower bound accounts for about 66% of the actual power, com-pared to 71.5% for actual minimum power. To test the “op-timum” voltage estimation, we optimized the same circuits withDVPO algorithm using different values of . Table II liststhese test data. Depending on specific circuits, the “optimum”value of ranges from 2.2 V to 3.5 V (here only “optimum”

was given because is 5 V under the minimum delayconstraints). It can be seen that the maximum power saving wasachieved at a specific value of (as shown in shaded box ofTable II) which is exactly, or very close to, the estimated “op-timum” voltage (as shown in the last column of Table II). Al-though, for a few circuits (e.g., circuitfrg2), the estimate andactual optimum voltage are much different, the resulting powerreduction is nevertheless always very similar. Note that, in somecases (e.g., circuit9symml), the lower bound of power can bemuch smaller than the actual minimum power consumption. Thereason is that, for this circuit, the final supply voltages () re-sulting from lower bound algorithm scatter in a wider range be-cause of the very nature of the circuit.

To look at how the value of affects the performance of thelower-bound algorithm, we tested it on different fixed valuesof (note that implies it varies dynam-ically). Fig. 11 shows the resulting lower bound of power andCPU time (on a SUN SPARCstation 5 with 32 MB RAM) forthe same circuits as above. For most circuits (except one), asshown in Fig. 11(a), the lower bound varies within the range of3% for different , indicating its insensitivity to . In contrast,CPU time increases with decreased, as shown in Fig. 11(b).Our experiment shows the further reduction ofresults in tooexpensive computation cost only with almost the same lowerbound estimate. We confirm that it is reasonable to choose

Page 10: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 625

TABLE IITHE OPTIMUM SUPPLY VOLTAGE AND POWER REDUCTION USING DIFFERENTVALUES OFV (THE SHADED ENTRIES CORRESPOND TO THEACTUAL VOLTAGE

WITH MAXIMUM POWER REDUCTION)

(a)

(b)

Fig. 11. Performance of lower-bound algorithm on a set of circuits. (a) Lowerbound on power consumption. (b) CPU time.

in terms of both accuracy and efficiency of thealgorithm.

Also, we tested our algorithm under different timing con-straints. Fig. 12 shows the lower bound of power and optimumsupply voltage given the timing constraints of ,and , where is the minimum delay of the circuit.As can be expected, the lower bound of power decreases as thetiming constraints are relaxed. The “optimum” , however,basically stays at 5 V under various timing constraints. What isinteresting is that looser timing constraints do not always leadto the lower “optimum” value of , as shown in Fig. 12(b).

(a)

(b)

Fig. 12. The estimated lower bound of power consumption and “optimum”supply voltages under different timing constraints. (a) Lower bound on powerconsumption. (b) Optimum supply voltage.

This is because the timing penalty increases quickly when thesupply voltage of a gate is reduced to the smaller value.

B. Comparison of Our Dual-Voltage Approach and GateSizing Technique

In order to compare our dual-voltage approach with gatesizing technique, we implemented a gate sizing technique [17]which is also based on the MWIS. Table III summarizes theresults (with V and V) using a standardcell library. The average power reduction over tested circuits

Page 11: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

626 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

TABLE IIIPOWER REDUCTION (%) COMPARISON OFDUAL-VOLTAGE TECHNIQUE AND

GATE SIZING TECHNIQUE

is 6.9% for gate sizing and 19.6% for dual-voltage approach.When using the optimum supply voltage, more power savingis possible with dual-voltage technique. However, it shouldbe pointed out that for dual-voltage technique, there will besome power overheads at physical level, which could not beaccounted for at gate level. More recently, an approach thatuses gate sizing and dual-voltage techniques simultaneouslyfor low power has also been reported [17], [27].

C. Performance of Our Approach Compared With CVS

For comparison with CVS which is the existing dual-voltageapproach, we also implemented CVS under SIS environmentand tested it on benchmarks. Table IV shows the comparison ofour algorithm and CVS ( V and V wereused for this experiment). Columns 2, 3, and 4 of this table arethe number of gates in the circuit, the circuit delay and initialpower consumption (i.e., with supply voltage of 5 V), respec-tively, before running the algorithms. With our algorithm, thenumber of gates working at in the circuit, on average, ac-counts for 65.7% of the total number of gates. Individual per-centage ranges from 34.6% to 86.3%, depending on the specificcircuits. The average power penalty due to the LCs is about 5%.Our algorithm achieves a total power reduction of 19.6%, on av-erage, with the maximum reduction of 26.3% for circuit C880.In contrast, only 11.4% average power reduction is obtained byCVS. A typical instance is circuit9symmlwhich has only oneprimary output. Under the timing constraints of minimum cir-cuit delay, the slack of the primary output is zero and no powerreduction is possible by using CVS. Instead, our algorithm gets15.4% power improvement for this circuit. Note that all resultsin Table IV were achieved without degrading the timing perfor-mance of the circuit. This demonstrates the great potential ob-tained by our dual-voltage power optimization. The last columnof Table IV shows the CPU time running on a SUN SPARCsta-tion 5 with 32 MB RAM.

To see how the excessive slacks in the circuit contribute topower reduction, we conducted experiments on the slack changeof all gates resulting from our algorithm. Fig. 13 shows the dis-tribution histogram of gates over the slack value for two circuits:c8andi3. For circuitc8whose longest path delay is 13.2 ns, 45out of 130 gates work at . The number of gates with small

slack drastically increases after optimization. For circuiti3, incontrast, only 6 out of 310 gates work at and, hence, theslack change of the gates is much smaller, resulting in very lim-ited power saving (only 0.1%).

D. Power-Delay Tradeoff

To provide the power-delay tradeoff, we specified the dif-ferent timing constraints on the given circuits for our poweroptimization. As an example, the power-delay tradeoff curvesfor two circuits (C880andapex6) are shown in Fig. 14, where

and are the two extreme points. corresponds to theminimum timing constraint with relatively less power reduc-tion, while stands for relatively loose timing constraint withthe possible maximum of power reduction which is reachedwhen all gates of the circuit work at . Using Vand V, the maximal percentage of power reductionshould be theoretically %. However,since the primary inputs always work at , the actual min-imum is higher. As shown in Fig. 14, the actual minimum powerfor circuit C880andapex6are 62.8% and 66.1% of the initialtotal power, respectively, and the power consumption no longerdecreases when the delay (i.e., the given timing constraint) in-creases further beyond point.

In the above discussions, we are assuming the capacitiveloading of a gate remains the same whether it works ator . In the real design, however, the gate with hasless parasitic capacitance than that with , indicating thatour power estimation for gates tends to be pessimistic. Inthis sense, the actual power saving can be a little higher thanestimated here.

E. Power Savings Under Different Input Statistics

In the above experiments we assume that the input activityis 0.5 for all circuits. Considering the fact that the actual inputconditions may not be known, it is interesting to look at howsensitive our optimization results are to input statistics. Whilethe proposed technique does not require any specific input pat-terns for optimization, the weight function given by (5) doesvary with switching activity, , which depends on the inputstatistics. Thus, input activity affects the optimal dual voltages,final assignment of nodes to or and, hence, the poweroptimization results. For a case study, we use the random inputactivity ranging from zero to one for circuit9symml, and run ourdual-voltage optimization algorithms. The results are plottedin Fig. 15. Fig. 15(a) shows the curve of initial power con-sumption versus random input activity, and Fig. 15(b) showsthe power saving using our approach with random input ac-tivity. From Fig. 15(a), the power consumption of this circuitchanges a lot due to different input patterns, ranging from 295.1to 1367.2. From Fig. 15(b), however, the power reduction variesfrom 15.9% to 17.6%, showing that percentage power saving isnot sensitive to input statistics. This is a desirable feature of ourapproach. Our experiments with other benchmarks showed thesimilar feature. We believe that percentage power reduction isdominated by other factors (such as timing constraints and cir-cuit topology which affect the slack distribution) instead of inputstatistics.

Page 12: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 627

TABLE IVPERFORMANCE OFOUR ALGORITHM AND CVS ALGORITHM ON A SET OF BENCHMARK CIRCUITS

(a)

(b)

Fig. 13. Gate distribution over the slack value for (a) circuitc8and (b) circuiti3.

VII. CONCLUSION AND FURTHER WORK

In this paper, we targeted gate level power optimization withdual-supply voltages. We have shown that dual-voltage approachcan achieve significant power saving without degrading timingperformanceof thecircuit.Wedemonstrated that it is important topredict optimum supply voltages before using dual-voltage tech-nique and that the power penalty due to level converters can bewellcontrolled.Byspecifyingdifferenttimingconstraints,weob-tained a power-delay tradeoff with dual voltages.

In this work, we used a simpler delay model which does notaccount for interconnection delay and signal slew effect. Inter-

(a)

(b)

Fig. 14. Power-delay tradeoff for (a) circuitC880and (b) circuitapex6.

connection delay cannot be reasonably estimated until the phys-ical design phase, while the slew effect depends strongly on thespecific value of dual-supply voltages applied. In addition, thedelay given by (2) is based on the assumption that the inputvoltage of a gate at supply voltage is also . However,when a gate drives a gate, the delay of the gateis more accurately proportional to insteadof . Therefore, in this case, (2) tends to over-estimate the delay. With these issues in mind, further work isrequired to develop more accurate delay model with dual volt-ages. Besides, there is the additional noise due to cross-coupling

Page 13: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

628 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

(a)

(b)

Fig. 15. Power consumption for circuit9symmlunder different input statistics.(a) Initial power versus random input activity. (b) Power savings versus randominput activity.

between signals with different voltages. Noise checks are thusincreasingly necessary during layout.

More recently, we are combining gate level design with phys-ical-level for power optimization with dual-supply voltages.Since some important parameters (such as wiring capacitance)cannot be available at gate level, some kind of feedbackbetween logic and physical levels is required in order to knowthe exact effect of dual-voltage approach on physical level.In the logic level, dual-voltage technique gives an estimateof gate power reduction. In the physical level, we can usesimulated annealing with additional constraints for placementand routing followed by some post processing with an ultimategoal of reducing power consumption due to both gates andinterconnections [26]. Further work is under way.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers fortheir comments, which improved the quality of the paper.

REFERENCES

[1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-powerCMOS digital design,”J. Solid-State Circuits, vol. 27, no. 4, pp.473–484, April 1992.

[2] K. Usami and M. Horowitz, “Cluster voltage scaling technique for low-power design,” inProc. Int. Symp. Low Power Design (ISLPD), DaraPoint, CA, Apr. 1995, pp. 3–8.

[3] M. Igarashiet al., “A low power design method using multiple supplyvoltages,” in Proc. Int. Symp. Low Power Electronics and Design(ISLPED), Monterey, CA, Aug. 1997, pp. 36–41.

[4] S. Raje and M. Sarrafzadeh, “Variable voltage scheduling,” inProc. Int.Symp. Low Power Design (ISLPD), Dara Point, CA, Apr. 1995, pp. 9–14.

[5] K. Usami et al., “Automated low power technique exploiting multiplesupply voltage applied to a media processor,” inProc. Custom IntegratedCircuit Conf. (CICC), Santa Clara, CA, May 1997, pp. 131–134.

[6] C. M. Fiduccia and R. M. Mattheyses, “A linear time heuristic for im-proving network partitions,” inProc. IEEE/ACM Design AutomationConference (DAC), Las Vegas, NV, June 1982, pp. 175–181.

[7] E. M. Sentovishet al., “SIS: A system for sequential circuit synthesis,”Univ. California, Berkeley, Tech. Rep. UCB/ERL M92/41, May 1992.

[8] A. P. Chandrakasan and R. W. Brodersen,Low-Power CMOS DigitalDesign. Norwell, MA: Kluwer, 1995.

[9] S. Mutohet al., “1-V power supply high-speed digital circuit technologywith multithreshold voltage CMOS,”IEEE J. Solid-State Circuits, vol.30, pp. 847–853, Aug. 1995.

[10] R. Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, “Generation of per-formance constraints for layout,”IEEE Trans. Comput.-Aided Design,vol. 8, pp. 860–874, Aug. 1989.

[11] R. H. Mohring,Graphs and Orders: The Role of Graphs in the Theoryof Ordered Sets and Its Application, I. Rival, Ed. New York: D. Reidel,May 1984, pp. 41–101.

[12] S. Raje and M. Sarrafzadeh, “Scheduling with multiple voltages,”Inte-gration VLSI J. 23, pp. 37–59, 1997.

[13] J. M. Chang and M. Pedram, “Energy minimization using multiplesupply voltages,”IEEE Trans. VLSI Syst., vol. 5, pp. 1–8, Dec. 1997.

[14] D. Kagaris and S. Tragoudas, “Maximum independent sets on transi-tive graphs and their applications in testing and CAD,” inProc. Int.Conf. Computer-Aided Design (ICCAD), San Jose, CA, Nov. 1997, pp.736–740.

[15] K. Usamiet al., “Design methodology of ultra low-power MPEG4 codeccore exploiting voltage scaling techniques,” inProc. ACM/IEEE DesignAutomation Conf. (DAC), San Francisco, CA, June 1998, pp. 483–488.

[16] C. Yeh, Y. Kang, S. Shieh, and J. Wang, “Layout techniques supportingthe use of dual supply voltages for cell-based designs,” inProc.ACM/IEEE Design Automation Conf. (DAC), New Orleans, LA, June1999, pp. 62–67.

[17] C. Chen and M. Sarrafzadeh, “Power reduction by simultaneous voltagescaling and gate sizing,” inProc. Asia and South Pacific Design Automa-tion Conf. (ASPDAC), Yokohama, Japan, Jan. 2000, pp. 333–338.

[18] C. Yeh, M. Chang, S. Chang, and W. Jone, “Gate-level design ex-ploiting dual supply voltages for power-driven applications,” inProc.ACM/IEEE Design Automation Conf. (DAC), New Orleans, LA, June1999, pp. 68–71.

[19] V. Sundararajan and K. K. Parhi, “Synthesis of low power CMOS VLSIcircuits using dual supply voltages,” inProc. ACM/IEEE Design Au-tomation Conf. (DAC), New Orleans, LA, June 1999, pp. 72–75.

[20] L. Wei, Z. Chen, K. Roy, M. C. Johnson, Y. Ye, and V. K. De, “Designand optimization of dual-threshold circuits for low-voltage low-powerapplications,”IEEE Trans. VLSI Syst., vol. 7, pp. 16–23, Mar. 1999.

[21] C. Chen and M. Sarrafzadeh, “Provably good algorithm for low powerconsumption with dual supply voltages,” inProc. Int. Conf. Comput.-Aided Design (ICCAD), San Jose, CA, Nov. 1999, pp. 76–79.

[22] C. Chen and M. Sarrafzadeh, “An effective algorithm for gate-levelpower-delay tradeoff using two voltages,” inProc. Int. Conf. ComputerDesign (ICCD), Austin, TX, Oct. 1999, pp. 222–227.

[23] M. Nemani and F. N. Najm, “Toward a high level power estimation capa-bility,” IEEE Trans. Comput.-Aided Design, vol. 15, pp. 588–598, June1996.

[24] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevelhypergraph partitioning: Applications in VLSI design,” inProc.ACM/IEEE Design Automation Conf. (DAC), Anaheim, CA, June 1997,pp. 526–529.

[25] M. Sarrafzadeh and S. Raje, “Scheduling with multiple voltages underresource constraints,” inProc. Int. Symp. Circuits Syst. (ISCAS), Or-lando, FL, May 1999, pp. 350–353.

[26] A. Nayak, P. Banerjee, C. Chen, and M. Sarrafzadeh, “Power optimiza-tion issues in dual voltage design,” inProc. Int. Conf. Chip Design Au-tomation (ICDA), Beijing, China, Aug. 2000, pp. 99–105.

[27] A. Nayak, M. Haldar, P. Banerjee, C. Chen, and M. Sarrafzadeh, “Poweroptimization of delay constrained circuits,” J. VLSI Design—SpecialIssue Low Power System Design, to be published.

Page 14: On gate level power optimization using dual-supply ...€¦ · Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE Abstract—

CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 629

Chunhong Chen(M’99) received the B.S. and M.S. degrees in electrical engi-neering from Tianjin University, Tianjin, China, and the Ph.D. degree in elec-trical engineering from Fudan University, Shanghai, China.

From 1986 to 1996, he was with Zhejiang University of Technology,Hangzhou, China, as an Assistant/Associate Professor. From 1997 to 1998,he was a Research Associate at the Hong Kong University of Science andTechnology, Hong Kong. From 1999 to 2001, he was a Postdoctoral Fellow,first at Northwestern University, Evanston, IL, and then at the University ofCalifornia, Los Angeles. Since then, he has been an Assistant Professor at theUniversity of Windsor, ON, Canada. His current research interests includephysical layout, logic synthesis, timing analysis, and power optimization forintegrated circuits.

Ankur Srivastava (S’00) received the Bachelor of Technology degree in elec-trical engineering from the Indian Institute of Technology, Delhi, India, in 1998and the M.S. degree in computer engineering from Northwestern University,Evanston, IL, in 2000. He is currently working toward the Ph.D. degree at theComputer Science Department, University of California, Los Angeles, underthe guidance of Prof. M. Sarrafzadeh.

He has served as an Intern at Fujitsu Laboratories of America, Sunnyvale,CA, during the summer of 1999. His primary research interests include delayand power optimization in logic circuits, reconfigurable computing, and powerissues in sensor networks.

Majid Sarrafzadeh (M’87–SM’92–F’96) received the B.S., M.S., and Ph.D.degrees in 1982, 1984, and 1987, respectively, from the University of Illinois atUrbana-Champaign in electrical and computer engineering.

He joined Northwestern University, Evanston, IL, as an Assistant Professorin 1987. In 2000, he joined the Computer Science Department at University ofCalifornia at Los Angeles (UCLA). He has collaborated with many industries inthe past ten years, including IBM and Motorola and many CAD industries. Hecontributed toTheory and Practice of VLSI Design. He has published approx-imately 200 papers, is a Co-Editor ofAlgorithm Aspects of VLSI Layout(Sin-gapore: World Scientific, 1994), coauthor ofAn Introduction to VLSI PhysicalDesign(New York: McGraw Hill, 1996), and the author of an invited chapter intheEncyclopedia of Electrical and Electronics Engineeringin the area of VLSIcircuit layout. He is on the editorial board of theVLSI Design Journaland anAssociate Editor ofACM Transaction on Design Automation (TODAES). His re-cent research interests include embedded and reconfigurable computing, VLSICAD, and design and analysis of algorithms.

Dr. Sarrafzadeh received the NSF Engineering Initiation award, two Distin-guished Paper Awards in the International Conference on Computer-Aided De-sign (ICCAD), and the Best Paper Award in the Design Automation Conference(DAC) for his work in the area of physical design. He has served on the tech-nical program committee of numerous conferences in the area of VLSI Designand CAD. He has served as committee chair of a number of these conferences,including International Conference on CAD and International Symposium onPhysical Design. He was the General Chair of the1998 International Sympo-sium on Physical Design. He is an Associate Editor of IEEE TRANSACTIONS ON

COMPUTER-AIDED DESIGN.