Architectural Power Estimation Based on Behavior Level Profiling

SRINIVAS KATKOORI a,† and RANGA VEMURI b,*

a University of South Florida, Department of Computer Science & Engineering, 4202 E. Fowler Avenue, ENB 118, Tampa FL 33620-5399; b Laboratory for Digital Design Environments, Department of Electrical and Computer Engineering, 813 Rhodes Hall, Mail Location 30, University of Cincinnati, Cincinnati, Ohio 45221-0030

VLSI DESIGN 1998, Vol. 7, No. 3, pp. 255-270. Reprints available directly from the publisher. Photocopying permitted by license only. (C) 1998 OPA (Overseas Publishers Association) Amsterdam B.V. Published under license under the Gordon and Breach Science Publishers imprint. Printed in India.

High level synthesis is the process of generating register transfer (RT) level designs from behavioral specifications. High level synthesis systems have traditionally taken into account such constraints as area, clock period and throughput time. Many high level synthesis systems [1] permit generation of many alternative RT level designs meeting these constraints in a relatively short time. If it is possible to accurately estimate the power consumption of RT level designs, then a low power design from among these alternatives can be selected.

In this paper, we present an accurate power estimation technique for register transfer level designs generated by high level synthesis systems. The technique has four main aspects: (1) Each RT level component used in high level synthesis is characterized for average switched capacitance per input vector. This data is stored in the RT level component library. (2) Using user-specified stimuli, the given behavioral description is simulated and event activities of various operators and carriers are measured. Then, the behavioral specification is submitted to the synthesis system and a number of alternative RTL designs meeting speed, space and throughput rate constraints are generated. (3) Event activity of each component in an RT level design is estimated using the event activities measured at the time of behavior level profiling and the structure of the RTL design itself. (4) The event activities so obtained are then used to modulate the average switched capacitances of the respective RT level components to obtain an estimate of the total switched capacitance of each component.

Detailed power estimation procedures for the three different parts of RTL designs, namely, data path, controller and interconnect, are presented. Experimental results obtained from a variety of designs show that the power estimates are within 3% to 10% of the actual power measured by simulating the transistor level designs extracted from mask layouts.

Keywords: High level synthesis, power estimation, behavioral profiling, register transfer level designs, low power

*Corresponding author.
†This work was performed as part of the doctoral dissertation, when the author was at University of Cincinnati.




1. INTRODUCTION

Due to the increasing demand for portable applications and rapidly growing design complexity, power consumption has become one of the main issues in the realization of VLSI chips. There have been major efforts [2] to reduce power consumption at all levels of abstraction in the design flow. Accurate power estimation techniques are the key to the success of these efforts. Although accurate power estimation is possible at the lower levels of abstraction, it is very time consuming. Hence, the focus has recently shifted to the higher levels of abstraction, including the register transfer (RT) level and above [3]. In this paper, we present a power estimation technique for automatically synthesized RT level designs. This technique is based on behavior level profiling.

A high level synthesis system accepts a behavioral specification written in a hardware description language such as VHDL, a module library, and design constraints such as area and delay constraints. The module library consists of RT level modules such as adders, multipliers, registers and multiplexors. The output of the synthesis system is an RT level design satisfying the user-specified constraints. The synthesis time is usually quite small compared to that of logic synthesis or layout synthesis. Hence, it is possible to synthesize many constraint-satisfying RT level designs in a relatively short time.

RT level designs are composed of two interacting parts: datapath and controller. The datapath consists of execution units such as adders and multipliers, storage units such as registers and RAMs, and interconnect units such as multiplexors and buses. Since the structure of the design is completely known, accurate power estimation is feasible. In addition, since the modules are at a sufficiently high level of abstraction, such estimation should be time efficient. At the higher levels of abstraction, such as the behavioral level, accurate power estimation is difficult due to the lack of sufficient implementation detail. On the other hand, at lower levels of abstraction, such as the logic and layout levels, sufficient implementation detail is available but the estimation time is discouraging. Hence, we are motivated to estimate power at the RT level of abstraction. For a given RT level design and a given set of input vectors, we estimate the total capacitance switched in the design. We use "power" and "switched capacitance" synonymously. Our estimation technique is set in the context of a high level synthesis system known as the Profile-Driven Synthesis System (PDSS).

Our power estimation procedure requires the following inputs: (1) A module library characterized for the average intrinsic switched capacitance (ISC) per input vector. (2) Profile data for various carriers and operators in the data flow graph of the behavioral specification. This data is obtained by simulating the behavioral specification using user-specified stimuli. (3) An RT level design generated by the synthesis system. (4) Binding information of the operators and carriers in the data flow graph to the module instances in the RT level design.

The high level synthesis process introduces certain RT level module instances, such as temporary registers and multiplexors, for which the profile data is not known since these modules have no direct correspondence with the operators and carriers at the behavior level. Profile data for such data path units is derived using the profile data at their inputs, which in turn is obtained from the profile data measured at the behavior level. The switched capacitance for each module instance is estimated as the product of its profile data and its intrinsic switched capacitance obtained from the module library. The total switched capacitance in the datapath is the sum of the estimated switched capacitances over all instances.

The switched capacitance estimation for the controller, assumed to be implemented as a PLA, is as follows: A parameterized PLA characterization table for average switched capacitance per clock cycle is obtained as explained in Section 5. Given the controller size, the switched capacitance for the controller is estimated by determining the closest point in the PLA table.


The power estimated for the entire design is the sum of the estimated switched capacitances of the datapath and the controller. Experimental results show that the power estimated in the RT level datapaths and controllers is within 15% of the actual power measured at the layout level.
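The summation described above can be illustrated with a small Python sketch. It is not the PDSS implementation: the class and function names are hypothetical, the event activities are made-up numbers, and scaling the per-cycle controller ISC by a clock-cycle count is an assumption about how the per-cycle figure is accumulated. The ISC values are the 16-bit entries of Table I and the first row of Table II.

```python
from dataclasses import dataclass


@dataclass
class ModuleInstance:
    name: str
    isc_pf: float        # average intrinsic switched capacitance per input event, in pF
    event_activity: int  # profile data: number of input events seen by this instance


def datapath_switched_capacitance(instances):
    """Sum of (event activity x ISC) over all datapath module instances, in pF."""
    return sum(m.isc_pf * m.event_activity for m in instances)


def controller_switched_capacitance(isc_per_cycle_pf, clock_cycles):
    """Assumption: the per-cycle controller ISC is scaled by the number of clock cycles."""
    return isc_per_cycle_pf * clock_cycles


def total_switched_capacitance(instances, isc_per_cycle_pf, clock_cycles):
    """Datapath plus controller switched capacitance, in pF."""
    return (datapath_switched_capacitance(instances)
            + controller_switched_capacitance(isc_per_cycle_pf, clock_cycles))


# ISC values come from Table I (16-bit entries); the activities are hypothetical.
instances = [
    ModuleInstance("adder_16", 7.74, 100),
    ModuleInstance("register_16", 41.62, 80),
    ModuleInstance("mux2_16", 3.39, 120),
]
datapath_pf = datapath_switched_capacitance(instances)
total_pf = total_switched_capacitance(instances, isc_per_cycle_pf=9.50, clock_cycles=200)
print(f"datapath: {datapath_pf:.1f} pF, total: {total_pf:.1f} pF")
```

In this sketch the register dominates the datapath total even though it sees fewer events than the multiplexor, because its ISC is roughly twelve times larger.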

Section 2 presents a brief survey of power estimation techniques at the architectural and other levels of abstraction. Section 3 discusses various issues involved in power estimation. Section 4 discusses the concept of behavioral profiling. Section 5 discusses the module library characterization and the PLA characterization for the average switched capacitance per input vector. Section 6 discusses the power estimation technique. Section 7 presents the results obtained for several examples. Section 8 discusses the results and presents concluding remarks.

2. PREVIOUS WORK

Powell et al. [4] suggested the Power Factor Approximation (PFA) method for characterizing each module in a module library consisting of functional blocks. The method provides different gate equivalent models for blocks such as multipliers, adders, etc. Each functional block is associated with a PFA proportionality constant and a hardware complexity constant. The PFA constant captures the intrinsic internal activity of the module. Purely random inputs are applied when deriving the PFA constant. The power dissipation in a chip is the sum of the power dissipation in all blocks of the chip. The power contributed by a block in the chip is simply the product of the above two constants and the block's activity frequency. The activity frequency of a functional block is the frequency at which the function is performed. In [5], Powell et al. present an algorithm level power dissipation model for a class of DSP algorithms known as MA-based (Multiply-Add) DSP algorithms. The major sources of power dissipation in MA-DSP systems were identified to be memory operations, computations and I/O operations. The impact of the number of available processing elements, the complexity of processing elements, memory organization and the type of arithmetic on power dissipation was discussed. This model relates power dissipation to high level algorithmic and architectural parameters.

Chandrakasan et al. [6, 7] described a high-level synthesis system, HYPER-LP, which uses a variety of architectural and computational transformations to minimize power consumption in application-specific datapath-intensive CMOS circuits.

Landman et al. [8] presented a methodology for low-power design-space exploration at the architectural level. Black-box power models for the architectural-level components were generated [9] and used to estimate power while preserving the accuracy of the gate or circuit level estimation. The power analysis tool was set in the context of HYPER [10], a high level synthesis system. The key differences between our approach and Landman's approach are: (1) Our synthesis system, known as PDSS (Profile Driven Synthesis System) [11], is targeted towards control-dominated ASIC applications. The behavioral specifications can contain complex control constructs such as nested loops, conditionals and subprograms. HYPER, on the other hand, primarily targets straight-line DSP-style specifications. (2) Our approach is based solely on behavioral profiling. Landman's estimation is based on behavioral profiling or RT level profiling. For large designs with large sets of inputs, the latter approach is time consuming and hence design space exploration is difficult. (3) Our characterization of the module library is based on purely random inputs, that is, the Uniform White Noise (UWN) model. Landman, on the other hand, proposed the DBT (Dual Bit Type) model to take into account the input activity. Our power dissipation model based on the UWN model is simpler than Landman's and yet yields reasonably accurate estimates.

Renu et al. [12] proposed a behavior level power estimation technique based on a combination of analytical and stochastic methods. Based on this, a design space exploration tool is presented which is used to examine the effect of different design steps such as transformations and algorithms. These techniques have also been implemented in the HYPER synthesis environment [10].

Anand et al. [14] present a behavioral synthesis system known as Genesis, for synthesizing low power datapath intensive CMOS circuits. During the allocation phase, (1) the physical capacitance is reduced by minimizing the number of functional modules, registers and multiplexors; and (2) the transition activity for a given module is reduced by selecting a proper sequence of operations for that module. The controller is optimized so as to generate control signals which will reduce the transition activity in the datapath. This is achieved by introducing don't-cares in the state table of the controller. If a datapath module is idle for a particular cycle, then the control signal driving that module is assigned a don't-care, thus avoiding unnecessary clocking of the module. In [15], Anand et al. present a simulation-based method to measure intra- and inter-iteration effects of hardware sharing on switched capacitance. During the simulation, information is gathered which is used to formulate allocation as an ILP problem with the total switched capacitance in the datapath as the objective function. The solution to the problem yields the optimal allocation for the given model.

A detailed discussion about power consumption in CMOS digital designs can be found in [16]. Techniques for low power operation are presented which use the lowest possible supply voltage coupled with architectural, logic, circuit, and technology optimizations. An excellent literature survey on power estimation techniques at the logic and lower levels of abstraction can be found in [17].

In [11, 18], we have proposed a behavior level profiling based technique to estimate switching activity and switching capacitance in a design. The estimation is carried out in the scheduling and performance analysis phase of the synthesis. For a given input specification, various schedules can be generated satisfying the user-given constraints. The schedule with the least estimated switching capacitance is further synthesized. The estimation technique adopts an analytical approach at the design level and a statistical approach for the module library characterization. One of the drawbacks of the approach is that the interconnect estimation is somewhat inaccurate at the scheduling level, resulting in inaccurate power estimation. In the present work, the estimation is at the RT level and is based on the behavior level profiling of the input specification. Since the interconnect structure is known completely, power estimation in the interconnect is more accurate compared to that obtained at the end of the scheduling step. In the present approach, the error in the power estimate is in the range of 3% to 10%.

3. ISSUES IN POWER ESTIMATION

In a CMOS digital circuit, the power consumed is given by the following equations [19, 16]:

$$
\begin{aligned}
P_{consumed} &= P_{switching} + P_{sc} + P_{leakage} \\
P_{switching} &= \sum_i C_i \, V_{supply}^2 \, f_i \\
P_{sc} &= I_{sc} \, V_{supply} \\
P_{leakage} &= I_{leakage} \, V_{supply}
\end{aligned}
$$

$P_{switching}$ is known as the switching component of power consumption, which arises from charging a node with a load capacitance of $C_i$ that is clocked at a frequency $f_i$. $P_{sc}$, the short-circuit component, arises when the PMOS and NMOS transistors switch simultaneously, resulting in a short-circuit path from the voltage supply to ground. For a very short period of time, current is drawn from the voltage supply to ground, which results in power dissipation. $P_{leakage}$ is due to the leakage current, $I_{leakage}$, which arises due to substrate injection and subthreshold effects.

The dominant term is $P_{switching}$. This term is dependent on the architectural parameters and is relatively amenable to estimation at higher levels of abstraction. It is well known that the static power consumption in digital CMOS circuits is negligible compared to the dynamic power consumption. Hence $P_{leakage}$, which is static in nature, is negligible. $P_{sc}$ can be kept within 15-20% of $P_{switching}$ [20] by proper design methodology. Thus, it is sufficient to estimate $P_{switching}$ to estimate the average power consumed by a design.

Dynamic power consumption is strongly dependent on the stream of inputs applied to the circuit [17]. Without any information about the input stream, it is impossible to accurately estimate the power consumption of a design. Thus, for a power estimation technique it is necessary to provide actual or statistical information about the input behavior.

Different power estimation techniques make different assumptions about the input vectors. These techniques are based on statistical, stochastic, probabilistic, or analytical approaches. For any technique two broad steps can be identified: (1) Characterization of the circuit components for power and storing relevant information about the components in the form of statistical models, parameterized tables, equations, etc. This is usually done only once for all the components used in the circuit. (2) Estimation of average power for a given design by combining the input behavior information specific to a design with the module library information using a statistical, stochastic, probabilistic or analytical approach or a mix of these approaches.

In our approach for power estimation at the RT level, the input vector behavior is indirectly specified by the user by providing a sequence of typical input vectors, known as the profiling stimuli. These vectors denote typical usage of the digital system being synthesized. These vectors are used to simulate the behavior level specification, during which event activities of various behavior level operators and carriers are monitored and recorded. Collectively this information is known as the profile data.

For a given set of inputs to a digital circuit, the capacitance switched in the circuit is a measure of the power consumed by the circuit. We adopt this indirect approach for power estimation. Thus, in this paper, we use "power" and "switched capacitance" synonymously.

The module library is precharacterized for average switched capacitance per input vector as explained in detail in Section 5. RT level designs contain three subunits: datapath, controller, and interconnect. Detailed procedures to estimate the switched capacitances in each of these units are presented in Section 6.
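Because the paper reports switched capacitance rather than watts, a reader may want to convert one into the other. The following Python sketch does so under stated assumptions: it takes the switching term in the form capacitance times supply voltage squared times frequency, assumes a single supply voltage and a single clock for the whole design, and uses a hypothetical function name and made-up numbers. It is an illustration, not a procedure taken from the paper.

```python
def average_switching_power(total_switched_cap_pf, num_cycles, v_supply, f_clk_hz):
    """Average switching power in watts.

    Assumes P = C_avg * V^2 * f, where C_avg is the capacitance switched
    per clock cycle (total switched capacitance divided by cycle count).
    """
    cap_per_cycle_farads = total_switched_cap_pf * 1e-12 / num_cycles
    return cap_per_cycle_farads * v_supply ** 2 * f_clk_hz


# Hypothetical numbers: 5000 pF switched over 100 cycles, 5 V supply, 10 MHz clock.
power_w = average_switching_power(5000.0, 100, 5.0, 10e6)
print(f"{power_w * 1e3:.2f} mW")
```

For a fixed supply voltage and clock frequency the conversion is a constant scale factor, which is why ranking alternative designs by switched capacitance ranks them by switching power as well.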

4. BEHAVIORAL PROFILING

The concept of profiling a given program to gather various statistics is not new. A well-known technique for measuring program performance is to insert monitoring code into the program and execute the modified program. Program profiling counts the number of times each basic block is executed and the number of times each control-flow path is traversed. Profiling is widely used to measure instruction set utilization, identify program bottlenecks and estimate program execution times for code optimization [25, 26, 27, 28, 29]. Techniques to insert monitoring code to optimally and efficiently profile programs exist in the literature [30, 31, 32].

Behavioral level profiling is similar to program profiling. For profile data to make sense in the case of high level synthesis, one needs to understand the correspondence between the constructs (variables, operations, loops etc.) in the behavior representation and the elements in the resulting hardware. Understanding this correspondence helps in determining the data to be gathered during profiling. The profiling strategy is mainly dependent on how different synthesis tasks go about synthesizing the target design.

Consider the behavior description written in VHDL shown in Figure 1. One possible RTL data path synthesized from the specification is shown in Figure 2. The correspondences between elements of the specification and the elements of the RTL design are also shown by the line number annotations. Each register is associated with a carrier in the description; for example, register a corresponds to the port a in the specification.

    (1)  ENTITY toy IS
    (2)    PORT(a, b : IN INTEGER;
    (3)         c : OUT INTEGER);
    (4)  END toy;
    (5)
    (6)  ARCHITECTURE foo OF toy IS
    (7)  BEGIN
    (8)    p: PROCESS(a, b)
    (10)     VARIABLE u, v : INTEGER;
    (11)   BEGIN
    (12)     u := a+b;
    (13)     v := a-b;
    (14)     IF (a > b)
    (15)     THEN
    (16)       c <= u;
    (17)     ELSE
    (18)       c <= v;
    (19)     END IF;
    (20)   END PROCESS;
    (21) END foo;

FIGURE 1 A Behavioral Specification in VHDL.

FIGURE 2 An RTL Data Path Synthesized from the Specification in Figure 1.

The profile data obtained by behavioral profiling should indicate the usage of different hardware elements. For example, the profile data of an assignment statement in the behavioral description gives an estimate of the excitation frequency of the corresponding path in the hardware. In our example, if line number (18) has a profile data of 10, it means that the corresponding path from the output of the subtracter through the multiplexor to the input of the register c is excited ten times.

RTL designs generated by high level synthesis systems contain temporary registers and interconnect units which have no direct correspondence with constructs in the behavior level specification. Profile data for such RTL components, which do not explicitly appear in the specification, has to be calculated by some indirect means.

In order to profile a behavioral specification, the profiler inserts monitoring code in the specification. This code typically declares, initializes and increments various counters to measure various types of event activity. The modified program is then simulated to determine the profile data.
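The idea of inserted counters can be illustrated on the toy process of Figure 1. The sketch below is a Python analogue written for this illustration only: the actual profiler emits instrumented VHDL, and the counter names and stimuli here are hypothetical.

```python
from collections import Counter


def toy_instrumented(a, b, counts):
    """Python analogue of the toy process, with one counter per profiled statement."""
    counts["line12_add"] += 1         # u := a + b
    u = a + b
    counts["line13_sub"] += 1         # v := a - b
    v = a - b
    if a > b:
        counts["line16_c_gets_u"] += 1    # c <= u
        return u
    counts["line18_c_gets_v"] += 1        # c <= v
    return v


def profile(stimuli):
    """Run the instrumented behavior over the profiling stimuli and return the counters."""
    counts = Counter()
    for a, b in stimuli:
        toy_instrumented(a, b, counts)
    return counts


# Hypothetical profiling stimuli: (a, b) pairs.
counts = profile([(5, 3), (2, 7), (4, 4), (9, 1)])
for name in sorted(counts):
    print(name, counts[name])
```

With these four vectors the adder and subtracter statements each execute four times, while each branch of the IF executes twice. Under the interpretation given in the text, the path from the subtracter through the multiplexor to register c would then be excited twice.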

The behavior profiler takes the control and data flow graph (CDFG) representation of the specification and generates an equivalent VHDL program with probes (counters and similar monitoring variables) to gather various event activities. We need to profile the CDFG rather than the original specification since the CDFG representation exposes all the operations and carriers (edges in the CDFG) that will be bound to hardware resources.

The generated VHDL program is simulated using input vectors, called the profiling stimuli, supplied by the user. Profiling stimuli should represent typical usage of the design being synthesized. Since the profiling stimuli decide the event activity in the design, the user should take extreme care in preparing this data. Some suggestions on how to prepare profiling stimuli for different classes of designs are given in [11].

For the given profiling stimuli, the profile data of the specification constitutes the following information associated with the CDFG nodes and edges. The event activity of a CDFG node op is the number of times that node is executed and is denoted by $E_{op}$. The transaction activity of an edge e is the number of times the edge is traversed during the execution and is denoted by $T_e$. The event activity of an edge e is the number of times the input changed on the edge and is denoted by $E_e$. Note that $E_e \le T_e$. Probes are inserted by the profiler to measure the profile data.
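The difference between transaction activity and event activity on an edge can be seen in a few lines of Python. This is a sketch with a hypothetical function name; counting the first value carried on the edge as an event is an assumption made here, not something the text specifies.

```python
def edge_activities(values):
    """Return (T_e, E_e) for the sequence of values carried on one CDFG edge.

    T_e counts every traversal of the edge; E_e counts only the traversals
    on which the carried value differs from the previous one.
    Assumption: the first value counts as an event.
    """
    transactions = 0
    events = 0
    previous = None
    for index, value in enumerate(values):
        transactions += 1
        if index == 0 or value != previous:
            events += 1
        previous = value
    return transactions, events


t_e, e_e = edge_activities([3, 3, 5, 5, 5, 2])
print(t_e, e_e)
```

For the sequence shown, the edge is traversed six times but its value changes only three times, so the event activity is strictly smaller than the transaction activity.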

5. POWER CHARACTERIZATION OF RTL MODULES AND PLAs

5.1. Module Library Characterization

The RTL module library contains parameterized modules such as n-bit registers, n-bit adders and n-bit m-to-1 multiplexors. Modules are parameterized with respect to the bit-width of each input and, where applicable, the number of inputs. For each module in the library, its interface description and parameters such as area, delay and average intrinsic switched capacitance (ISC) characteristics are stored in the library. The area, delay and ISC characteristics are expressed as a function of parameter variables such as bit-width, word length etc. and are in the form of either equations or tables. If the data cannot be fit into an equation, then it is stored as a table. For tables, linear interpolation or extrapolation is assumed whenever the characteristic is not available for a given value of the parameter variable.

For a given library module, area, delay and ISC values are determined by generating layouts for different parameter values. Linear regression by the method of least squares is used to find an equation which determines the area, delay or ISC characteristic given the bit-width parameter value. If the standard error is too high, then the data is entered as a table, assuming the use of linear interpolation in between the data points. Determination of area and delay parameters for layout instances is straightforward. Area can be directly measured from the layout and delay can be determined through simulation or a timing analysis program such as Crystal [34]. Determination of ISC, which depends on input patterns, is more involved and is described below.

We define the average intrinsic switched capacitance (ISC) of a module instance as the average capacitance that is expected to switch when an input event (change of logic values on the input lines) takes place. The ISC of a module instance is determined by extracting a switch level model from its layout, simulating the switch level model using a very long stream of randomly generated input patterns and monitoring the capacitance switched per pattern, until convergence occurs as discussed below. The capacitance measurements are carried out by IRSIM-CAP [37], which is a modified version of the IRSIM [38] switch level simulator for better capacitance measurements.
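The equation-or-table representation of a characteristic, described earlier in this subsection, can be sketched in Python as follows. The adder data points are the ones listed in Table I; the function names and the standard-error threshold of 0.05 pF are hypothetical, since the text does not say what counts as "too high".

```python
import math

# Bit width -> ISC (pF) for the adder, copied from Table I.
ADDER_ISC = {1: 0.45, 2: 0.98, 4: 1.93, 5: 2.43, 8: 3.84, 16: 7.74}


def fit_line(table):
    """Least-squares line isc = slope * width + intercept, plus its standard error."""
    xs, ys = zip(*sorted(table.items()))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, math.sqrt(sse / (n - 2))


def interpolate(table, width):
    """Piecewise-linear lookup; the end segments are extended for extrapolation."""
    points = sorted(table.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if width <= x1:
            break
    return y0 + (y1 - y0) * (width - x0) / (x1 - x0)


def isc_for(table, width, max_std_error=0.05):
    """Use the fitted equation when it fits well, otherwise fall back to the table."""
    slope, intercept, std_error = fit_line(table)
    if std_error <= max_std_error:
        return slope * width + intercept
    return interpolate(table, width)


slope, intercept, std_error = fit_line(ADDER_ISC)
print(f"slope={slope:.3f} pF/bit, intercept={intercept:.3f} pF, std error={std_error:.3f} pF")
print(f"12-bit adder: equation {isc_for(ADDER_ISC, 12):.2f} pF, "
      f"table {interpolate(ADDER_ISC, 12):.2f} pF")
```

The adder data is close to linear in the bit width, so the equation form is adequate in this sketch. The multiplier entries in Table I grow much faster than linearly, which is the kind of data the table form is meant for.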

Let $C_k$ be the total capacitance charged after applying k random input patterns without reinitialization between successive patterns. $Z_k = C_k / k$ denotes the average capacitance per input pattern after applying k patterns. $\delta_k = |Z_k - Z_{k-1}| / Z_{k-1}$ denotes the variation in the average capacitance between the (k-1)th and kth patterns. We continue to apply random input patterns until $\delta_k$ remains less than 0.001 over 1000 consecutive input pattern applications. At this point we say that the average switching capacitance estimation has converged and accept the value of $Z_k$ after the last input pattern is applied. This value is the ISC of that instance of the module. A similar procedure is used to determine the ISC of various instances of the module and the results are expressed as an equation or table.
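The convergence test can be sketched in Python as shown below. The switch level simulator is replaced by a hypothetical stand-in that charges 0.1 pF per toggled bit of a 16-bit input, so the numbers it produces are not real ISC values; only the running-average and stopping logic follow the description above. All names are assumptions made for this illustration.

```python
import random


def estimate_isc(switched_cap, tol=0.001, window=1000, max_patterns=1_000_000):
    """Apply patterns until the relative change of the running average stays
    below `tol` for `window` consecutive patterns; return (Z_k, k)."""
    total = 0.0        # C_k: total capacitance switched so far
    prev_avg = None    # Z_{k-1}
    stable = 0         # consecutive patterns with delta_k < tol
    for k in range(1, max_patterns + 1):
        total += switched_cap()
        avg = total / k                                   # Z_k
        if prev_avg:                                      # skip k = 1 and a zero average
            delta = abs(avg - prev_avg) / prev_avg        # delta_k
            stable = stable + 1 if delta < tol else 0
            if stable >= window:
                return avg, k
        prev_avg = avg
    raise RuntimeError("average switched capacitance did not converge")


def make_toy_module(bits=16, cap_per_toggle_pf=0.1, seed=0):
    """Hypothetical stand-in for the simulator: capacitance per toggled input bit."""
    rng = random.Random(seed)
    state = 0

    def switched_cap():
        nonlocal state
        pattern = rng.getrandbits(bits)
        toggled = bin(state ^ pattern).count("1")
        state = pattern            # no reinitialization between patterns
        return toggled * cap_per_toggle_pf

    return switched_cap


isc_pf, patterns_used = estimate_isc(make_toy_module())
print(f"converged after {patterns_used} patterns")
```

In the stand-in, half of the 16 input bits toggle on average, so the running average settles near 0.8 pF. The loop cannot stop before 1001 patterns have been applied, because the stability window itself is 1000 patterns long.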

Figure 3 shows the ISC characteristics of a library module. Figure 4 shows ISC plots with respect to the bit-width for three modules, namely, adder, register and two-input multiplexor. Table I shows the ISC characteristics of some PDSS library modules. For the RAM component, there are two parameters, namely, the select size and the word size. The ISC value shown for RAM is the average capacitance switched for either a Read or a Write operation.

FIGURE 3 ISC Characteristic of a 16-bit Register (horizontal axis: Input Patterns (k)).

FIGURE 4 ISC vs. Bit-Width for Three Parameterized Modules (horizontal axis: Bitwidth).

5.2. PLA Characterization

The controller is a finite-state machine which we assume is implemented as a PLA structure. The PLA structure consists of an input plane, an output plane, and I/O buffers. We assume that the PLA is implemented using dynamic CMOS with pre-charged product and output lines [19]. The product and output lines are selectively discharged based on the input conditions and are controlled by two non-overlapping clocks.

A PLA is characterized by three parameters: (1) the input size, $I$; (2) the output size, $O$; and (3) the number of states, $S$. The ISC for any controller is a function of these parameters. By varying $I$, $O$ and $S$, random PLAs are generated and characterized as follows: The switch level model of the controller, extracted from the layout, is simulated using random input vectors. Simulation is carried out until the capacitance switched per clock step (as opposed to per input pattern in the case of the modules in the library) converges in a fashion similar to the one described for the module library characterization.

TABLE I ISC Data for Some Parameterized Library Modules (Bit Width ≥ 1)

Module: ISC Table (Bit Width-ISC (pF))

Adder: 1-0.45, 2-0.98, 4-1.93, 5-2.43, 8-3.84, 16-7.74
Subtracter: 2-0.97, 4-2.50, 6-3.26, 8-5.64, 10-7.05, 16-12.16
Comparator >: 1-0.44, 2-0.88, 4-1.82, 5-2.00, 6-2.78, 8-3.99, 16-12.57
Multiplier: 2-2.27, 3-3.53, 4-7.99, 5-15.30, 8-60.48, 16-455.39
Multiplexor:
    2-inputs: 2-0.45, 4-0.86, 8-1.70, 12-2.53, 16-3.39
    4-inputs: 2-1.41, 4-2.68, 8-5.20, 12-7.95, 16-10.79
    6-inputs: 2-2.46, 4-4.69, 8-9.46, 12-14.53, 16-19.53
    8-inputs: 2-3.29, 4-6.23, 8-13.10, 12-19.89, 16-26.73
Register: 1-3.77, 2-6.53, 4-12.09, 5-13.68, 6-18.19, 7-18.67, 16-41.62
Signal Register (Register + Glue Logic): 2-10.90, 3-12.41, 4-15.45, 5-15.99, 8-23.78, 16-39.35
AND: 2-0.17, 3-0.29, 4-0.36, 5-0.45, 6-0.55, 8-0.76, 10-0.97, 16-1.55
OR: 2-0.18, 3-0.27, 4-0.38, 5-0.48, 6-0.51, 8-0.71, 10-0.98, 16-1.48
NOT: 1-0.04, 2-0.08, 3-0.12, 4-0.16, 5-0.20, 8-0.33, 16-0.66
NAND: 1-0.06, 2-0.13, 3-0.19, 4-0.26, 5-0.32, 6-0.38, 7-0.44, 8-0.53, 16-1.06
NOR: 3-0.17, 4-0.22, 5-0.28, 6-0.35, 8-0.47
XOR: 2-0.31, 3-0.50, 4-0.68, 5-0.86, 6-0.98, 8-1.35, 10-1.69
XNOR: 2-0.31, 3-0.50, 4-0.64, 5-0.80, 6-0.97, 8-1.26, 10-1.61, 16-2.56
RAM:
    sel_size=2: 1-159.84, 2-187.12, 4-244.310, 8-392.75, 16-592.48
    sel_size=3: 1-217.57, 2-250.65, 4-318.67, 8-463.22, 16-736.95


A PLA characterization table is obtained, which is used later for the estimation of switched capacitance in the controller. Table II shows a portion of the PLA characterization table. Figure 5 shows a three-dimensional plot of ISC values for controllers with varying O and S and with input size I = 5.

6. ARCHITECTURAL POWER ESTIMATION

[Plot for Figure 5: data set 'pla.data'; horizontal axes are Outputs (O) and States (S).]

FIGURE 5 PLA Characterization with size of inputs I = 5.

[Figure 6 block diagram. Legible labels: Specification (VHDL), User, (1) Scheduling, (2) and (3) Optimization, (4), Binding, Profiling, Design (VHDL).]

FIGURE 6 PDSS Environment.

TABLE II A portion of the PLA Characterization table

Sl. No.   I    O    S    ISC (pF)
1.        5    15   5    9.50
2.        5    15   10   16.45
3.        5    15   15   18.85
4.        5    15   20   20.00
5.        5    15   25   24.78
6.        5    15   30   26.84
7.        5    20   5    11.39
8.        5    20   10   15.14
9.        5    20   15   21.42
10.       5    20   20   28.66
11.       5    20   25   31.34
12.       5    20   30   33.82
13.       10   25   10   23.37
14.       10   25   15   25.36
15.       10   25   20   35.97
16.       10   25   25   43.93
17.       10   25   30   43.99
18.       10   30   10   27.40
19.       10   30   15   29.35
20.       10   30   20   41.40
21.       10   30   25   47.96
22.       10   30   30   48.61

PDSS (Fig. 6) accepts specifications in a behavioral subset of VHDL and user-specified constraints in terms of clock-period and area. It generates a RT level design satisfying the given constraints. PDSS consists of four main modules: scheduling and performance estimation, register optimization, interconnect optimization and controller generation. More detailed discussion on PDSS appeared in [11, 33].

The RT level design produced by PDSS consists of four major subunits from the power estimation viewpoint: Datapath, Controller, Interconnect and System Clock. Power consumed in the design is given by,

$$P_{design} = P_{dp} + P_{con} + P_{inter} + P_{clock}$$

where Pdp, Pcon, Pinter and Pclock are the power consumed in the datapath, controller, interconnect and system clock respectively.

Our RT level power estimator needs the following inputs as shown in Figure 6:

1. Profile Data: This is obtained from the behavioral profiling of the high level specification given as input to PDSS. For each operator and each edge in the CDFG, a count of the total event activity that occurred on the operator/edge is recorded.


2. Binding Information: One of the synthesis tasks is to bind each operator and edge in the CDFG to an instance of one of the modules in the module library. It also binds the temporary variables introduced to hardware registers in the module library.

3. Module Library: The module library is pre-characterized for ISC as explained in Section 5.

4. RT Level Output: This is the structural implementation containing instantiations of modules from the module library. The controller is a finite state machine description. The datapath and controller interact with each other to form the entire design.

To estimate the average power of a given RT level design, the power estimator goes through the following phases: (1) Pre-processing stage; (2) Profile data computation of hardware resources introduced during synthesis; and (3) Power estimation of the design.

6.1. Pre-processing Stage

The power estimator is initialized with the ISC values of all the modules obtained from the library characterization. The binding information provided by the synthesis tool is used to build a list of instances (inst_list) of modules. Each instance is initialized with the sum of the profile data of all the operators (or edges) in the CDFG which are bound to that particular instance. Note that the profile data of some instances is not known, as they are introduced during synthesis. The profile data of such instances is computed in the next phase.
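The pre-processing step can be pictured with a small Python sketch. The data layout (plain dictionaries) and all names and counts below are illustrative assumptions, not the PDSS data structures; instances with no bound CDFG item are left as `None` to mark their profile data as unknown.

```python
def build_inst_list(instances, binding, profile):
    """Sum behavioral event counts over the CDFG items bound to each instance.

    instances: names of all module instances in the RT level design
    binding:   CDFG operator/edge name -> module instance name
    profile:   CDFG operator/edge name -> event count from behavioral profiling
    """
    inst_profile = {name: None for name in instances}
    for item, inst in binding.items():
        inst_profile[inst] = (inst_profile[inst] or 0) + profile[item]
    return inst_profile

# Hypothetical design: mux1 is introduced by synthesis, so nothing is bound to it.
instances = ["add1", "reg1", "mux1"]
binding = {"op_a": "add1", "op_b": "add1", "edge_x": "reg1"}
profile = {"op_a": 10, "op_b": 20, "edge_x": 15}
inst_list = build_inst_list(instances, binding, profile)
```

Here `add1` is shared by two operators and receives the sum of their event counts, while `mux1` stays unknown until the profile data computation phase.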

6.2. Profile Data Computation

Algorithm Compute_profile() in Figure 9 is used to compute the profile data of the temporary registers and the interconnect units introduced during the synthesis.

Procedure Build_dependency_list() builds a dependency list of the instances. It goes through each instance inst in the instance list inst_list and, if the profile data of the instance is unknown, it adds the instances at the inputs to inst's dependency list.

Consider Figure 7 in which there is a feedback from the output of the multiplexor inst(j) to the input of the register inst(i). Such a configuration gives rise to dependency cycles. Procedure Remove_cycles() removes the above described dependency cycles in the following way. Let two instances i and j be in a dependency cycle. Besides the input which gives rise to a dependency cycle, if the profile data on the remaining inputs of an instance is known, then let us say that the profile data of that instance is known. Otherwise, the profile data of the instance is said to be unknown. The following three possibilities can occur:

Case 1: The profile data of both instances is known. The profile data of each of the instances is equal to the sum of the profile data of both instances.

Case 2: The profile data of only one of the instances (say i) is known. We remove the edge from j to i. Assuming that instance j is not in a dependency cycle with any other instance, the profile data of j can be computed, which is the sum of the profile data of all the instances (including i) at its inputs. Since there was an edge from j to i, the instance i has event activity from the output of instance j. Thus the new profile data of i is the old profile data plus the computed profile data of instance j.

Case 3: The profile data of both instances is not known. Both the edges in the cycle are removed, and the profile data of i and j are computed based on the profile data of the instances at other inputs. The new profile data of each instance is the sum of the profile data of both instances.

[Figure 7 diagram: register inst(i) and multiplexor inst(j) connected so that they form a dependency cycle.]

FIGURE 7 An example of dependency cycle.

To illustrate Case 1, consider Figure 8. Two instances inst(i) and inst(j) are both in a dependency cycle. The profile data on inputs A and B of inst(i) are 10 and 20 respectively. Similarly, the profile data on the inputs of inst(j), namely C and D, are 15 and 12 respectively. We make a conservative assumption that the inputs of a multiplexor are not switching simultaneously. Thus, the profile data on the output of a multiplexor is the sum of the profile data on all the inputs. The equations to compute profile data on the outputs of both instances are:

$$P(X) = P(Y) + P(A) + P(B)$$

$$P(Y) = P(X) + P(C) + P(D)$$

where P(X) is the profile data on the output of inst(i) and P(Y) is the profile data on the output of inst(j). P(X) appears on the right hand side of the P(Y) equation and vice versa. The above set of equations cannot be solved unless we remove the dependency cycle. Since P(A), P(B), P(C) and P(D) are known, the example belongs to Case 1 as discussed above. With the dependency cycle removed, the profile data of X and Y are P'(X) = P(A) + P(B) = 30 and P'(Y) = P(C) + P(D) = 27. With the dependency cycle included, the new profile data for both X and Y are P(X) = P(Y) = P'(X) + P'(Y) = 57. If P(A) or P(B) is unknown to start with, then the example belongs to Case 2. If P(A) or P(B), and also P(C) or P(D), is not known, then the example belongs to Case 3.

[Figure 8 diagram: inst(i) with inputs A and B and output X; inst(j) with inputs C and D and output Y.]

FIGURE 8 An example to illustrate profile data computation in presence of a dependency cycle.
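The arithmetic of the Case 1 example can be checked with a few lines of Python. The function name and argument layout are illustrative assumptions; only the numbers come from the example.

```python
def resolve_two_cycle(known_inputs_i, known_inputs_j):
    """Case 1: both instances have known profile data on their non-cycle inputs."""
    p_i = sum(known_inputs_i)      # P'(X): feedback edge ignored
    p_j = sum(known_inputs_j)      # P'(Y): feedback edge ignored
    combined = p_i + p_j           # value assigned to both X and Y
    return p_i, p_j, combined

# P(A) = 10, P(B) = 20 on inst(i); P(C) = 15, P(D) = 12 on inst(j).
p_x_open, p_y_open, p_cycle = resolve_two_cycle([10, 20], [15, 12])
# p_x_open == 30, p_y_open == 27, p_cycle == 57
```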

After removing the cycles, the profile data of each instance i whose profile data is unknown is calculated as the sum of the profile data of all the instances at i's inputs. After computing profile data for all the instances, the data path power can be computed as follows.
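A minimal Python sketch of this accumulation step follows, assuming the dependency cycles have already been removed. It differs from a single ordered pass in that it resolves dependencies recursively, so the order of the instance list does not matter; that choice, the dictionary layout and the instance names are assumptions of this sketch.

```python
def compute_profile(profile, inputs):
    """Fill unknown profile data as the sum over the instances at the inputs.

    profile: instance -> event count, or None when unknown
    inputs:  instance -> list of instances driving its inputs (must be acyclic)
    """
    resolved = dict(profile)

    def value(inst):
        # Resolve an unknown instance from the instances at its inputs.
        if resolved[inst] is None:
            resolved[inst] = sum(value(src) for src in inputs.get(inst, []))
        return resolved[inst]

    for inst in resolved:
        value(inst)
    return resolved

# Hypothetical design: a multiplexor and a temporary register added by synthesis.
profile = {"add1": 30, "reg1": 15, "mux1": None, "tmp_reg": None}
inputs = {"mux1": ["add1", "reg1"], "tmp_reg": ["mux1"]}
resolved = compute_profile(profile, inputs)
```

In this example the multiplexor receives 30 + 15 = 45 events, and the temporary register driven by it inherits the same count.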

6.3. Power Estimation in Data Path

The data path consists of the execution units such as adders and multipliers and storage units such as latches and shift registers. The power consumed by the datapath, Pdp, is computed by lines 2-4 of the procedure Estimate_power(). Pdp is given by:

$$P_{dp} = \sum_{op} E_{op} \cdot ISC_{op}$$

where Eop is the event activity (or profile data) of the operator (or register) and ISCop is the average switched capacitance value of the hardware module instance to which the operator node op is bound.
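As a worked instance of the sum above, the following Python sketch uses three ISC values as read from Table I (8-bit adder, 8-bit multiplier, 16-bit register). The event counts and the tuple layout are made up for illustration.

```python
# ISC values (pF) as read from Table I for selected bit widths.
ISC = {("adder", 8): 3.84, ("multiplier", 8): 60.48, ("register", 16): 41.62}

def datapath_capacitance(instances):
    """Sum of E_op * ISC_op over operator and register instances (pF)."""
    return sum(events * ISC[(kind, width)] for kind, width, events in instances)

# (module type, bit width, event count); the event counts are hypothetical.
dp = datapath_capacitance([
    ("adder", 8, 100),
    ("multiplier", 8, 10),
    ("register", 16, 50),
])
# 100 * 3.84 + 10 * 60.48 + 50 * 41.62 = 3069.8 pF
```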

6.4. Power Consumed by System Clock

Algorithm Compute_profile()
begin
  T ← Build_dependency_list();
  Remove_cycles(T);
  for each I in inst_list do
    for each J in I.dependency_list do
      I.profile_data += J.profile_data
    end for
  end for
end

FIGURE 9 Algorithm for the computation of profile data.

As the system clock controls all the clocked components in the data path, it is loaded by a large amount of capacitance. The power consumed by the system clock is estimated in Algorithm Estimate_power() shown in Figure 10. Lines 8-10 estimate the load capacitance Cclock on the system clock. In clocked components such as registers and latches, the load capacitance on the clock line is approximately 50 fF per bit of width. Thus, the total capacitive load on the system clock is the sum of the clock capacitances of each instance. The total capacitance switched by the clock in a design is given by the product of the number of input vectors (Nv), the total number of clock cycles required to process an input vector (Ttotal) and the clock capacitance (Cclock).
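The clock estimate can be sketched in Python as follows. The 50 fF per bit figure is the approximation quoted in the text; the register widths, number of vectors and clock-step count are hypothetical values chosen for the example.

```python
CLOCK_CAP_PER_BIT_FF = 50.0  # fF per bit, the approximation quoted in the text

def clock_capacitance_pf(clocked_widths, n_vectors, t_total):
    """Nv * Ttotal * Cclock, with Cclock summed over clocked instances (pF)."""
    c_clock_ff = CLOCK_CAP_PER_BIT_FF * sum(clocked_widths)
    return n_vectors * t_total * c_clock_ff / 1000.0  # convert fF to pF

# Two 8-bit registers and one 16-bit register: Cclock = 50 fF * 32 = 1.6 pF.
total = clock_capacitance_pf([8, 8, 16], n_vectors=25, t_total=12)
# 25 * 12 * 1.6 pF = 480.0 pF
```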

6.5. Power Estimation in Controller

The controller is a finite state machine implemented as a PLA. Any PLA is characterized by three parameters: the number of inputs I, the number of outputs O and the number of states S. In the module library, there already exists a PLA characterization table, which was explained in detail in Section 5. From the table, we can obtain the average intrinsic switching capacitance (ISC) of a PLA of a given size. Interpolation or extrapolation is used wherever values are not available for a given set of parameter values. The ISC value so obtained is the average capacitance that switches per clock step in the PLA of size (I, O, S).
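The text does not say which interpolation scheme is used, so the following Python sketch simply assumes piecewise-linear interpolation along the number of states S, with the end segments extended for extrapolation. The data are the Table II rows for I = 5 and O = 15; the function name is hypothetical.

```python
from bisect import bisect_left

# Table II rows for I = 5, O = 15: number of states S -> ISC (pF).
ROW = [(5, 9.50), (10, 16.45), (15, 18.85), (20, 20.00), (25, 24.78), (30, 26.84)]

def isc_for_states(s, row=ROW):
    """Piecewise-linear ISC in S; the two end segments are extended outward."""
    xs = [x for x, _ in row]
    k = bisect_left(xs, s)
    k = min(max(k, 1), len(row) - 1)       # clamp to a valid segment index
    (x0, y0), (x1, y1) = row[k - 1], row[k]
    return y0 + (y1 - y0) * (s - x0) / (x1 - x0)

tabulated = isc_for_states(10)      # a tabulated point
interpolated = isc_for_states(12)   # between S = 10 and S = 15
extrapolated = isc_for_states(35)   # beyond the last tabulated S
```

A full lookup over (I, O, S) would need a multi-dimensional scheme; this one-dimensional version only illustrates the idea.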

Algorithm Estimate_power()
begin
1.  for each I in inst_list do
2.    if (I.module_type = OPERATOR OR REGISTER) then
3.      Pdp += I.profile_data * ISC(I.op_type, I.size)
4.    else
5.      if (I.module_type = INTERCONNECT) then
6.        Pinter += I.profile_data * ISC(MULTIPLEXOR, I.size)
7.      endif
8.    endif
9.    if (I.module_type = CLOCKED_COMP) then
10.     Cclock += 50fF * (I.size)
11.   endif
12. end for
13.
14. Let I, O and S be the controller size.
15. Ttotal = Estimate_clock_steps()
16. Ccon = ISC(I, O, S)
17. Pcon = Nv * Ttotal * Ccon
18. Pclock = Nv * Ttotal * Cclock
20. Ptotal = Pdp + Pcon + Pinter + Pclock
end

Let Nv be the total number of profiling stimuli applied. Let the CDFG be scheduled in Nc control steps. In the module library, for each module, the number of clock steps needed to process an input vector is stored as a function of its parameters such as bit-width, word size etc. The total number of clock steps required to process an input vector is the sum of the maximum number of clock steps needed in each control step. Algorithm Estimate_clock_steps() estimates the number of clock steps needed by the design to process an input vector (say Ttotal). The power consumed in the controller is given by the product of Nv, Ttotal and ISC(I, O, S), as given in the algorithm shown in Figure 10.
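The clock-step count and the controller estimate can be sketched together in Python. The schedule, the per-module clock steps and Nv = 25 are hypothetical; the ISC value 16.45 pF is the Table II entry for (I, O, S) = (5, 15, 10).

```python
def estimate_clock_steps(schedule, clock_steps):
    """Sum, over control steps, of the slowest module used in that step.

    schedule:    list of control steps, each a list of module names bound
                 to the operations scheduled in that step
    clock_steps: module name -> clock steps needed per input vector
    """
    return sum(max((clock_steps[m] for m in step), default=0) for step in schedule)

clock_steps = {"adder": 1, "multiplier": 4, "ram": 2}      # hypothetical library data
schedule = [["adder", "multiplier"], ["adder"], ["ram", "adder"]]
t_total = estimate_clock_steps(schedule, clock_steps)      # 4 + 1 + 2 = 7

n_vectors = 25
controller = n_vectors * t_total * 16.45                   # Nv * Ttotal * ISC(5, 15, 10)
# 25 * 7 * 16.45 = 2878.75 pF
```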

6.6. Power Estimation in Interconnect

The profile data for the interconnect units has already been calculated, as discussed in the profile data computation phase. The interconnect units are not present in the CDFG and arise due to operator sharing, register sharing and interconnect sharing. In this work, we consider only multiplexor-based designs. The profile data of a multiplexor is computed as the sum of the event activities on all the inputs. This is a conservative estimate of the total number of events that the multiplexor is subjected to.

The power consumed in the interconnect is calculated in the same way as is done for the datapath.

Algorithm Estimate_clock_steps()
begin
  for i in 0 to Nc do
    max_clock_cnt ← 0
    for each op scheduled in control step i do
      if (module_op.clock_steps > max_clock_cnt) where op is bound to module_op then
        max_clock_cnt ← module_op.clock_steps
      endif
    end for
    Ttotal += max_clock_cnt
  end for
end

FIGURE 10 Algorithm for the estimation of power (Algorithm Estimate_power()).

FIGURE 11 Algorithm to estimate the number of clock steps (Algorithm Estimate_clock_steps()).


The interconnect power is given by:

$$P_{inter} = \sum_{i} E_{i} \cdot ISC(MUX, i.size)$$

where i is an instance of a multiplexor of size i.size, Ei is its event activity (profile data), and ISC(MUX, i.size) is the average intrinsic switching capacitance of a multiplexor of size i.size.
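A short Python sketch of the interconnect estimate follows. It assumes that the "size" of a multiplexor means its number of inputs together with its bit width, which is how Table I keys the multiplexor rows; the two ISC values are read from Table I and the event counts are made up.

```python
# Multiplexor ISC (pF) as read from Table I, keyed by (number of inputs, bit width).
MUX_ISC = {(2, 8): 1.70, (4, 8): 5.20}

def mux_events(input_events):
    """Conservative estimate: inputs are assumed never to switch simultaneously."""
    return sum(input_events)

def interconnect_capacitance(muxes):
    """Sum of E_i * ISC(MUX, size_i) over multiplexor instances (pF)."""
    return sum(mux_events(ev) * MUX_ISC[(len(ev), width)] for ev, width in muxes)

# (event counts on each input, bit width); the counts are hypothetical.
inter = interconnect_capacitance([([10, 20], 8), ([5, 5, 5, 5], 8)])
# 30 * 1.70 + 20 * 5.20 = 155.0 pF
```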

7. RESULTS

In this section we present experimental results for six designs:

1. Compression chip
2. Decompression chip
3. FIFO, a first-in first-out queue
4. Find, sort and search chip
5. Shuffle Exchange Network [35]
6. Traffic light controller

Table III shows the behavioral specification data. The PDSS system is implemented in C++ on Sun SPARCstation platforms.

Each register level design produced by PDSS is processed by the Lager IV silicon compiler [36] to generate mask layouts. The designs generated use a two-phase non-overlapping clocking scheme. Although the designs are generated in a scalable CMOS technology, all results for this paper are obtained using a 2 micron feature size. Switch level models are extracted from the layouts and simulated using the IRSIM-CAP [37] switch level simulator. Table IV shows the synthesized design data at the layout level.

Table V shows the estimated and actual powers in the data path and interconnect of the six designs. The estimated power is computed at the RT level and the actual power is determined by switch level simulation of the synthesized designs. As shown in the table, the percentage error in estimation for the data path is in the range 2.51% to 12.58%, with the average deviation from the actual value being 6.25%.

Table VI shows the comparison of powers for the controller. The estimation error is in the range 3.53% to 15.22%, with the average deviation being 10.51%. Table VII shows the comparison of the power dissipated due to the system clock. The estimation error is in the range 18.59% to 30.69%, with an average deviation of 22.32%. Table VIII shows the power values for the entire design, which is the sum of the power in the datapath (Pdp), interconnect (Pinter), system clock (Pclock), and

TABLE III Behavioral Specification Data for Six Designs

Sl.  Design            LOC   DFG Nodes   DFG Edges   Profiling Stimuli   Profiling Time (s)
1.   Compress          42    22          107         25                  9.48
2.   Decompress        40    22          106         25                  6.00
3.   FIFO              70    38          176         25                  16.33
4.   Find              63    33          121         16                  9.81
5.   Shuffle Xchg NW   450   31          2040        14                  30.90
6.   TLC               72    27          123         10                  1.37

TABLE IV Synthesized Design Data at the Layout Level

Sl.  Design         Clock Period   Nodes    Transistors   Area (sq. mm.)   Cycles   Simulation Time (min)
1.   Compress       200ns          2,946    6,315         10.9             1450     5.34
2.   Decompress     200ns          2,803    6,059         10.3             825      3.23
3.   FIFO           900ns          4,438    10,688        24.6             2,364    20.44
4.   Find           550ns          5,602    11,458        20.3             5,360    35.00
5.   Shuffle Xchg   160ns          49,655   95,004        418.7            1,975    240.00
6.   TLC            200ns          1,938    4,769         6.9              420      1.28


TABLE V Comparison of the power in the Data path and Interconnect

Total (Pdp + Pinter)

Sl. No.   Design       Estimated (pF)   Actual (pF)   %Devn.
1.        Compress     24219            21171         12.58
2.        Decompress   15034            14473         3.73
3.        FIFO         56863            51614         9.23
4.        Find         266602           281415        5.55
5.        Shuffle      525207           545976        3.95
6.        TLC          4415             4526          2.51

Average Error 6.25

TABLE VI Comparison of the power in the Controller

Total (Pcon)

Sl. No.   Design       Estimated (pF)   Actual (pF)   %Devn.
1.        Compress     45974            39209         14.71
2.        Decompress   26309            22303         15.22
3.        FIFO         338266           304340        10.03
4.        Find         351066           330718        5.79
5.        Shuffle      297400           256303        13.81
6.        TLC          13906            13414         3.53

Average Error 10.51

TABLE VII Comparison of Clock power

Total (Pclock)

Sl. No.   Design       Estimated (pF)   Actual (pF)   %Devn.
1.        Compress     10793            13036         20.78
2.        Decompress   6249             8167          30.69
3.        FIFO         27580            33958         23.12
4.        Find         59787            47241         20.98
5.        Shuffle      554768           444923        19.80
6.        TLC          2786             2268          18.59

Average Error 22.32

TABLE VIII Comparison of the total power for the Entire Design

Total (Pdesign)

Sl. No.   Design       Estimated (pF)   Actual (pF)   %Devn.
1.        Compress     80986            73416         9.35
2.        Decompress   47592            44943         5.57
3.        FIFO         422709           389912        7.75
4.        Find         677455           659374        2.66
5.        Shuffle      1377375          1247202       9.45
6.        TLC          21107            20208         4.26

Average Error 6.51
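The deviation column of Table VIII can be recomputed from the estimated and actual totals. The Python sketch below assumes the deviation is measured relative to the estimated value; the paper does not state the normalization, but this choice matches the tabulated figures to within about 0.01 percentage points.

```python
# (design, estimated, actual) totals from Table VIII, in the table's units.
TABLE_VIII = [
    ("Compress", 80986, 73416),
    ("Decompress", 47592, 44943),
    ("FIFO", 422709, 389912),
    ("Find", 677455, 659374),
    ("Shuffle", 1377375, 1247202),
    ("TLC", 21107, 20208),
]

def deviation_pct(estimated, actual):
    """Absolute deviation as a percentage of the estimated value (assumed basis)."""
    return abs(estimated - actual) / estimated * 100.0

devs = {name: deviation_pct(est, act) for name, est, act in TABLE_VIII}
average = sum(devs.values()) / len(devs)
smallest = min(devs, key=devs.get)   # design with the smallest deviation
largest = max(devs, key=devs.get)    # design with the largest deviation
```

Under this assumption the smallest deviation belongs to Find and the largest to Shuffle, and the average comes out close to the tabulated 6.51.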


controller (Pcon). The percentage error is in the range 2.66% to 9.45% and the average deviation is 6.51%. This shows reasonable correlation between the estimated and actual values, not only in the entire design but also in the datapath and controller separately.

8. DISCUSSION AND CONCLUSIONS

The following are some of the factors which have not been taken into account during the power estimation:

1. Effect of placement and routing.

2. The random characterization of the RTL module library and PLAs gives rise to an inherent estimation error. This can be remedied by taking the activity on the inputs into the estimation procedure.

3. Glitch power consumption.

4. PLA characterization based on only inputs, outputs and states is not sufficient. The state table information has to be taken into account.

5. In the estimation of power in multiplexors, we assumed that the activity on the inputs is added up to get the activity of the multiplexor. We are making a very conservative assumption that the inputs do not switch simultaneously. This is another source of error.

In this work, we presented an accurate power estimation technique based on the profile data obtained at the behavior level. The estimation technique is implemented in the framework of a high level synthesis system. Compared to estimation techniques at lower levels of abstraction, the technique is faster in execution time. For the six examples considered, the estimation error at the design level is within 10% (6.51% on average), which indicates that the estimation technique is reliable.

Acknowledgements

This work was done at the University of Cincinnati and is supported in part by the Solid State Electronics Directorate of the Wright Laboratory of the US Air Force under contract number F33615-91-C-1811 and by the Advanced Research Projects Agency under order no. 7056 monitored by the Federal Bureau of Investigation under contract no. J-FBI-89-094.

References

[1] Camposano, R. and Wolf, Wayne (1991). "High Level VLSI Synthesis", Kluwer Academic Publishers.

[2] Lemnios, Z. J. and Gabriel, K. J. (1994). "Low-Power Electronics", IEEE Design and Test of Computers, pp. 8-13, Winter.

[3] Najm, F., "Towards a high-level power estimation capability", In Proceedings of the 1995 International Symposium on Low Power Design, April 1995.

[4] Powell, Scott R. and Chau, Paul M. (1990). "Estimating Power Dissipation of VLSI Signal Processing Chips: The PFA Technique", VLSI Signal Processing IV, pp. 250-259.

[5] Powell, Scott R. and Chau, Paul M., "A Model for Estimating Power Dissipation in a Class of DSP VLSI Chips", IEEE Transactions on Circuits and Systems, 38(6), June 1991.

[6] Chandrakasan, Anantha P. et al., "HYPER-LP: A System for Power Minimization Using Architectural Transformations", Proceedings of ICCAD, pp. 300-303, November 1992.

[7] Chandrakasan, Anantha P. et al., "Optimizing Power Using Transformations", IEEE Transactions on Computer Aided Design, pp. 12-31, January 1995.

[8] Landman, P., "Low-Power Architectural Design Methodologies", Ph.D. Thesis, Memorandum No. UCB/ERL M94/62, 30th August 1994.

[9] Landman, P. and Rabaey, J., "Black-Box Capacitance Models for Architectural Power Analysis", Proceedings of the 1994 International Workshop on Low Power Design, Napa Valley, CA, pp. 165-170, April 1994.

[10] Rabaey, J. M., Chu, C., Hoang, P. and Potkonjak, M., "Fast Prototyping of Datapath-Intensive Architectures", IEEE Design and Test of Computers, pp. 40-51, June 1991.

[11] Nand Kumar, Srinivas Katkoori, Leo Rader and Ranga Vemuri (1995). "Profile-Driven Behavioral Synthesis for Low Power VLSI Systems", IEEE Design and Test of Computers, pp. 70-84, Fall.

[12] Renu Mehra and Rabaey, Jan M., "Behavioral Level Power Estimation and Exploration", Proceedings of the International Workshop on Low Power Design, pp. 165-170, April 1994.

[13] Paul Landman and Jan Rabaey, "Power Estimation for High Level Synthesis", Proceedings of EDAC-EUROASIC, pp. 361-366, February 1993.

[14] Anand Raghunathan and Jha, Niraj K. (1994). "Behavioral Synthesis for Low Power", Proceedings of ICCD.

[15] Anand Raghunathan and Jha, Niraj K. (1995). "An ILP formulation for low power based on minimizing switched capacitance during data path allocation", in the Proceedings of IEEE Symposium on Circuits and Systems.

[16] Chandrakasan, Anantha P., Sheng, S. and Brodersen, R., "Low Power CMOS digital design", IEEE Transactions of Solid State Circuits, April 1992.

[17] Najm, F. N., "A survey of power Estimation Technique in VLSI circuits (Invited Paper)", IEEE Transactions VLSI Systems, 2(4), 446-455, January 1995.

[18] Srinivas Katkoori, Nand Kumar and Ranga Vemuri, "High Level Profiling Based low Power Synthesis Technique", In the Proceedings of ICCD, October 1995.

[19] Weste, N. and Eshraghian, K. (1985). "Principles of CMOS VLSI Design: A Systems Perspective", Addison-Wesley.

[20] Veendrick, H. J. M., "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits", IEEE Journal on Solid State Circuits, SC-19, pp. 468-473, August 1984.

[21] Ravi Kalyanaraman, "Behavioral Test Generation for VHDL Programs", MS Thesis, Department of Electrical and Computer Engineering, University of Cincinnati, September 1993.

[22] Darrel Ince (1991). "Software Testing", in John McDermid (ed.) Software Engineer's Reference Book, Butterworth-Heinemann Ltd.

[23] John Hennessy and David Patterson (1990). "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers.

[24] Stallings, W. (ed.) (1990). "Reduced Instruction Set Computers (RISC)", IEEE Press.

[25] Cmelik, R. F., Kong, S. I., Ditzel, D. R. and Kelly, E. J., "An Analysis of MIPS and SPARC instruction set utilization on the SPEC benchmarks", In ASPLOS-IV Proceedings, SIGARCH Computer Architecture News 19, pp. 290-302, 2 April 1991.

[26] Graham, S. L., Kessler, P. B. and McKusick, M. K., "An execution profiler for modular programs", Software Practice Exper. 13, pp. 671-685.

[27] Morris, W. G., "CCG: A prototype coagulating code generator", Proceedings of the SIGPLAN 91 Conference on Programming Language Design and Implementation, SIGPLAN Not. (ACM), pp. 45-58, 26, June 1991.

[28] Pettis, K. and Hansen, R. C., "Profile guided code positioning", Proceedings of the SIGPLAN 91 Conference on Programming Language Design and Implementation, SIGPLAN Not. (ACM), pp. 16-27, June 1990.

[29] Sarkar, V., "Determining average program execution times and their variance", In Proceedings of the ACM SIGPLAN 89 Conference on Programming Language Design and Implementation, SIGPLAN Not. (ACM), 289-312, 24 June 1989.

[30] Ball, T. and Larus, J. R., "Optimally Profiling and Tracing Programs", ACM Transactions on Programming Languages and Systems, 16(4), 1319-1350, July 1994.

[31] Goldberg, A., "Reducing overhead in counter-based execution profiling", Tech Rep. CSL-TR-91-495, Computer Systems Lab., Stanford Univ., Stanford, Calif., Oct. 1991.

[32] Samples, A. D., "Profile Driven Compilation", Ph.D. thesis (Rep. UCB/CSD 91/627), Computer Science Dept., Univ. of California, Berkeley, Apr. 1991.

[33] Jayanta Roy and Ranga Vemuri (1992). "DSS: A Distributed Synthesis System", IEEE Design and Test of Computers.

[34] John Ousterhout (1987). "Using Crystal for Timing Analysis", Electrical Engineering and Computer Sciences, University of California at Berkeley.

[35] "Novel IC Shuffles Parallel Processing Data", Electronic Products, pp. 42-50, August 1, 1986.

[36] Rajeev Jain et al., "An Integrated CAD System for Algorithm-Specific IC Design", IEEE Transactions on Computer Aided Design, 10(4), April 1991.

[37] Landman, P., "IRSIM-CAP: Modified version of IRSIM for better Capacitance Measurements", Univ. of Calif., Berkeley.

[38] Salz, A. and Horowitz, M., "IRSIM: an incremental MOS switch-level simulator", in Proc. Design Automation Conf., pp. 173-178, June 1989.

Authors’ Biographies

Srinivas Katkoori is an assistant professor in computer science and engineering at the University of South Florida, Tampa. In 1997, he received his doctoral degree in computer engineering from the University of Cincinnati. In 1992, he received his bachelor's degree in electronics and communication engineering from Osmania University, India. His research interests are in high-level synthesis and low-power synthesis of VLSI systems. Katkoori is a member of IEEE and ACM SIGDA.

Ranga Vemuri, an associate professor of electrical and computer engineering at the University of Cincinnati, also directs its Laboratory for Digital Design Environments. His interests include the computer-aided design of digital systems, formal verification, system synthesis, performance modeling, hardware description languages, and parallel algorithms. Vemuri received the M.Tech. degree from the Indian Institute of Technology, Kharagpur, and the Ph.D. from Case Western Reserve University, both in computer engineering. He received the Siddhartha gold medal, a distinguished research award, and an outstanding teacher award. He is a member of the IEEE Computer Society, IEEE Circuits and Systems Society, ACM SIGDA, American Society of Electronic Engineers, and Eta Kappa Nu.
