INTEGRATION, the VLSI journal 42 (2009) 312–320


Timing analysis with compact variation-aware standard cell models

Seyed-Abdollah Aftabjahani, Linda Milor *

Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

Article info

Article history:

Received 29 January 2008

Received in revised form 3 November 2008

Accepted 11 November 2008

Keywords:

Standard cell characterization

Process variation

Within-die variation

Timing analysis

0167-9260/$ - see front matter © 2008 Elsevier B.V. All rights reserved.

doi:10.1016/j.vlsi.2008.11.008

* Corresponding author. Tel.: +1 404 894 4793; fax: +1 404 894 4641.

E-mail address: [email protected] (L. Milor).

Abstract

A compact variation-aware timing model for a standard cell in a cell library is developed. The cell model incorporates variations in the input waveform and loading, process parameters, and the environment into the cell timing model. The cell model operates on full waveforms, which are modeled using principal component analysis (PCA). PCA enables the construction of a compact model of a set of waveforms impacted by variations in loading, process parameters, and the environment. Cell characterization involves describing with equations how waveforms are transformed by a cell as a function of the input waveforms, process parameters, and the environment. The models have been evaluated by calculating the delay of paths. The results demonstrate improved accuracy in comparison with table-based static timing analysis at comparable computational cost. Complexity of the models as a function of the number of parameters modeling variation is also discussed, and shows reduced memory requirements as the number of parameters describing variations increases.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Circuit timing analysis is needed to ascertain if a design meets timing requirements before manufacturing. The standard approach to estimate circuit timing is through static timing analysis (STA). STA involves converting a circuit into a timing graph, where each edge represents the delay of a gate between its inputs and outputs. STA then performs a graph traversal to find the longest path, based on a project planning technique called the Critical Path Method [1].
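The longest-path traversal described above can be illustrated with a minimal sketch. It assumes the timing graph is a directed acyclic graph given as a list of edges; the node names, the delays, and the helper `longest_path_delay` are hypothetical and not taken from the paper.

```python
from collections import defaultdict, deque


def longest_path_delay(edges, source, sink):
    """Return the latest arrival time at `sink` in a timing DAG.

    `edges` is a list of (from_node, to_node, delay) tuples. Names and
    delays are illustrative only.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {source, sink}
    for u, v, delay in edges:
        successors[u].append((v, delay))
        indegree[v] += 1
        nodes.update((u, v))

    # -inf marks nodes that are not reachable from the source.
    arrival = {n: float("-inf") for n in nodes}
    arrival[source] = 0.0

    # Kahn's algorithm: process nodes in topological order.
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:
        u = ready.popleft()
        for v, delay in successors[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return arrival[sink]


# Hypothetical four-gate example with delays in ns.
example_edges = [
    ("in", "g1", 0.10), ("in", "g2", 0.25),
    ("g1", "g3", 0.30), ("g2", "g3", 0.05),
    ("g3", "out", 0.20),
]
critical_delay = longest_path_delay(example_edges, "in", "out")
```

In this toy graph the path through `g1` dominates, so the critical delay is the sum of its three edge delays.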

The delay through gates is a function of the slope of the input signals. Hence, the traditional approach to accounting for the input slope is to characterize cell delay through tables, which pre-compute delay and output slew as a function of input slew for each gate in a standard cell library. In order to account for slew, STA requires an additional step, a preliminary backwards traversal through the timing graph to determine the relationship between slew and delay to the output for each node in the network [2].

Circuit timing is increasingly impacted by variation due to the manufacturing process and the operating environment. The standard approach to account for variation is through worst-case analysis [3]. Worst-case analysis assumes that parameters are constant within a chip, but vary between chips. Designers ensure that their design satisfies specifications for all process corners by simulating the circuit with a small set of “corner” models that represent process extremes. The “corner” models consist of tables relating delay and output slew to input slew and loading for these process extremes.

Circuit timing has, however, become increasingly susceptible to within-die variation due to both the manufacturing process and the operating environment. Hence, it has become imperative to take into account these variations in device and interconnect characteristics during design. Worst-case design does not take into account within-die variation.

To account for within-die variation, we need to perform statistical static timing analysis (SSTA) at corners that define die-to-die variation [4–12]. SSTA can determine the variation in critical path delays as a function of random and systematic variation within and between paths. SSTA resembles STA, except gates are characterized by delay distributions. The gate delay and arrival time distributions result in distributions of output delays, and correlations among these delays. Graph traversal involves applying statistical sum to arrival time distributions and the delay distribution for each gate, and statistical maximum operations to the resulting gate delay distributions.

Clearly, for SSTA we need compact models of standard cells that are accurate over parameter and environmental variations, not just at process extremes, as in worst-case design. Our proposed models can be used to generate the delay distribution functions, which can account for spatial correlations, as needed, using methods as in [6–12]. Our models can also be used directly in Monte-Carlo-based SSTA, which involves path enumeration, Monte Carlo analysis of critical paths, and the statistical maximum operation on the resulting path delays, as described in [8,13–17].
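As a rough illustration of Monte-Carlo-based SSTA, the sketch below samples gate delays along enumerated paths, sums them per path, and takes the maximum over paths for each sample. It assumes independent Gaussian gate delays, which ignores the spatial correlations discussed above; the function name and all numbers are hypothetical.

```python
import random


def sample_circuit_delays(paths, n_samples, seed=0):
    """Monte Carlo samples of the circuit delay.

    `paths` is a list of paths; each path is a list of (mean, sigma)
    gate delays. Gate delays are drawn independently (an assumption made
    only to keep the sketch short).
    """
    rng = random.Random(seed)
    circuit_delays = []
    for _ in range(n_samples):
        # Statistical sum: add the sampled gate delays along each path.
        path_delays = [
            sum(rng.gauss(mean, sigma) for mean, sigma in path)
            for path in paths
        ]
        # Statistical maximum: the slowest path sets the circuit delay.
        circuit_delays.append(max(path_delays))
    return circuit_delays


# Two hypothetical paths, delays in ns.
paths = [[(0.10, 0.01), (0.20, 0.02)], [(0.25, 0.03)]]
samples = sample_circuit_delays(paths, n_samples=1000)
mean_delay = sum(samples) / len(samples)
```

The resulting list of samples approximates the circuit delay distribution, from which a mean, a variance, or a percentile can be estimated.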

The goal of this work is to develop a methodology to construct compact variation-aware timing models for standard cells in a cell library, which are accurate over process and environmental variations. The model also utilizes compact models of waveforms. This paper will show that these compact waveform models, when used for static timing analysis, are more accurate than the well-known tabular method [18] and comparable in terms of computational cost.

The compact waveform models are constructed via PCA [19] of waveforms, where the waveforms are described by principal component scores (PCSs), which can reconstruct the waveforms. Moreover, since the principal component basis functions are shared among all waveforms, cell library characterization requires that we only store the equations that describe the transformations of the principal component scores as the waveform passes through the cell. The equations also describe changes in cell performance as a function of variations in the process and operating environment.

This method differs from traditional static timing analysis

(a) by working with waveforms with realistic shapes,

(b) by storing the waveform transformation through a cell as an equation rather than a table, and

(c) by including equations that describe any changes in cell performance as a function of variations in the process and operating environment.

¹ Design rule check.

² Layout versus schematic.

This is not the first attempt to accurately model waveforms for timing analysis. Recent work has considered accurate modeling of waveform propagation through standard cells. In [20], it is shown that realistic waveforms do not resemble the idealized ramp, and in [21] it is shown that realistic waveform modeling results in more accurate timing analysis. Examples of waveform modeling include [22], where a Weibull shape parameter is added to waveform characterization to account for the differences between real waveforms and their approximation by a ramp. Other work has aimed to model realistic waveforms with a set of basis functions [23–26]. The basis functions have been selected in a variety of ways, including an error minimization heuristic, involving shifting and scaling of waveforms [23,24], PCA [25], and singular value decomposition (SVD) [26]. All prior work has shown that a few basis functions can be used to approximate realistic waveforms.

Like [24,27], the proposed work considers the impact of process and environmental variations on waveforms. In the proposed work, the basis functions are derived by PCA. Hence, the proposed approach extends prior work in [25,26] by including in PCA waveform model construction for large variations from parameters related to the process and the environment. This work formalizes, generalizes, and specifies restrictions for the approach, and proposes methods to make the waveform models practical.

The cell models differ from prior work on modeling cells as equations [10,11,28–30] since the cell models operate on parameters that describe waveforms, not just process parameters, waveform slew, and environmental parameters. The parameters are not required to be independent, and the compact model consists of multivariate polynomials with a minimum number of terms, which are selected based on analysis of variance and accuracy.

Since cells operate on waveforms in the PCA domain, several new problems arise. First, we need to determine the set of PCSs that correspond to realistic waveforms, i.e. PCSs that can be transformed back to the time domain. Second, we need common principal component basis functions for both the inputs and outputs of cells. This is because PCA is a data-driven methodology. Hence each set of input waveforms and each standard cell can generate a unique set of principal component basis functions describing the output waveforms. Hence, some additional steps are needed to come up with a common set of basis functions for all inputs and cells.

Additionally, for our model involving PCA waveform modeling and cell characterization with equations, we show that unlike the tabular static timing analysis method, where memory usage increases exponentially as a function of accuracy in the discretization of parameters that characterize the input and output waveforms (slope and fanout), our proposed method is typically quadratic in memory usage as a function of the parameters describing the waveforms, process, and environmental variations. Finally, we apply the PCA model to static timing analysis and examine the accuracy of delay calculations for long chains of gates.

This paper is organized as follows. Section 2 describes the experimental platform and the parameters modeling variability for cells and waveforms. Sections 3 and 4 discuss waveform model construction and accuracy analysis, respectively. Section 5 describes cell model construction and evaluates accuracy of delays of paths in comparison with Hspice [31] and tabular static timing analysis. Memory usage and computational complexity are summarized in Section 6, followed by a conclusion in Section 7.

2. The experimental platform and model of variation

Traditionally, input waveforms are represented by delay–slope pairs. In this work, the slope is replaced by a set of PCSs. The number of PCSs determines the accuracy of the model. In one extreme, if all the scores are used, the model can reconstruct the exact waveform.

An inverter, designed and laid out with TSMC 180 nm technology, was used to develop the methodology. This technology was the most advanced one available for our CAD tools. After DRC¹ and LVS², parasitics were included in the model through parasitic extraction [32]. Advanced features of Hspice automated the large number of simulation runs, which included generating input waveforms based on a model and capturing the data points of the output waveforms at predetermined relative voltage intervals. The dataset was imported and manipulated using Matlab [33] to construct the two-level full factorial model [34] for each output parameter. The significant effects were determined to form the compact models.

Timing characteristics of standard cells are primarily a function of loading capacitance (fanout), the input waveform, variations of device parameters, i.e., the channel lengths and the threshold voltages of transistors, and the environment, i.e., the power supply voltage and temperature. The ranges of parameters in the model are listed in Table 1. These parameters include the fanout, parameters that describe the input waveform (either slope or principal components, [PC1,PC2] or [L,Θ], described in Section 3), the gate length and threshold voltage of the NMOS and PMOS transistors, temperature, and supply voltage.

The ranges for process parameters were chosen to be small relative to realistic die-to-die process parameter variations, which are on the order of ±30%. This is because die-to-die variation is effectively handled with corner models, and the focus of this work is to supplement these models with variation-aware compact models at each corner that can account for within-die variation, whose range is smaller than die-to-die variation.

Table 1. Model parameters.

Variable                           Variation
Lp                                 0–5%
Ln                                 0–5%
ΔVtp                               −5% to 5%
ΔVtn                               −5% to 5%
ΔT                                 0–70 °C
Slope, [PC1, PC2], or [L,Θ]        0.4–4 ns (for slope); dataset range (otherwise)
Fanout                             1–64
Vdd                                −10% to 10%

A set of models describes the stage delay and output waveform shape, characterized by its principal components ([PC1,PC2] or [L,Θ]), as a function of all parameters in Table 1. The models were designed to be valid over a wide range of variation by using a full factorial experimental design covering all extreme corners of the experimental space.

Fig. 1. The dataset of time domain rising and falling waveforms generated using a full factorial experimental design.

Fig. 2. The waveforms corresponding to rising and falling transitions transformed to the PCA domain.

3. Construction of the waveform model

In order to develop the waveform models, a dataset of 256 falling and 256 rising waveforms was generated by running a two-level full factorial experiment varying the parameters in Table 1, i.e. characterizing fanout, parameters that describe the input waveform (slope during the first iteration, and principal components, [PC1,PC2] or [L,Θ], for other iterations), the gate length and threshold voltage of the NMOS and PMOS transistors, temperature, and supply voltage. The datasets for rising and falling waveforms were merged by converting the falling transitions to rising transitions, i.e. by subtracting the voltages of the falling waveforms from the maximum voltage. A set of waveforms is shown in Fig. 1.

The resulting 512 timing waveforms were discretized by partitioning the voltage scale into equal intervals to form 19 voltage and time point pairs. An analysis of the impact of discretization on accuracy is summarized in the Appendix.
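The merging and discretization steps can be sketched as follows. This is an illustration under stated assumptions: the 19 voltage levels are taken at 5% steps of the maximum voltage, the waveform is a sampled list of (time, voltage) points, and `V_MAX`, `crossing_times`, and `falling_to_rising` are hypothetical names. The paper does not state the exact levels used.

```python
def falling_to_rising(voltages, v_max):
    """Mirror a falling transition into a rising one: v -> v_max - v."""
    return [v_max - v for v in voltages]


def crossing_times(times, voltages, levels):
    """Interpolate the first time a rising waveform crosses each level."""
    result = []
    for level in levels:
        for i in range(1, len(voltages)):
            v0, v1 = voltages[i - 1], voltages[i]
            if v0 <= level <= v1 and v1 > v0:
                frac = (level - v0) / (v1 - v0)
                result.append(times[i - 1] + frac * (times[i] - times[i - 1]))
                break
        else:
            raise ValueError(f"waveform never crosses {level}")
    return result


V_MAX = 1.8  # illustrative supply voltage in volts
# 19 equally spaced levels; the 5%..95% placement is an assumption.
levels = [V_MAX * k / 20 for k in range(1, 20)]

# Toy falling ramp sampled at three points (time in ns, voltage in V).
times = [0.0, 1.0, 2.0]
falling = [1.8, 0.9, 0.0]
time_points = crossing_times(times, falling_to_rising(falling, V_MAX), levels)
```

Each waveform is then represented by its vector of 19 time points, which is the input to the PCA step.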

Analysis of the inverter waveforms revealed that two PCSs cover 99.8% of variation for both rising and falling transitions. Hence, only two PCSs (PC1 and PC2) serve as weights for the two waveforms, whose linear combination is used to reconstruct the time domain transition waveform. Moreover, each transition maps to a single point in the two-dimensional PCA domain. The points in the PCA domain that correspond to the waveforms in Fig. 1 are shown in Fig. 2. It can be seen that the groups of waveforms in the time domain map to clusters of points in the PCA domain.

Mapping from the time domain to the PCA domain and vice versa can be represented by a pair of transformations. If the data are not standardized, the transformation equations are the following:

$$\mathrm{PCS} = \mathrm{PCM}\,(T - U) \tag{1}$$

$$T = U + \mathrm{PCMI} \times \mathrm{PCS} \tag{2}$$

where PCS is a 19 element vector of scores, T is a 19 element vector of time points describing the waveform, U is a 19 element vector that is the average of all T’s in the dataset, PCM is the PCA model transformation matrix from the time domain to the PCA domain, and PCMI is the inverse of PCM. For a 19 element vector, PCM and PCMI are 19×19-dimensional matrices. PCM is found by computing the eigenvectors of the 19×19 covariance matrix from the dataset. The rows of PCM are the normalized eigenvectors of this covariance matrix.
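A minimal NumPy sketch of the non-standardized transformations in Eqs. (1) and (2) follows. The dataset here is synthetic (sorted random time points), and the function names `fit_pca`, `to_scores`, and `to_waveform` are hypothetical; only the structure (mean vector U, eigenvectors of the covariance matrix as rows of PCM, PCMI as the transpose of PCM) follows the text.

```python
import numpy as np


def fit_pca(waveforms):
    """Return (U, PCM, eigenvalues) for an (n_waveforms, n_points) array."""
    U = waveforms.mean(axis=0)
    cov = np.cov(waveforms, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    PCM = eigvecs[:, order].T                # rows are unit eigenvectors
    return U, PCM, eigvals[order]


def to_scores(T, U, PCM):
    """Eq. (1): map time points to principal component scores."""
    return PCM @ (T - U)


def to_waveform(PCS, U, PCM):
    """Eq. (2) with PCMI = PCM^T; accepts the leading k scores only."""
    k = len(PCS)
    return U + PCM[:k].T @ PCS


# Synthetic stand-in for the 512 x 19 dataset of time points.
rng = np.random.default_rng(0)
data = np.sort(rng.uniform(0.0, 8.0, size=(512, 19)), axis=1)
U, PCM, eigvals = fit_pca(data)

full = to_waveform(to_scores(data[0], U, PCM), U, PCM)         # exact
reduced = to_waveform(to_scores(data[0], U, PCM)[:2], U, PCM)  # two PCSs
```

Using all 19 scores reproduces the waveform exactly, while passing only the first two scores gives the reduced-dimension approximation.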

Based on (1), for a 19 element vector, there are 19 mapping functions (3); each maps the 19 time points describing a waveform to a point in the 19-dimensional PCA space.

$$\begin{aligned}
pc_1 &= pcm(1,1)(t_1 - u_1) + pcm(1,2)(t_2 - u_2) + \cdots \\
pc_2 &= pcm(2,1)(t_1 - u_1) + pcm(2,2)(t_2 - u_2) + \cdots \\
&\;\;\vdots \\
pc_{19} &= pcm(19,1)(t_1 - u_1) + pcm(19,2)(t_2 - u_2) + \cdots
\end{aligned} \tag{3}$$

The elements of the PCM matrix are coefficients of the linear equations for the transformation.

If the data are standardized, Eqs. (1) and (2) are replaced by Eqs. (4) and (5):

$$\mathrm{PCS} = \mathrm{PCM}\,(T - U)\,D^{-1} \tag{4}$$

$$T = U + \mathrm{PCMI} \times \mathrm{PCS} \times D \tag{5}$$

where D is a diagonal matrix of standard deviations associated with each of the 19 elements of the dataset.

The significant PCSs are found through determining the eigenvalues of the covariance matrix. Small eigenvalues correspond to insignificant PCSs. Dimensional reduction is achieved by setting coefficients of PCM that correspond to the eigenvectors associated with insignificant PCSs to zero. It is worth mentioning that the sum of the eigenvalues corresponding to the eigenvectors selected for the model determines the variance coverage.
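The variance coverage statement can be made concrete with a short sketch: coverage is the sum of the retained eigenvalues divided by the sum of all eigenvalues. The eigenvalues below are made up for illustration, and `components_needed` is a hypothetical helper.

```python
def components_needed(eigenvalues, target_coverage):
    """Return (count, coverage) for the fewest leading eigenvalues whose
    share of the total variance reaches `target_coverage`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    covered = 0.0
    for count, value in enumerate(ordered, start=1):
        covered += value
        if covered / total >= target_coverage:
            return count, covered / total
    return len(ordered), 1.0


# Illustrative eigenvalues, not taken from the paper.
count, coverage = components_needed([90.0, 9.8, 0.1, 0.1], 0.99)
```

With these illustrative values, two components are enough to pass a 99% target.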

Fig. 4. Data points corresponding to the waveforms in Fig. 1 and the acceptability region.

The inverse of the PCM matrix, PCMI, is used to reconstruct waveforms. PCMI is the transpose of PCM. The significant PCSs weight the waveforms stored in PCMI to generate time domain transition waveforms, as follows for non-standardized data:

$$\begin{aligned}
t_1 &= u_1 + pcmi(1,1)\,pc_1 + pcmi(1,2)\,pc_2 + \cdots \\
t_2 &= u_2 + pcmi(2,1)\,pc_1 + pcmi(2,2)\,pc_2 + \cdots \\
&\;\;\vdots \\
t_{19} &= u_{19} + pcmi(19,1)\,pc_1 + pcmi(19,2)\,pc_2 + \cdots
\end{aligned} \tag{6}$$

Not all points in the PCA domain map to valid transition waveforms. Valid transitions require that the waveform does not move backwards in time. Accordingly, it is required that

$$t_{19} > t_{18} > \cdots > t_1 \tag{7}$$

This creates an acceptability region restriction on the PCA space, which is obtained by substituting Eq. (2) or (5) into (7) to create 18 linear relationships, as follows for the case with non-standardized data:

$$\begin{aligned}
u_1 + pcmi(1,1)\,pc_1 + pcmi(1,2)\,pc_2 + \cdots &< u_2 + pcmi(2,1)\,pc_1 + pcmi(2,2)\,pc_2 + \cdots \\
u_2 + pcmi(2,1)\,pc_1 + pcmi(2,2)\,pc_2 + \cdots &< u_3 + pcmi(3,1)\,pc_1 + pcmi(3,2)\,pc_2 + \cdots \\
&\;\;\vdots
\end{aligned} \tag{8}$$

The acceptability region is also restricted by the maximum and minimum of the PCSs from the dataset. Linear programming is used to find the acceptability region. The resulting acceptability region is shown in Fig. 3(a).

Fig. 3(a) also contains some points, marked by A–D in the PCA domain. They correspond to the waveforms in the time domain in Fig. 3(b). Waveforms A and B in Fig. 3(b) are not valid waveforms because they contain segments where time moves backwards. They correspond to points A and B in Fig. 3(a), which are outside of the acceptability region. Waveforms C and D in Fig. 3(b) are monotonic and valid. They are inside the acceptability region illustrated in Fig. 3(a).
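Whether a single point in the PCA domain lies inside the acceptability region can be checked directly from Eqs. (6) and (7), as in the sketch below. It tests one point rather than computing the region with linear programming as the paper does. The four-point mean vector and basis are toy values, and the function names are hypothetical.

```python
def reconstruct(pcs, mean, basis):
    """Eq. (6): t_k = u_k + sum_j pcmi(k, j) * pc_j."""
    return [
        u + sum(row[j] * pcs[j] for j in range(len(pcs)))
        for u, row in zip(mean, basis)
    ]


def is_valid_transition(pcs, mean, basis):
    """Eq. (7): time points must be strictly increasing."""
    t = reconstruct(pcs, mean, basis)
    return all(later > earlier for earlier, later in zip(t, t[1:]))


# Toy model with 4 time points and 2 principal components.
mean = [1.0, 2.0, 3.0, 4.0]
basis = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]  # rows of PCMI

inside = is_valid_transition((1.0, 0.2), mean, basis)   # monotonic in time
outside = is_valid_transition((0.0, 1.5), mean, basis)  # time moves backwards
```

A large second score makes the reconstructed time points non-monotonic, which is the situation of points A and B in Fig. 3.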

Fig. 3. (a) The acceptability region in the PCA domain, together with some points, labeled as A–D corresponding to corners of the PCA domain. (b) Time domain waveforms corresponding to the corner points A–D in (a).

Some of the original data points in Fig. 2 lie outside of the acceptability region, as can be seen in Fig. 4. This is because of dimensional reduction. These waveforms can be reconstructed by augmenting the original dataset with waveforms containing negative time points by reflecting the transitions across the voltage axis. The addition of these waveforms with negative time points for model construction widens the acceptability region. It does not invalidate the model because the PCA model uses only the positive time points.³ Additionally, the PCA model generated from the resulting dataset has the property that U = 0. As a result, the line segments bounding the acceptability region determined by Eq. (7) always pass through the origin.
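The property U = 0 follows directly from the augmentation: every time vector is paired with its negation, so the column averages cancel. The sketch below demonstrates this on two made-up waveforms; the helper names are hypothetical.

```python
def augment_with_reflections(waveforms):
    """Add a copy of every waveform with negated time points."""
    return waveforms + [[-t for t in w] for w in waveforms]


def mean_vector(waveforms):
    """Column-wise average of a list of equal-length time vectors."""
    n = len(waveforms)
    return [sum(column) / n for column in zip(*waveforms)]


# Two illustrative waveforms with three time points each (ns).
original = [[0.1, 0.2, 0.4], [0.2, 0.5, 0.9]]
augmented = augment_with_reflections(original)
U = mean_vector(augmented)  # zero up to floating-point rounding
```

Because U vanishes, Eq. (8) reduces to homogeneous inequalities in the scores, which is why the bounding lines pass through the origin.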

Initial analysis modeled the input waveforms with a slope. However, it is desirable to determine a set of universal PCA basis functions for both input and output transitions to avoid extra mapping steps. To do this the corners of the PCA space that define the extreme waveforms must be determined for two-level factorial analysis. But, as can be seen from Fig. 3, two of the PCSs that correspond to corners of the PCA space lie outside of the acceptability region and correspond to invalid waveforms.

This problem was tackled by using a polar coordinate system, instead of a Cartesian coordinate system, for defining the corners of the PCA space for full factorial experimental design.⁴ In order to map PC1 and PC2 to polar coordinates, one finds the magnitude (L) and angle (Θ) of a vector from the origin, as follows:

$$L = (PC1 \times PC1 + PC2 \times PC2)^{0.5}, \qquad \Theta = \arctan(PC2 / PC1) \tag{9}$$

The acceptability region in the polar domain is then determined to guarantee valid waveforms, and a rectangle of maximum size is fit into the acceptability region to define the corners for full factorial experimental design, denoted as L(min), L(max), Θ(min), and Θ(max).
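The coordinate conversion in Eq. (9) and the four corner points of the polar rectangle can be sketched as below. One assumption is flagged: `math.atan2` is used instead of a plain arctangent of the ratio, which gives the same angle when PC1 is positive and avoids division by zero. The limits passed to `factorial_corners` are placeholders, not values from the paper.

```python
import math


def to_polar(pc1, pc2):
    """Eq. (9): magnitude L and angle Theta of the vector (PC1, PC2)."""
    return math.hypot(pc1, pc2), math.atan2(pc2, pc1)


def to_cartesian(length, angle):
    """Inverse mapping from (L, Theta) back to (PC1, PC2)."""
    return length * math.cos(angle), length * math.sin(angle)


def factorial_corners(l_min, l_max, theta_min, theta_max):
    """Corners of the polar rectangle, returned as (PC1, PC2) pairs."""
    return [
        to_cartesian(length, angle)
        for length in (l_min, l_max)
        for angle in (theta_min, theta_max)
    ]


length, angle = to_polar(3.0, 4.0)
corners = factorial_corners(1.0, 2.0, 0.1, 0.3)  # placeholder limits
```

The four corners are the two-level settings of the waveform-shape factors in the full factorial design.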

A common set of principal components for the input and output waveforms of a cell is generated by running the following iterations:

(a) find the principal components of the output waveforms;

(b) determine the acceptability region in the PCA space in terms of polar coordinates;

(c) fit a rectangle into the acceptability region to find the corners for full factorial experimental design;

(d) generate the waveforms corresponding to these corners and apply these waveforms as inputs to the cell;

(e) simulate the cell to determine the corresponding output waveforms;

(f) find the principal components of the output waveforms; and

(g) go to (a).

³ This process is similar to what is done in Fourier analysis, where negative frequencies are used to help construct a model.

⁴ This coordinate conversion assumed only two significant PCSs. If more than two PCSs are significant, pairs of PCSs can be converted to polar coordinates, with the same transformation.

Fig. 5. The final acceptability region, including the limit imposed by the convergence requirement.

Fig. 6. The coefficients of the principal component basis functions, computed after each of the iterations: (a) PC1 and (b) PC2. [Panels: “Comparing PC1s (Trise)” and “Comparing PC2s (Trise)”; horizontal axis: data points; vertical axis: PC1s or PC2s; series: GN, IT(1), IT(2).]

If the input waveform principal components match the output waveform principal components, then the principal components have converged to a set of waveforms appropriate for both the input and output of the cell.

Convergence is only possible by restricting the time window for valid PCs, because a slow-rising input will create an output with a slower transition. In our example, we have restricted the time window to be from 0.4 to 8 ns.⁵ This time restriction imposes an additional limit on the acceptability region, illustrated by the diagonal line in Fig. 5. With this limit, convergence was achieved in two iterations.

The resulting principal components are shown in Fig. 6. Principal component basis functions related to the original 512 waveforms are labeled as GN (0th iteration), where the input was a ramp. The following iterations are designated as IT(i) where “i” is the iteration number. For these iterations, the input had a realistic shape and was defined by the extremes of the PCA space in Fig. 5. Principal component basis functions from the two iterations using realistic input waveform shapes were almost indistinguishable, and hence the model has converged.

Table 2. Residuals of PCA models.

                     SNM       SSM       ASM
Max. (19-pt)         1.10      1.75      1.67
Average (19-pt)      0.08      0.07      0.07
Max. Ave. (19-pt)    0.25      0.38      0.36
Max. Ave. (15-pt)    0.00033   0.00023   0.00181

4. Comparison of PCA methods for waveform modeling

PCA waveform models can be constructed in a variety of ways, including (a) the symmetric non-standardized model (SNM), obtained from a dataset formed by augmenting the original dataset with waveforms with negative time points, (b) the symmetric standardized model (SSM), obtained like the SNM method, but with a standardized dataset (Eqs. (4) and (5)), and (c) the asymmetric standardized model (ASM), obtained with the standardized dataset, but without augmenting the dataset with waveforms containing negative time points. Note that the asymmetric non-standardized model was not considered because a large number of the original data points are outside of the acceptability region.

Several criteria⁶ have been suggested to select the appropriate number of PCs for a model [19]. These criteria recommend very different numbers of principal component basis functions, ranging from one to 17. In order to keep our models compact, we have selected two principal component basis functions. Two principal component basis functions cover 99.8% of variation for both rising and falling transitions for all models.

⁵ This window size impacts the acceptability region. A larger window size creates a larger acceptability region but reduces model accuracy and reduces the speed of convergence.

⁶ They include the following methods: the Broken Stick, the Average Root, Variability Explained by PCs, the Scree Plot, the Residual Trace, the Velicer Partial Correlation Function, the Index of Correlation Matrix, Imbedded Error, and the Indicator Function.

The accuracy of the standard cell model is dependent on the accuracy of

(a) the mapping of a waveform from the time domain to the PCA domain,

(b) the mapping of input PCSs to output PCSs through a cell, and

(c) the mapping of output PCSs back to the time domain.

We analyzed the PCA modeling accuracy by determining the residuals at each voltage level for all 512 transition waveforms used to construct the model. Residuals are expressed as time domain errors for a fixed voltage level. Table 2 summarizes the results for all of the models. The table shows the maximum, average, and maximum of the averages of the residuals for each voltage point. It also includes the maximum of the average of the residuals for the 15 middle voltage points, which correspond to the 10–90% range of the transition, which is more critical for accurate timing analysis [35]. It was found that larger errors are associated with longer transitions and the tails of the waveforms. Specifically, it can be seen that residuals associated with the center of the waveform are close to zero. The symmetric models appear to be the more accurate.

Table 3. Fraction of outliers (%).

                      SNM            SSM            ASM
Significance level    0.01   0.05    0.01   0.05    0.01   0.05
T²-statistic          0      0       0      0       0      0
Q-statistic           4      8       4      10      5      8
Both                  4      8       4      10      5      8

Fig. 7. Narrow tree of inverters used to evaluate the accuracy of the PCA method.

The Q-statistic [19] and T²-statistic [19] were used to analyze the adequacy of the models by determining the number of outliers in the original dataset. Outliers correspond to waveforms in the original dataset that are not accurately modeled by PCA. Table 3 shows the fraction of outliers considering each of the screening statistics. The table indicates that the number of outliers for all three models is very similar.

Fig. 8. Comparison of delay for the three methods (Hspice, Tabular, PC) for (a) a fast rising input transition (fanout = 2) and (b) a slow rising input transition (fanout = 3). Each panel plots total delay (ns) against stage node (1–21).

5. The cell model and timing analysis

We have applied the SNM waveform model with two significant principal component basis functions to our dataset to find a relationship between the input parameters (listed in Table 1) and output parameters (L, Θ, and Stage_Delay) for the inverter cell. L and Θ characterize the shape of the output waveform. Stage_Delay is the delay from input to output measured at 50% of the supply voltage. The relationship between the input parameters and output parameters is computed using Yates's algorithm [34] to determine all 511 effects (linear coefficients and interactions) and the average. Because of the lack of experimental error, significant effects were found using normal probability plots [34]. The resulting model indicates how the shape of the output waveform and delay vary as a function of the shape of the input waveform, process parameters, and variations in the operating environment (temperature and supply voltage).
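For readers unfamiliar with Yates's algorithm, the sketch below applies it to a two-level full factorial experiment whose responses are listed in standard order. It is a generic textbook version, not the authors' Matlab code, and the four responses are invented. With nine factors the same routine would take 512 responses and return the average plus 511 effects.

```python
def yates(responses):
    """Return (mean, effects) for responses in standard (Yates) order.

    For two factors the order is (1), a, b, ab, and the effects are
    returned in the order A, B, AB.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n == 0 or 2 ** k != n:
        raise ValueError("number of responses must be a power of two")

    column = list(responses)
    for _ in range(k):
        # Each pass forms pairwise sums followed by pairwise differences.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs

    mean = column[0] / n
    effects = [value / (n / 2) for value in column[1:]]
    return mean, effects


# Invented 2^2 example: responses for (1), a, b, ab.
mean, effects = yates([10.0, 14.0, 12.0, 20.0])
```

Here the main effect of A is the average response at its high level minus the average at its low level, and likewise for B and the AB interaction.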

The output waveform is characterized by L, Θ, and Stage_Delay. L was found to be a function of the input waveform L and Θ, fanout, supply voltage, temperature, n-channel threshold voltage, and p-channel length. Θ was found to be a function of the input waveform L and Θ and fanout. Stage_Delay was found to be a function of the input waveform L and Θ, fanout, supply voltage, temperature, n- and p-channel threshold voltages, and p-channel length.

In evaluating the accuracy of the model, we do not consider the presence of variation in process parameters, supply voltage, and temperature, which is a function of the number of terms in the model equations. The accuracy of the model as a function of parameters is considered in detail in [36]. Instead, we just consider accuracy in the presence of variations in the shape of the input waveform and fanout. This enables us to make a direct comparison with tabular static timing analysis.

The accuracy of the PCA model is evaluated by estimating the delay of a narrow tree of inverters, with a depth of 20 and fanout ranging from two to five, as shown in Fig. 7. This provides a way to determine the accuracy of the model for timing analysis of paths in large circuits, with the only simplification being that the same cell is used for all of the stages. The number of fanouts at each stage and the slope of the input to the first gate were varied. The total delay from the input to the output of each stage was determined using the following three methods.

Method 1: Tabular (Slope, Fanout) propagation. The inverter timing is characterized for combinations of (Slope, Fanout) in tables. Delay is estimated through linear and bilinear interpolation from the tables. This method requires the following functions, where i is the index for the stage.

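$$\begin{aligned}
\mathrm{Slope}(i+1) &= \mathrm{Slope\_Function}(\mathrm{Slope}(i), \mathrm{Fanout}(i)) \\
\mathrm{Stage\_Delay}(i+1) &= \mathrm{Delay\_Function}(\mathrm{Slope}(i), \mathrm{Fanout}(i)) \\
\mathrm{Total\_Delay}(i+1) &= \mathrm{Total\_Delay}(i) + \mathrm{Stage\_Delay}(i+1)
\end{aligned} \tag{10}$$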

Fig. 9. Average relative error (a) and error variance (b) of delay for Methods 1 and 3 in comparison with Hspice using data from the outputs of each of the 21 stages. The horizontal axis of each panel lists (slope, fanout) pairs with slopes of 0.32, 1.76, and 3.2 and fanouts of 2 to 5.

Table 4
Time and space complexity comparison per delay/transition entry per input.

| | Method 1 | Method 3 |
|---|---|---|
| Model space complexity | $O(k^{p+q})$; linear case: $O(qk^p)$ | $O(pw + p2^{p+q})$; linear case: $O(p(w+p+q))$ |
| Characterization time complexity | $O(k^{p+q})$; linear case: $O(qk^p)$ | Maximum: $O(w^3 + (w^2 + p(p+q))2^{p+q})$ |
| Simulation time complexity | $O(sk(p+q))$; linear case: $O(s(pk+q))$ | $O(sp2^{p+q})$; linear case: $O(sp(p+q))$ |


Our implementation included 272 elements in the table: 16 slopes and 17 fanouts.
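The recurrence in Eq. (10) can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the table contents are synthetic (a made-up affine function of slope and fanout), and the names `bilinear`, `propagate_tabular`, `SLOPES`, and `FANOUTS`, as well as the grid values, are assumptions introduced for the example. Only the 16 by 17 grid size comes from the text.

```python
import numpy as np

# Hypothetical characterization grid: 16 slopes x 17 fanouts = 272 entries.
SLOPES = np.linspace(0.1, 4.0, 16)
FANOUTS = np.arange(1, 18, dtype=float)

# Synthetic table contents (assumption): affine in slope and fanout.
# Real tables would be filled from circuit simulation.
S, F = np.meshgrid(SLOPES, FANOUTS, indexing="ij")
DELAY_TABLE = 0.05 + 0.02 * S + 0.03 * F
SLOPE_TABLE = 0.10 + 0.30 * S + 0.20 * F


def bilinear(table, slope, fanout):
    """Bilinearly interpolate `table` at (slope, fanout), clamping to the grid."""
    slope = float(np.clip(slope, SLOPES[0], SLOPES[-1]))
    fanout = float(np.clip(fanout, FANOUTS[0], FANOUTS[-1]))
    i = min(int(np.searchsorted(SLOPES, slope, side="right")) - 1, len(SLOPES) - 2)
    j = min(int(np.searchsorted(FANOUTS, fanout, side="right")) - 1, len(FANOUTS) - 2)
    tx = (slope - SLOPES[i]) / (SLOPES[i + 1] - SLOPES[i])
    ty = (fanout - FANOUTS[j]) / (FANOUTS[j + 1] - FANOUTS[j])
    low = (1 - ty) * table[i, j] + ty * table[i, j + 1]
    high = (1 - ty) * table[i + 1, j] + ty * table[i + 1, j + 1]
    return (1 - tx) * low + tx * high


def propagate_tabular(input_slope, fanouts):
    """Apply Eq. (10): return the cumulative delay after each stage."""
    slope, total, totals = input_slope, 0.0, []
    for fanout in fanouts:
        total += bilinear(DELAY_TABLE, slope, fanout)  # Stage_Delay(i+1)
        slope = bilinear(SLOPE_TABLE, slope, fanout)   # Slope(i+1)
        totals.append(total)
    return totals
```

Each stage carries only a single slope value forward, which is the information bottleneck that the waveform-based method is meant to relax.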

Method 2: Simulation using Hspice. This method numerically solves differential equations to find delay.

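Method 3: PCA for delay propagation. Delay is calculated as follows, where i is the index for the stage.

\[
\begin{aligned}
L(i+1) &= \mathrm{Length\_Function}(L(i),\ \Theta(i),\ \mathrm{Fanout}(i)) \\
\Theta(i+1) &= \mathrm{Angle\_Function}(L(i),\ \Theta(i),\ \mathrm{Fanout}(i)) \\
\mathrm{Stage\_Delay}(i+1) &= \mathrm{Delay\_Function}(L(i),\ \Theta(i),\ \mathrm{Fanout}(i)) \\
\mathrm{Total\_Delay}(i+1) &= \mathrm{Total\_Delay}(i) + \mathrm{Stage\_Delay}(i+1)
\end{aligned}
\tag{11}
\]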

Our implementation requires the storage of 16 coefficients. The input to the first stage for all methods was a ramp. Therefore, the input to the first gate must be mapped to the PCA domain for the PCA method.
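The stage-by-stage update of Method 3 can be sketched in the same style. The sketch below assumes that the length, angle, and delay functions are linear expressions in the length, the angle, and the fanout; the class `LinearExpr`, the function `propagate_pca`, and all coefficient values are hypothetical and chosen only so that the example runs. They are not the coefficients of the characterized inverter.

```python
from dataclasses import dataclass


@dataclass
class LinearExpr:
    """Hypothetical fitted expression: c0 + cL*length + cT*theta + cF*fanout."""
    c0: float
    cL: float
    cT: float
    cF: float

    def __call__(self, length, theta, fanout):
        return self.c0 + self.cL * length + self.cT * theta + self.cF * fanout


# Made-up coefficients for illustration only; real values would come from
# cell characterization against circuit simulation.
LENGTH_FN = LinearExpr(0.20, 0.50, 0.00, 0.10)
ANGLE_FN = LinearExpr(0.10, 0.00, 0.50, 0.02)
DELAY_FN = LinearExpr(0.05, 0.02, 0.01, 0.03)


def propagate_pca(length, theta, fanouts):
    """Apply Eq. (11): return the cumulative delay after each stage."""
    total, totals = 0.0, []
    for fanout in fanouts:
        total += DELAY_FN(length, theta, fanout)
        # Both updates read the stage-i state, so evaluate them together.
        length, theta = (LENGTH_FN(length, theta, fanout),
                         ANGLE_FN(length, theta, fanout))
        totals.append(total)
    return totals
```

Unlike the tabular sketch, no interpolation or search is needed here: each stage costs a fixed number of multiply-adds per expression.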

The delays obtained using the three methods are compared in Fig. 8. It shows that Method 1 diverges from Hspice for long chains. Method 3 is not smooth, and oscillates around the delay from Hspice, but does not diverge as fast as Method 1.

Delays from Hspice (Method 2) are used as the basis of comparison to obtain errors for each method. The average relative errors and average variances are compared in Fig. 9, using data from the outputs of each of the stages, i.e. from stage 2 to 21 (20 points), which shows that Method 3 is more accurate.
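As a small illustration of how such summary statistics can be computed, the sketch below takes per-stage delays from one method and from the reference simulator and returns the mean and variance of the relative error. It assumes the relative error is signed and defined as (estimate − reference) / reference, which the text does not spell out; the function name `relative_error_stats` is hypothetical.

```python
import numpy as np


def relative_error_stats(estimated, reference):
    """Return (mean, variance) of the signed per-stage relative error."""
    estimated = np.asarray(estimated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rel = (estimated - reference) / reference
    return float(rel.mean()), float(rel.var())
```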

The simulations of the circuit were performed on a 4-CPU UltraSPARC II 400 MHz server with a Sun Solaris operating system to compare the three methods. The simulation times for Methods 1 and 3 were 0.2 s, while the simulation time for Method 2 was 21.8 s.

6. Complexity analysis

Table 4 compares the estimated time and space complexity per transition entry per input for each cell for Methods 1 and 3. Let p be the number of parameters characterizing a cell. For Method 1, p = 2, i.e. slope and fanout. For Method 3, p = 3, since this method requires a pair of PCSs to characterize the waveform shape, plus a delay value. Let us also suppose that we take into account q sources of variation, deriving from the process and environment (temperature and supply voltage). Method 1 requires a p-dimensional table of numbers with k levels in each dimension. If we take into account q sources of variation by computing sensitivities to each of these parameters for each of the table entries (i.e. we are postulating a linear model), we then require q+1 tables with $k^p$ entries. Otherwise a table with $k^{p+q}$ entries is needed for a model with all interactions. Hence the space complexity of Method 1 is $O(k^{p+q})$, which is reduced to $O(qk^p)$ if we assume a linear model.

Method 3 discretizes waveforms into w voltage steps. A model with p parameters has p−1 significant eigenvalues. Consequently, (p−1)w numbers must be stored. In addition, the model produces a maximum of $2^{p+q}$ coefficients for each of the p expressions. The resulting model space complexity for Method 3 is $O(pw + p2^{p+q})$. However, typically, only linear terms are significant, in which case each of the p expressions has O(p+q) coefficients, resulting in a space complexity of O(p(w+p+q)).
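To make the space counts concrete, the following sketch evaluates them for one set of sizes. The values k = 16, w = 10, and q = 3 are assumptions picked for illustration, the function names are hypothetical, and constant terms of the linear expressions are ignored, as in the order-of-magnitude counts above.

```python
def method1_entries(k, p, q, linear):
    """Table entries for Method 1 with k levels per dimension."""
    return (q + 1) * k**p if linear else k ** (p + q)


def method3_entries(w, p, q, linear):
    """Stored numbers for Method 3: basis vectors plus expression coefficients."""
    basis = (p - 1) * w
    per_expression = (p + q) if linear else 2 ** (p + q)
    return basis + p * per_expression


# Hypothetical sizes: k = 16 levels, w = 10 voltage steps, q = 3 variation sources.
for linear in (True, False):
    print("linear" if linear else "full",
          method1_entries(16, 2, 3, linear),
          method3_entries(10, 3, 3, linear))
```

Under these assumed sizes, the linear case needs 1024 table entries for Method 1 against 38 stored numbers for Method 3.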

The characterization time complexity of Method 1 is proportional to the number of simulations needed to obtain each number in its lookup table, and hence is the same as the model's space complexity.

Fig. 10. PCA waveform discretization patterns for the voltage scale. The patterns shown are 19 levels, 15 levels, 10 levels, 5 levels (cases 1 to 3), and 3 levels along the voltage axis.

Method 3 requires several steps. First, $2^{p+q}$ simulations are performed, which results in $w2^{p+q}$ points to be analyzed by PCA. The generation of the appropriate w-dimensional covariance matrix and its eigendecomposition using SVD have computational costs of $O(w^2 2^{p+q})$ and $O(w^3)$, respectively. Iterative methods exist, which avoid finding the covariance matrix. They reduce the computational cost to O(rw) per iteration, where r < w and where an iteration involves sequentially inputting each of the $2^{p+q}$ w-dimensional vectors. One such method is Sanger's generalized Hebbian algorithm [37]. Next, all of the waveforms are converted to the PCA domain at a cost of $O(w2^{p+q})$ to develop p expressions by analyzing the resulting PCA domain dataset at a cost of $O(p2^{p+q})$. The coefficients of the resulting full factorial model are sorted to find the significant factors, at a cost of $O(p(p+q)2^{p+q})$, and the significant coefficients are selected at a cost of $O(p2^{p+q})$. Table 4 shows the dominant terms.
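The PCA step can be illustrated with a short NumPy sketch. It is a batch computation that applies the SVD directly to the centered data matrix, which yields the same principal directions as an eigendecomposition of the covariance matrix; it is not Sanger's iterative algorithm. The waveforms are synthetic (each is a hypothetical vector of crossing times at w = 10 voltage levels, built from two shape parameters), and the function names are assumptions made for the example.

```python
import numpy as np


def fit_pca(waveforms, n_components):
    """Return (mean, basis); basis rows are the leading principal directions."""
    X = np.asarray(waveforms, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are eigenvectors of the covariance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def to_pca_domain(waveform, mean, basis):
    """Project one w-point waveform onto the basis, giving its scores."""
    return basis @ (np.asarray(waveform, dtype=float) - mean)


def from_pca_domain(scores, mean, basis):
    """Reconstruct a w-point waveform from its scores."""
    return mean + basis.T @ scores


# Synthetic dataset: 64 waveforms sampled at 10 voltage levels (assumption).
rng = np.random.default_rng(0)
levels = np.linspace(0.1, 0.9, 10)
a = rng.uniform(0.5, 3.0, size=64)
b = rng.uniform(0.0, 1.0, size=64)
waveforms = a[:, None] * levels + b[:, None] * levels**2

mean, basis = fit_pca(waveforms, n_components=2)
scores = to_pca_domain(waveforms[0], mean, basis)
rebuilt = from_pca_domain(scores, mean, basis)
```

Because the synthetic waveforms depend on only two shape parameters, two components reconstruct them to within rounding error; real simulated waveforms would leave a small residual.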

The simulation time complexity of Method 1 is proportional to the table lookup time and the number of stages (s). A table lookup requires a search in each of the p dimensions among the k entries, which has complexity O(pk). Once the appropriate entry is selected, the delay is computed, which takes into account the q sensitivities and has complexity O(q) if the model is linear. This process is repeated for each of the s stages, resulting in a simulation time complexity of O(s(pk+q)). For a nonlinear model the time complexity is O(sk(p+q)).

Method 3 requires the evaluation of p expressions, each with at most $2^{p+q}$ terms, for each of the s stages. This results in a computational complexity of $O(sp2^{p+q})$. However, typical expressions contain at most (p+q) linear terms, corresponding to a simulation time complexity of O(sp(p+q)).
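The origin of the $2^{p+q}$ term count can be seen by enumerating the terms of a full factorial expression: one term per subset of the p+q factors. The sketch below is illustrative only; the function names and the dictionary-based coefficient layout are assumptions, not the storage format used in the paper.

```python
from itertools import combinations
from math import prod


def factorial_terms(n_factors):
    """All subsets of factor indices: 2**n_factors interaction terms."""
    return [term
            for size in range(n_factors + 1)
            for term in combinations(range(n_factors), size)]


def evaluate(coeffs, x):
    """Sum coeffs[term] * product of x[i] for i in term (empty term = constant)."""
    return sum(c * prod(x[i] for i in term) for term, c in coeffs.items())
```

For example, with six factors `factorial_terms(6)` has 64 entries, while an expression restricted to the constant and linear terms keeps only seven of them.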

It can be seen from Table 4 that, in the linear case, Method 1 is linear in q in characterization time complexity, while Method 3 is exponential in q. However, characterization is done only once for a cell library. Model users are only impacted by space and simulation time complexity. In addition, if a fractional factorial experimental design [34], rather than a full factorial experimental design, were used to generate the dataset, characterization would be polynomial in q.

Method 1 is exponential in model space and characterization time complexity as a function of p, which limits the discretization of the space that describes the input waveforms (slope and fanout), while Method 3 is not. As a result, memory usage does not increase rapidly with increasingly accurate waveforms.

Moreover, as we add more parameters, q, Method 1 requires $k^p$ more entries for each additional parameter, while Method 3 requires only p additional entries. Therefore, memory usage does not increase as rapidly for Method 3 as the number of parameters increases.

Fig. 11. Increase in accuracy of the PCA waveform as a function of the number of discretization levels. The plot shows the sum of squares of error against the number of levels (3 to 19) for uniform discretization, together with the 5-level case 1 and case 3 patterns.

7. Conclusions

This paper provides a method to develop compact models of standard cells for static timing analysis enabling accurate characterization over variations in input waveform characteristics, output loading, process parameters, and the environment (temperature and power supply voltage). Compact characterization utilizes principal component analysis of waveforms. The resulting models are stored as coefficients of equations. The compact models enable the performance of a variety of statistical experiments, including efficient Monte Carlo analysis of the impact of within-die variation on delay and of the impact of various temperature profiles and variations in the power supply voltage on delay.

Three approaches to PCA analysis have been compared, and the results indicate that the SSM and SNM methods result in smaller modeling errors.

In addition, the accuracy and efficiency of the method have been evaluated in comparison with Hspice and the slope-fanout tabular method. Runtimes are comparable with the tabular method, while accuracy and memory usage are improved.

Acknowledgement

The authors would like to thank the Semiconductor Research Corporation for support of this research project under Task 1419.001.

Appendix. Accuracy analysis of the PCA waveform model

The discretization level for waveform modeling was chosen in order to have straightforward voltage levels for transistors in the technology that was used. However, accuracy of the PCA waveform model is a function of

(a) the number of discretization levels along the voltage axis, and
(b) the choice of voltage levels to discretize the waveform on the voltage axis.

To analyze waveform model accuracy with respect to the number of discretization levels along the voltage axis, seven discretization patterns were compared. These are summarized in Fig. 10, with between three and 19 levels.

Fig. 11 compares the accuracy of the uniform discretization plans using the sum of squares of error (SOS). It can be seen that at least 10 points are needed to achieve high accuracy, and that increasing beyond 10 points does not increase accuracy much. However, it should be noted that as few as five points can achieve high accuracy if they are appropriately placed. Additionally, fewer discretization levels result in fewer principal component basis functions. However, even with the worst case that we studied, where we considered 19 levels for discretization, two principal component basis functions covered 99.8% of the variation.
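The effect of the number and placement of voltage levels can be illustrated with a toy experiment. The sketch below does not reproduce Fig. 11: it measures only the error of rebuilding a waveform by linear interpolation between the chosen levels, not the error of the PCA model, and the RC-like waveform `crossing_time` and the skewed five-level pattern are hypothetical choices made for the example.

```python
import numpy as np


def crossing_time(v):
    """Hypothetical waveform: time at which an RC-like edge reaches voltage v."""
    return -np.log(1.0 - v)


def sos_error(levels, dense=np.linspace(0.05, 0.95, 181)):
    """Sum of squared errors when the waveform is rebuilt from `levels` only."""
    levels = np.asarray(levels, dtype=float)
    rebuilt = np.interp(dense, levels, crossing_time(levels))
    return float(np.sum((rebuilt - crossing_time(dense)) ** 2))


# Uniformly spaced patterns with 3, 5, 10, and 19 levels.
uniform = {n: sos_error(np.linspace(0.05, 0.95, n)) for n in (3, 5, 10, 19)}
# Five levels packed toward the high-voltage end, where this edge bends most.
skewed5 = sos_error([0.05, 0.45, 0.70, 0.85, 0.95])
```

In this toy setting the error falls as uniform levels are added, and the skewed five-level pattern beats the uniform five-level one, which is the qualitative behavior described above.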

References

[1] T. Kirkpatrick, N. Clark, PERT as an aid to logic design, IBM J. Res. Dev. 10 (2) (1966) 135–141.

[2] D. Lee, V. Zolotov, D. Blaauw, Static timing analysis using backward signal propagation, Proc. Des. Automat. Conf. (2004) 664–669.

[3] S.R. Nassif, A.J. Strojwas, S.W. Director, A methodology for worst-case analysis of integrated circuits, IEEE Trans. Comput. Aided Des. 5 (1) (1986) 104–113.

[4] A. Agarwal, V. Zolotov, D.T. Blaauw, Statistical timing analysis using bounds and selective enumeration, IEEE Trans. Comput. Aided Des. 22 (9) (2003) 1243–1260.

[5] X. Li, et al., Defining statistical timing sensitivity for logic circuits with large-scale process and environmental variations, IEEE Trans. Comput. Aided Des. 27 (6) (2008) 1041–1053.

[6] H. Chang, S.S. Sapatnekar, Statistical timing analysis under spatial correlations, IEEE Trans. Comput. Aided Des. 24 (9) (2005) 1467–1482.

[7] B. Cline, et al., Analysis and modeling of CD variation for statistical static timing, Proc. Int. Conf. Comput. Aided Des. (2006) 60–66.

[8] D. Blaauw, et al., Statistical timing analysis: from basic principles to state of the art, IEEE Trans. Comput. Aided Des. 27 (4) (2008) 589–607.

[9] S. Bhardwaj, S. Vrudhula, A. Goel, A unified approach for full chip statistical timing and leakage analysis of nanoscale circuits considering intradie process variations, IEEE Trans. Comput. Aided Des. 27 (10) (2008) 1812–1825.

[10] L. Zhang, et al., Correlation-preserved non-Gaussian statistical timing analysis with quadratic timing model, Proc. Des. Automat. Conf. (2005) 83–88.

[11] V. Khandelwal, A. Srivastava, A general framework for accurate statistical timing analysis considering correlations, Proc. Des. Automat. Conf. (2005) 89–94.

[12] J. Singh, S.S. Sapatnekar, A scalable statistical static timing analyzer incorporating correlated non-Gaussian and Gaussian parameter variations, IEEE Trans. Comput. Aided Des. 27 (1) (2008) 160–173.

[13] B. Choi, D.M.H. Walker, Timing analysis of combinational circuits including capacitive coupling and statistical process variation, Proc. VLSI Test Symp. (2000) 49–54.

[14] A. Gattiker, et al., Timing yield estimation from static timing analysis, Proc. Int. Symp. Qual. Electron. Des. (2001) 437–442.

[15] M. Orshansky, K. Keutzer, A general probabilistic framework for worst case timing analysis, Proc. Des. Automat. Conf. (2002) 556–561.

[16] A. Agarwal, et al., Statistical delay computation considering spatial correlations, Proc. Asia South Pacific Des. Automat. Conf. (2003) 271–276.

[17] C.S. Amin, et al., Statistical static timing analysis: how simple can we get?, Proc. Des. Automat. Conf. (2005) 567–652.

[18] CMOS nonlinear delay model calculation, in: Library Compiler User Guide, vol. 2, Synopsys, 1999.

[19] J.E. Jackson, A User's Guide to Principal Components, Wiley, New York, 2003.

[20] P. Feldmann, S. Abbaspour, Towards a more physical approach to gate modeling for timing, noise, and power, Proc. Des. Automat. Conf. (2008) 453–455.

[21] R. Gandhi, et al., Delay modeling using ramp and realistic signal waveforms, Proc. Int. Conf. Electro-Inform. Technol. (2005).

[22] C.S. Amin, F. Dartu, Y.I. Ismail, Weibull-based analytical waveform model, IEEE Trans. Comput. Aided Des. 24 (8) (2005) 1156–1168.

[23] A. Jain, D. Blaauw, V. Zolotov, Accurate delay computation for noisy waveform shapes, Proc. Int. Conf. Comput. Aided Des. (2005) 946–952.

[24] V. Zolotov, et al., Compact modeling of variational waveforms, Proc. Int. Conf. Comput. Aided Des. (2007) 705–712.

[25] S.R. Nassif, E. Acar, Advanced waveform models for the nano-meter regime, International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2004.

[26] A. Ramalingam, et al., Accurate waveform modeling using singular value decomposition with applications to timing analysis, Proc. Des. Automat. Conf. (2007) 148–153.

[27] H. Fatemi, S. Nazarian, M. Pedram, Statistical logic cell delay analysis using a current-based model, Proc. Des. Automat. Conf. (2007) 253–256.

[28] S. Basu, P. Thakore, R. Vemuri, Process variation tolerant standard cell library development using reduced dimension statistical modeling and optimization techniques, Proc. Int. Symp. Qual. Electron. Des. (2007) 814–820.

[29] A. Goel, et al., A methodology for characterization of large macro cells and IP blocks considering process variations, Proc. Int. Symp. Qual. Electron. Des. (2008) 200–206.

[30] S. Sundareswaran, et al., Characterization of standard cells for intra-cell mismatch variations, Proc. Int. Symp. Qual. Electron. Des. 1 (2008) 213–219.

[31] Star-Hspice Manual, Avant! Corporation and Avant! Subsidiary, 2001.

[32] Assura™ Physical Verification User Guide, Cadence Design Systems Inc., 2005.

[33] Using MATLAB, The MathWorks Inc., 1999.

[34] G.E.P. Box, W.G. Hunter, J.S. Hunter, Statistics for Experimenters, Wiley, New York, 1978.

[35] M. Ketkar, K. Dasamsetty, S. Sapatnekar, Convex delay models for transistor sizing, Proc. Des. Automat. Conf. (2000) 655–660.

[36] S.-A. Aftabjahani, L. Milor, Compact variation-aware standard cell models for static timing analysis, Proc. Des. Circuits Integrated Syst. (2008).

[37] T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks 2 (6) (1989) 459–473.

Seyed-Abdollah Aftabjahani received his B.S. degree in Computer Engineering from the National University of Iran, Tehran, Iran in 1994, and his M.S. in Electrical and Computer Engineering from the University of Tehran, Tehran, Iran in 1997. He is currently conducting research for his Ph.D. dissertation in the Semiconductor Testing and Yield Enhancement Group Laboratory at the School of Electrical and Computer Engineering, Georgia Institute of Technology.

Prior to going back to school in 2001, he worked as a software engineer for TRW and the Computer Science Corporation in Atlanta. He also worked as a research engineer for 7 years at the Iran Telecommunications Research Center on designing and developing embedded systems for telecommunications systems, fax machines, and automatic mark readers. He was the technical lead of the software development team. He also served as a consultant for several companies and has experience in automation and control using computers and embedded systems for industrial and commercial products. His research interests include statistical variation-aware timing analysis and modeling, simulation acceleration using compiler techniques, digital testing and testable design, and modeling for digital testing using Hardware Description Languages.

Linda Milor received her Ph.D. degree in Electrical Engineering from the University of California, Berkeley, in 1992.

She is currently an Associate Professor of Electrical and Computer Engineering at the Georgia Institute of Technology, in Atlanta, Georgia. Prior to that, she served as Vice President of Process Technology and Product Engineering at eSilicon Corporation, as Product Engineering Manager at Advanced Micro Devices, Sunnyvale, California, and as a Faculty Member at the University of Maryland, College Park. She has authored over 90 publications and holds four patents on yield and test of semiconductor-integrated circuits.