7/28/2019 Directed Study Report
http://slidepdf.com/reader/full/directed-study-report 1/12
AUTOMATED STANDARD CELL LIBRARY GENERATION
& STUDY OF CELL LIBRARY FUNCTIONAL CONTENT
Yunbum Jung
Dept. of Electrical Engineering and Computer Science
University of Michigan
ABSTRACT
As the operating frequency of circuits increases, a fixed cell library increasingly limits the performance that synthesized circuits can achieve. One way to go beyond the limit of a fixed cell library is to use a fluid cell library. A fluid cell library provides cells with customized drive strengths that are not available in the fixed cell library but are required for fine circuit tuning. For the effective generation of a fluid cell library as well as a fixed cell library, an automated flow is applied to generate a standard cell library. This paper presents the procedure for automated standard cell library generation and an overview of cell characterization. It also examines how each logic function in a cell library affects an automated circuit design. The experiments show that the logic functions in the target cell library should be chosen selectively to obtain a high-quality synthesized design.
1. INTRODUCTION
Although a standard cell methodology reduces design effort in terms of time and cost, the performance of a synthesized design is often much poorer than that of a custom design. Many studies have sought to improve the quality of synthesized designs [1, 2, 3, 4, 5], and it has been shown that simple modifications to the cell library significantly impact the performance of a synthesized design [1]. Especially in high-performance applications, a fixed cell library prevents fine device tuning for delay and power optimization [5]. To overcome this limitation of a fixed cell library, a fluid cell library suitable for circuit tuning was suggested in [3]. Such efforts increase the need for automated cell library generation, since an automated flow can easily create the various cells required for circuit optimization.
In generating a new cell library, accurate characterization of each cell is important, since other tools that use these cells predict circuit behavior based on the characterized cell data. Before discussing the advantages of automated cell characterization, it is worth considering the disadvantages of manual cell characterization. Manual cell characterization requires a cell designer to create netlists and run simulations interactively. With this method, stimulus must be developed and applied to the cell being characterized. Once a simulation is complete, the data is extracted from each run, and the cell designer inserts it into the gate-level models and the datasheets. This manual process is error-prone and requires tremendous effort [6].
A cell library generation process includes cell design, layout generation, and physical abstraction as well as cell characterization. Several designers may share these tasks, so any misunderstanding among them can make the manual procedures inconsistent.
In order to reduce both the effort to generate a new standard cell library and the number of inadvertent errors introduced when these tasks are done manually, an automated flow is applied to generate a new cell library. A well-organized automated flow provides consistent procedures, increases the range of simulation capabilities at the cell characterization step, and minimizes the risk of errors.
2. STANDARD CELL LIBRARY GENERATION
As shown in Fig. 1, the process of generating a standard cell library consists of four major steps, most of which are carried out automatically. Netlist files in SPICE format are created at the cell design step. The layout, which is the physical implementation of the netlist, is generated in the layout step. Verification
and parasitic extraction are also performed during
the layout step. Stimulus generation, SPICE
simulations, and data compilations are part of
characterization. Physical abstraction of each cell
for the place & route tool can be carried out in
parallel with characterization. A physical abstract
includes information about blockage layers, pin
locations, and cell symmetry.
Fig. 1 Standard cell library generation flow
2.1 CELL DESIGN
The cell design phase consists of circuit design and transistor sizing. After drawing a schematic representing a logic gate, the cell designer determines the transistor sizes of each cell. Since the delay of a circuit depends not only on the drive strength of each stage but also on its P/N width ratio, it is important to give each cell in the standard cell library a good P/N width ratio. The optimal P/N ratio of each cell is derived such that it minimizes path delay [7]. Based on the optimal P/N ratio, a netlist template is generated for each cell, and this template is used to create the cell netlist at the intended size. It has been shown that providing cells with only a single drive strength degrades the speed of a synthesized design [1]. Therefore, providing each cell in a variety of drive strengths is considered a standard cell library design guideline. Following this guideline, most of the cells in a fixed standard cell library are designed in 4 drive strengths (xL, x1, x2, and x4), and the buffers and inverters have 5 additional drive strengths (x3, x8, x12, x16, and x20). To provide a wider variety of drive strengths easily, the transistor sizes of each cell are parameterized. Fig. 2 illustrates how a parameterized netlist template is used: the P/N ratio of the inverter is fixed, but the transistor widths are scaled proportionally. A large number of transistors connected in series degrades the rise and fall times because of the higher series resistance. Therefore, each cell is restricted to at most 4 series transistors.
Fig. 2 Parameterized Inverter template
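The parameterized template idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the report's actual template: the unit NMOS width, the P/N ratio, the 0.18 um channel length, the 0.5 multiplier for xL, and the `nch`/`pch` model names are all hypothetical placeholders. It shows the two properties described above: the P/N ratio stays fixed while both widths scale with the drive strength, and a cell deeper than four series transistors is rejected.

```python
"""Hypothetical parameterized inverter template (illustrates the idea of Fig. 2)."""
from typing import Tuple

UNIT_NMOS_WIDTH_UM = 0.42    # assumed x1 NMOS width (placeholder value)
PN_RATIO = 2.0               # assumed fixed P/N width ratio (placeholder value)
CHANNEL_LENGTH_UM = 0.18     # assumed channel length (placeholder value)
MAX_SERIES_TRANSISTORS = 4   # restriction stated in Section 2.1

# Drive strengths named in the report; the 0.5 multiplier for xL is an assumption.
DRIVE_MULTIPLIERS = {"xL": 0.5, "x1": 1, "x2": 2, "x3": 3, "x4": 4,
                     "x8": 8, "x12": 12, "x16": 16, "x20": 20}

INV_TEMPLATE = """.SUBCKT INV{name} A Y VDD VSS
MP Y A VDD VDD pch W={wp:.3f}u L={l:.2f}u
MN Y A VSS VSS nch W={wn:.3f}u L={l:.2f}u
.ENDS
"""


def scaled_widths(drive: str) -> Tuple[float, float]:
    """Return (NMOS width, PMOS width) in micrometres for a drive strength.

    Both widths scale by the same multiplier, so the P/N ratio stays fixed.
    """
    wn = UNIT_NMOS_WIDTH_UM * DRIVE_MULTIPLIERS[drive]
    return wn, wn * PN_RATIO


def check_series_stack(depth: int) -> None:
    """Reject a cell whose longest series transistor chain exceeds the limit."""
    if depth > MAX_SERIES_TRANSISTORS:
        raise ValueError(f"{depth} series transistors exceeds the limit of "
                         f"{MAX_SERIES_TRANSISTORS}")


def inverter_netlist(drive: str) -> str:
    """Fill the inverter template for one drive strength (e.g. 'x4' -> INVX4)."""
    check_series_stack(1)  # an inverter has one transistor in each network
    wn, wp = scaled_widths(drive)
    return INV_TEMPLATE.format(name=drive.upper(), wp=wp, wn=wn,
                               l=CHANNEL_LENGTH_UM)


if __name__ == "__main__":
    for drive in ("x1", "x4", "x20"):
        print(inverter_netlist(drive))
```

The same pattern extends to multi-input cells by scaling every transistor in the template with one multiplier, which is what keeps the P/N ratio derived for the cell intact across drive strengths.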
The cells of a standard cell library are categorized
into seven groups as follows.
• Negative unate logic cells
• Positive unate logic cells
• Arithmetic cells
• Sequential cells
• Special cells
• Inverted input cells
• Low skew cells
Negative unate logic cells consist of the INV, NAND, NOR, AOI, OAI, and XNOR function families. Positive unate logic cells comprise the BUF, AND, OR, AO, OA, and XOR function families. The FADD (Full Adder) and HADD (Half Adder) function families compose the arithmetic cells. The DFF and LATCH function families are included in the sequential cells. MUX and tri-state cells belong to the special cells. There are two more noteworthy groups: inverted input cells and low skew cells. Inverted input cells such as NAND2B and NOR3BB ('B' means inverted input) have inverted inputs, which effectively build the input inverter and its connection to the logic function into the cell. Low skew cells are used in clock distribution schemes where low skew and high speed are the primary concerns. The CLKINV and CLKBUF families belong to the low skew cells.
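The grouping above, together with the 'B' suffix convention, can be captured by a small name classifier. This is a hypothetical helper for illustration only: it assumes cell names of the form family + input count + optional 'B' markers + optional 'X' drive suffix (as in AO21X2 or NAND2BX1), and the tri-state family name `TBUF` is an assumption, since the report does not name that family.

```python
import re

# Function families per group, as listed in Section 2.1 (TBUF is an assumed name).
GROUPS = {
    "negative unate": ["INV", "NAND", "NOR", "AOI", "OAI", "XNOR"],
    "positive unate": ["BUF", "AND", "OR", "AO", "OA", "XOR"],
    "arithmetic": ["FADD", "HADD"],
    "sequential": ["DFF", "LATCH"],
    "special": ["MUX", "TBUF"],
    "low skew": ["CLKINV", "CLKBUF"],
}
FAMILY_TO_GROUP = {fam: grp for grp, fams in GROUPS.items() for fam in fams}
# Longest family names first, so that e.g. AOI is tried before AO.
FAMILIES_BY_LENGTH = sorted(FAMILY_TO_GROUP, key=len, reverse=True)

# family, input count, 'B' markers for inverted inputs, optional drive suffix
CELL_NAME = re.compile(
    r"(?P<family>[A-Z]+?)(?P<arity>\d*)(?P<inv>B*)(?:X(?P<drive>L|\d+))?")


def classify(cell_name: str) -> str:
    """Return the Section 2.1 group of a cell such as 'NOR3BB' or 'AO21X2'."""
    m = CELL_NAME.fullmatch(cell_name)
    if m is None:
        raise ValueError(f"unrecognized cell name: {cell_name}")
    if m.group("inv"):
        return "inverted input"
    family = m.group("family")
    for fam in FAMILIES_BY_LENGTH:
        if family.startswith(fam):
            return FAMILY_TO_GROUP[fam]
    raise ValueError(f"unknown function family: {family}")
```

For example, `classify("NAND2BX1")` gives "inverted input" and `classify("AOI21X2")` gives "negative unate". A helper like this makes it straightforward to assemble library subsets by group, as done for the experiments in Section 3.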
2.2 LAYOUT
As shown in Fig. 3, the layout phase consists of
physical layout generation, layout verification, and
parasitic extraction.
Fig. 3 Layout step
ProTech, ProSpin, ProGen, and ProSticks from Prolific are used to generate the physical layout from the SPICE netlist. ProTech configures the fabrication technology specification, design styles, the cell template, and the layer information.
ProSpin reads in SPICE netlists describing individual cells and produces the corresponding .cel and .db files, which are read by ProGen and ProSticks respectively. The .cel file specifies the generators that should be called to create the final layout data and contains cell-specific information such as transistor sizes and node names. The .db file is used to generate the symbolic layout used by ProSticks. ProGen reads in the .cel file and invokes the proper generators to produce a loose physical layout. ProGen then compacts this initial layout to produce a final cell that is as small as possible [8]. Fig. 4 shows an example of a layout generated by the Prolific suite.
Fig.4 AO21X2 layout
Although ProGen tries to satisfy all design rules and layout constraints, it sometimes generates a cell with violations such as a cell height violation or design rule errors. Thus, verification is an essential part of layout generation to ensure a correct layout. Calibre from Mentor Graphics is used for accurate verification; it performs layout vs. schematic (LVS) checking as well as design rule checking (DRC).
Fig.5 Symbolic AO21X2 layout
Once design violations are found, ProSticks can be used to fix them. ProSticks provides a graphical user interface for editing the symbolic layout; Fig. 5 shows an example of a symbolic layout. Simple routing rearrangement and transistor relocation on the symbolic layout can correct most errors in simple cells. For complex cells such as MUX and ADDER, however, a height violation is hard to fix. Special options such as poly contact merging and diffusion/metal-1 contact wires are used to obtain more routing space. Aggressive poly contact merging eliminates unnecessary metal-1 wires between poly contacts. Diffusion nodes can be connected to the power or ground rails with a diffusion/metal-1 contact wire, which lets other metal-1 wires pass over the active region.
The last step of the layout phase is parasitic extraction, the process of creating an electrical model of the physical interconnections. Physical interconnect does not behave as an ideal wire. Instead, it behaves like a network of capacitors, inductors, and resistors, which can dominate circuit behavior, so power or timing analysis of a design is of little value without the parasitic network. Xcalibre from Mentor Graphics is used for parasitic extraction. The original SPICE netlists combined with the parasitic network are used for characterization simulation.
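To see why a distributed parasitic network matters, the Elmore delay of an RC ladder can be compared with a lumped approximation. This sketch is an illustration only: the Elmore model is a standard first-order estimate swapped in here, not part of the report's flow, and the wire values are hypothetical. Splitting the same total R and C into segments gives a smaller delay than lumping all of the resistance ahead of all of the capacitance.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay (seconds) at the far end of an RC ladder.

    Segment k has series resistance resistances[k] (ohms) followed by a
    capacitance capacitances[k] (farads) to ground. Each capacitor is
    weighted by the total resistance between the driver and its node.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        delay += upstream_r * c
    return delay


def uniform_wire(total_r, total_c, segments):
    """Split a wire into equal RC segments (a simple distributed model)."""
    return [total_r / segments] * segments, [total_c / segments] * segments


if __name__ == "__main__":
    R, C = 100.0, 100e-15            # hypothetical wire: 100 ohm, 100 fF
    lumped = elmore_delay([R], [C])  # all R ahead of all C: R*C = 10 ps
    distributed = elmore_delay(*uniform_wire(R, C, 10))  # R*C*11/20 = 5.5 ps
    print(f"lumped {lumped * 1e12:.2f} ps, distributed {distributed * 1e12:.2f} ps")
```

As the number of segments grows, the distributed estimate approaches R*C/2, which is why a single lumped value can misrepresent an extracted RC network.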
2.3 CHARACTERIZATION
Characterization enables a designer to abstract
timing and power models. This abstraction shifts a
design from the transistor level to the gate level,
enabling synthesis, floor planning, place & route,
and delay & power calculation [9].
Fig. 6 Characterization in terms of input slew rate
and load capacitance
As shown in Fig. 6, a cell is usually characterized in terms of the input slew rate t_in and the output capacitive load C_L for given supply voltage, temperature, and process values. This traditional approach is challenged in deep sub-micron processes, where a single capacitance is no longer enough to represent a load that acts like an RC network. Nevertheless, the automated characterization tool is implemented based on the traditional definition. Another concern of cell characterization is stimulus generation. Current standard cell libraries handle only a single switching input; that is, they cannot represent the multiple switching inputs that are frequently encountered in real operation. This is another shortcoming of cell characterization. Although an exhaustive enumeration of all input states satisfies the stimulus requirement, it wastes simulation time. Therefore, a minimal set of input vectors should be selected to reduce simulation time.
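One way to derive a minimal single-switching stimulus is to find, for each input pin, one assignment of the other inputs that makes the output sensitive to that pin, and then toggle only that pin. The sketch below assumes the cell's logic function is available as a Python callable; the function names are hypothetical, and keeping only the first sensitizing assignment per pin is enough for unate cells but would miss the second arc of a non-unate cell such as XOR.

```python
from itertools import product
from typing import Callable, List, Tuple

Arc = Tuple[int, Tuple[int, ...], str, str]  # (pin, static vector, input edge, output edge)


def single_switching_arcs(func: Callable[..., int], n_inputs: int) -> List[Arc]:
    """Return rise and fall stimulus arcs, one sensitizing vector per input pin.

    The static vector holds the switching pin at 0; the stimulus toggles only that pin.
    """
    arcs: List[Arc] = []
    for pin in range(n_inputs):
        for side in product((0, 1), repeat=n_inputs - 1):
            low = side[:pin] + (0,) + side[pin:]
            high = side[:pin] + (1,) + side[pin:]
            out_low, out_high = func(*low), func(*high)
            if out_low != out_high:
                out_edge_on_rise = "rise" if out_high > out_low else "fall"
                out_edge_on_fall = "fall" if out_edge_on_rise == "rise" else "rise"
                arcs.append((pin, low, "rise", out_edge_on_rise))
                arcs.append((pin, low, "fall", out_edge_on_fall))
                break  # the first sensitizing assignment suffices for a unate pin
    return arcs


def nand2(a: int, b: int) -> int:
    """Logic function of a 2-input NAND."""
    return int(not (a and b))
```

For NAND2 this yields four single-switching arcs (each input toggled with the other input held at 1), instead of simulating all 12 transitions between its four input states.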
The characterized cells have several common attributes such as output transition times, propagation delays, internal switching power, leakage power, input pin capacitances, and cell area. Sequential cells additionally require the characterization of relative signals, which are signals whose timing is critical with respect to another signal's state. Relative signal attributes include setup, recovery, hold, and removal times [6].
2.3.1 Timing characterization
The timing attributes such as output transition time,
propagation time, setup constraint, hold constraint,
recovery constraint, and removal constraint are
modeled for timing characterization.
Both output transition time and propagation time are required by gate-level synthesis and delay calculation tools [9]. In those tools, the output transition time (together with interconnect parasitic data, if available) is used to estimate the input slew rates of the following cells, and the propagation time of each cell is looked up in the cell library table based on input slew rate and load. Consequently, the delay between two nodes in a design can be calculated from the propagation time and output transition time of each cell on the path between them.
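The table-lookup scheme just described can be sketched as follows. The two-dimensional tables, their units (ps and fF), and bilinear interpolation between table points are assumptions made for illustration; the report does not specify how its tools interpolate. Each stage's output transition becomes the next stage's input slew, and stage delays are summed along the path.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def _segment(axis: Sequence[float], x: float) -> int:
    """Index i of the interval [axis[i], axis[i+1]] used for x.

    The index is clamped, so points outside the table are linearly extrapolated.
    """
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def bilinear(slews, loads, table, slew, load):
    """Interpolate table[i][j] (indexed by slews[i], loads[j]) at (slew, load)."""
    i, j = _segment(slews, slew), _segment(loads, load)
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])


@dataclass
class CellTiming:
    slews: List[float]             # input slew index points (ps)
    loads: List[float]             # output load index points (fF)
    delay: List[List[float]]       # propagation time table (ps)
    transition: List[List[float]]  # output transition table (ps)


def path_delay(stages: List[Tuple[CellTiming, float]], input_slew: float):
    """Sum stage delays; each output transition feeds the next input slew."""
    total, slew = 0.0, input_slew
    for cell, load in stages:
        total += bilinear(cell.slews, cell.loads, cell.delay, slew, load)
        slew = bilinear(cell.slews, cell.loads, cell.transition, slew, load)
    return total, slew


if __name__ == "__main__":
    # Hypothetical 2x2 tables for an inverter.
    inv = CellTiming(slews=[10.0, 50.0], loads=[1.0, 5.0],
                     delay=[[17.0, 25.0], [37.0, 45.0]],
                     transition=[[11.0, 27.0], [19.0, 35.0]])
    total, out_slew = path_delay([(inv, 3.0), (inv, 3.0)], input_slew=30.0)
    print(f"path delay {total:.1f} ps, final transition {out_slew:.1f} ps")
```

With these made-up tables the first stage sees a 30 ps slew and produces a 23 ps transition, which then sets the lookup point for the second stage.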
Since a signal arrives at an input pin with a finite ramp time and a sequential cell takes some time to latch the data signal correctly, the data signal arriving at a sequential cell must be stable for some time before and after clocking; these intervals are defined by the setup constraint and the hold constraint respectively. For edge-triggered sequential cells, the setup and hold constraints are measured relative to the clock transition to the active state. If an unstable data signal arrives at an input near that transition, the edge-triggered sequential cell may latch a wrong output or go through a metastable state, and the unintended data is held until the next clock transition to the active state. For level-sensitive sequential cells, on the other hand, the setup and hold constraints are measured relative to the clock transition to the inactive state. If an unstable data signal arrives at an input near that transition, the unintended data may be held while the clock is in the inactive state, as in the case of the edge-triggered sequential cell. In a design with sequential cells, both attributes constrain the delay of the combinational circuits placed between two sequential cells for a given operating frequency. Fig. 7 shows consecutive pipeline stages formed by two edge-triggered sequential cells, comprising a long path and a short path [10].
Fig. 7 Consecutive pipeline stage formed by two
edge-triggered sequential cells
On the long path, the maximum time available for evaluation of combinational logic in one clock period, t_CLmax, is given by

$$t_{CLmax} = t_{CYeff} - (t_{SU2} + t_{CQ1})$$

where t_CYeff is the effective cycle time, t_SU2 is the setup constraint of the second sequential cell, and t_CQ1 is the Clock to Q propagation time of the first sequential cell.
On the short path, if the next state value from the second sequential cell reaches the first sequential cell during the hold time of the first sequential cell, the next state value will corrupt the current state value of the first sequential cell. The minimum propagation time, t_CLmin, through combinational logic on the short path is expressed by

$$t_{CLmin} = t_{SK} + (t_{H1} - t_{CQ2})$$

where t_SK is the clock skew between the clocking of both sequential cells, t_H1 is the hold constraint of the first sequential cell, and t_CQ2 is the Clock to Q propagation time of the second sequential cell.
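As a quick check of the two expressions, the sketch below evaluates them for one pipeline stage. All numbers are hypothetical and chosen only for illustration. A long path must not exceed t_CLmax, and a short path must be at least t_CLmin; when t_CLmin is zero or negative, any short path satisfies the hold requirement.

```python
def t_cl_max(t_cy_eff: float, t_su2: float, t_cq1: float) -> float:
    """Maximum combinational delay on the long path: t_CYeff - (t_SU2 + t_CQ1)."""
    return t_cy_eff - (t_su2 + t_cq1)


def t_cl_min(t_sk: float, t_h1: float, t_cq2: float) -> float:
    """Minimum combinational delay on the short path: t_SK + (t_H1 - t_CQ2)."""
    return t_sk + (t_h1 - t_cq2)


def stage_meets_timing(long_path: float, short_path: float, *, t_cy_eff: float,
                       t_su2: float, t_cq1: float, t_sk: float, t_h1: float,
                       t_cq2: float) -> bool:
    """True if the long path fits in the cycle and the short path is not too fast."""
    return (long_path <= t_cl_max(t_cy_eff, t_su2, t_cq1)
            and short_path >= t_cl_min(t_sk, t_h1, t_cq2))


if __name__ == "__main__":
    # Hypothetical values in ns.
    budget = t_cl_max(t_cy_eff=2.0, t_su2=0.15, t_cq1=0.25)  # 1.6 ns
    floor = t_cl_min(t_sk=0.05, t_h1=0.10, t_cq2=0.25)       # -0.1 ns
    print(f"long-path budget {budget:.2f} ns, short-path floor {floor:.2f} ns")
```

These expressions show directly how the characterized setup, hold, and Clock to Q values eat into the cycle time, which is why their accuracy matters for the synthesized design.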
Recovery and removal constraints describe the timing requirements on control signals, such as preset or clear, with respect to the clock signal. After a control signal becomes inactive, a sequential cell needs some time to leave the influence of that control signal. This time is referred to as the recovery constraint. Therefore, the control signal should become inactive at least one recovery constraint before clocking in order to ensure that the clocking takes effect. The removal constraint, on the other hand, is the minimum time after the active clock edge that the control signal must remain active to influence the latched value. If a control signal becomes inactive before the removal constraint, it will not affect the operation of the sequential cell.
In the simulations for the timing attributes mentioned above, the minimal stimulus comprises input vectors that cause an output transition, although the required input vectors differ for each attribute. Input vectors can be subdivided into data signals, control signals, and the clock. The stimulus for each attribute is described in more detail in the following sections.
Output transition / Propagation time
As shown in Fig. 8, the output transition time is measured between two predetermined edge threshold values of the output signal (e.g., from 10% to 90% of the voltage range). The propagation time is measured between a predetermined delay threshold value of the input signal and that of the output signal (e.g., from 50% of the voltage range of the input signal to 50% of the voltage range of the output signal).
The stimulus for both attributes requires a data transition that causes an output transition. In sequential cells, the Clock to Q propagation time is obtained from the clock transition. In level-sensitive sequential cells such as LATCH, the D to Q propagation time can also be obtained with the clock held in the active state.
Fig. 8 Output transition and propagation time
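The threshold-based measurements can be sketched on sampled waveforms, such as those a SPICE run produces. This is an illustrative sketch, not the characterization tool's code: it assumes a 0 V to vdd voltage range and monotonic edges, uses linear interpolation between samples to locate threshold crossings, and takes the 10%/90% and 50% thresholds from the examples above.

```python
from typing import Sequence


def crossing_time(times: Sequence[float], values: Sequence[float],
                  level: float, rising: bool) -> float:
    """First time the waveform crosses `level` in the given direction."""
    for k in range(1, len(times)):
        v0, v1 = values[k - 1], values[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[k - 1], times[k]
            # Linear interpolation inside the sample interval.
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the threshold")


def transition_time(times, values, vdd, rising, lo=0.1, hi=0.9):
    """Output transition time between the lo and hi fractions of vdd."""
    t_lo = crossing_time(times, values, lo * vdd, rising)
    t_hi = crossing_time(times, values, hi * vdd, rising)
    return abs(t_hi - t_lo)


def propagation_time(t_in, v_in, in_rising, t_out, v_out, out_rising,
                     vdd, threshold=0.5):
    """Delay from the input's threshold crossing to the output's."""
    return (crossing_time(t_out, v_out, threshold * vdd, out_rising)
            - crossing_time(t_in, v_in, threshold * vdd, in_rising))


if __name__ == "__main__":
    vdd = 1.8
    t_in, v_in = [0.0, 100.0, 200.0], [0.0, vdd, vdd]              # rising input (ps, V)
    t_out, v_out = [0.0, 40.0, 140.0, 200.0], [vdd, vdd, 0.0, 0.0]  # falling output
    print(transition_time(t_out, v_out, vdd, rising=False))
    print(propagation_time(t_in, v_in, True, t_out, v_out, False, vdd))
```

For these idealized ramps the output's 10%/90% crossings are 80 ps apart and the 50% points are 40 ps apart; a real flow would apply the same measurement to every slew and load point of the characterization table.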
Setup / Hold constraint
The setup constraint is generally defined as the minimum time allowed between the arrival of the data and the transition of the clock signal. If the data signal makes a transition during the setup time, an incorrect value may be latched [11, 12]. However, this definition needs to be modified to improve the delay performance of a synthesized design that uses sequential cells, because a minimum setup constraint tends to cause a long Clock to Q propagation time. Therefore, one more condition can be added to the setup definition: the setup constraint must not degrade the Clock to Q propagation time by more than a predetermined tolerance (e.g., 5% of the Clock to Q propagation time obtained when the data arrives well before the clock transition).
The hold constraint describes the minimum time for which the data must remain stable after the transition of the clock signal. If the data signal makes a transition during the hold time, an incorrect value may be latched [11, 12].
The stimulus requires data that can change the output state. For a cell with a control input, the control signal must be held in its inactive state.
Fig. 9 illustrates the setup and hold constraints of
an edge-triggered sequential cell where clock is
high active. Fig. 10 illustrates the setup and hold
constraints of a level-sensitive sequential cell
where clock is high active.
Fig. 9 Setup and Hold constraint of an edge
triggered sequential cell.
Fig. 10 Setup and Hold constraint of a level
sensitive sequential cell.
Recovery / Removal constraint
The recovery constraint describes the minimum
allowable time between the control pin transition
to the inactive state and the active edge of the
synchronous clock signal [12]. As with the setup constraint, the tolerance condition is added to the recovery constraint.
The removal constraint describes the minimum allowable time between the active edge of the clock pin while the asynchronous control pin is active and the inactive edge of the asynchronous control pin [12].
The stimulus requires data that can change the preset/clear value and a control transition to the inactive state. When measuring the recovery and removal constraints of a level-sensitive cell, the only difference is that the stimulus requires a clock transition to the inactive state, whereas an edge-triggered sequential cell requires a clock transition to the active state.
Fig. 11 illustrates recovery and removal constraint
of an edge-triggered sequential cell where clock is
high active and control is low active.
Fig. 11 Recovery and removal constraint of an
edge triggered sequential cell
Bisection method
To characterize relative signals, the bisection method is used. Bisection is an optimization method that employs a binary search to find the value of an input variable associated with a "goal" value of an output variable. It locates the goal value of the output variable within a search range of the input variable by iteratively halving that range, converging rapidly on the target value. In every iteration, the measured value of the output variable is compared with the goal value [13].
Fig. 12 Setup constraint search using bisection
Fig. 12 shows how the setup constraint is determined with the bisection method, where the goals are the output transition and the allowable Clock to Q propagation time. To start the binary search, a lower boundary and an upper boundary are specified. Data transition 1 at the lower boundary is early enough to produce a correct output signal. Data transition 2 at the upper boundary is too late to
change the output signal. This means that the candidate for the setup constraint lies between the upper and lower boundaries. The bisection algorithm therefore tests a data transition at the mid-point between the two boundaries. Data transition 3 at the mid-point changes the output signal but causes a long propagation time; that is, data transition 3 does not meet the goal, so the bisection algorithm sets the mid-point as the new upper boundary. Given the new range, the bisection algorithm tests a data transition at the new mid-point. If the output satisfies the goals, the mid-point becomes the new lower boundary; otherwise, it becomes the new upper boundary. The bisection algorithm repeats this update of boundaries and mid-point until the binary search reaches a termination criterion. Data transition 4 is the latest data transition that satisfies the goals. Therefore, the setup constraint, t_SU, is given by

$$t_{SU} = t_2 - t_1$$
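The boundary-update loop can be sketched as follows. The cell is replaced by a hypothetical toy model, `toy_clk_to_q`, in which the output fails to switch when data arrives less than 20 ps before the clock and Clock to Q grows as the data arrives later; a real flow would run a SPICE simulation at each step. The goals follow Fig. 12: the output must switch, and Clock to Q must stay within 5% of its value for early data. The clock edge time and search bounds are also assumptions.

```python
import math
from typing import Callable, Optional

T_CLOCK_PS = 1000.0  # assumed time of the active clock edge in the toy model


def toy_clk_to_q(t_data: float) -> Optional[float]:
    """Hypothetical stand-in for a simulation: Clock to Q delay, or None if Q fails."""
    setup = T_CLOCK_PS - t_data
    if setup < 20.0:
        return None
    return 100.0 + 50.0 * math.exp(-(setup - 20.0) / 10.0)


def bisect_setup(simulate: Callable[[float], Optional[float]], t_clock: float,
                 t_lower: float, t_upper: float, goal_clk_to_q: float,
                 resolution: float = 0.1) -> float:
    """Return the setup constraint found by bisection on the data transition time.

    t_lower must satisfy the goals (early data) and t_upper must not (late data).
    """
    def meets_goals(t_data: float) -> bool:
        clk_to_q = simulate(t_data)
        return clk_to_q is not None and clk_to_q <= goal_clk_to_q

    if not meets_goals(t_lower) or meets_goals(t_upper):
        raise ValueError("the search range does not bracket the setup constraint")
    while t_upper - t_lower > resolution:  # termination criterion
        mid = 0.5 * (t_lower + t_upper)
        if meets_goals(mid):
            t_lower = mid  # passing mid-point becomes the new lower boundary
        else:
            t_upper = mid  # failing mid-point becomes the new upper boundary
    # t_lower is now the latest data transition found that satisfies the goals.
    return t_clock - t_lower


if __name__ == "__main__":
    nominal = toy_clk_to_q(T_CLOCK_PS - 500.0)  # data arrives very early
    setup = bisect_setup(toy_clk_to_q, T_CLOCK_PS, T_CLOCK_PS - 200.0,
                         T_CLOCK_PS, goal_clk_to_q=1.05 * nominal)
    print(f"setup constraint ~ {setup:.2f} ps")
```

Each iteration costs one simulation, so halving a 200 ps range down to 0.1 ps takes 11 simulations, far fewer than a uniform sweep at the same resolution.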
2.3.2 Power characterization
Power dissipation can be handled in terms of static
power and dynamic power as shown in Fig. 13.
Static power is the power that is dissipated when
the cell is stable, that is, there is no signal
transition on any inputs or outputs of the cell.
Static power is dissipated in a number of ways. The largest consumption of static power results from source-to-drain subthreshold leakage, which is caused by a reduced threshold voltage that prevents the transistor from turning off completely. Static power is also dissipated when current leaks between the diffusion layers and the substrate. For this reason, static power is often called leakage power [12].
Dynamic power is the power dissipated when a circuit is active. It is divided into switching power and internal power. Switching power results from charging and discharging the load capacitance. Because switching power is calculated by a gate-level power analysis tool once the interconnect parasitics are known [9], it is excluded from the power characterization of cells. While input or output signals switch, power is also dissipated by internal capacitive charging/discharging and by short-circuit current. Since this power is dissipated inside the cell during signal switching, it is called internal power.
Power annotated library
PowerArc from Synopsys is used to generate a
power-annotated library from a library containing
no power data. This tool automatically calculates
stimulus for power characterization and runs
power simulation. PowerArc does not distinguish
rising and falling output transitions in calculating
internal power [14]. In both output transition cases,
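internal power dissipation is given by

$$P = \int_{t_1}^{t_2} I(t)\, V_{dd}\, dt - \frac{1}{2} C_L V_{dd}^2$$

where I(t) is the current at the power node, V_dd is the supply voltage, C_L is the load capacitance, and t_1 and t_2 bound the integration window of the transition.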
Consequently, it is not surprising that negative power values are found in the power-annotated library generated by PowerArc. The supply delivers the load charging energy C_L V_dd^2 only during a rising output transition, so subtracting (1/2) C_L V_dd^2 in both cases overestimates internal power at a rising output transition and underestimates it at a falling output transition. Nevertheless, these internal power values are acceptable if internal power dissipation is considered over a period containing both transitions.
Fig. 13 Power dissipation: (a) static power (leakage current I_leakage), (b) dynamic power for a rising output, (c) dynamic power for a falling output (currents I_SC, I_inter-node, and I_CL; capacitances C_inter-node and C_L)
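The internal power expression, and the over/underestimation described above, can be reproduced numerically. This sketch assumes triangular supply-current pulses and hypothetical values (V_dd = 1.8 V, C_L = 10 fF, and 2 fJ of true internal energy per transition); it integrates I(t)·V_dd with the trapezoidal rule and subtracts (1/2) C_L V_dd^2 as in the equation. Note that the integral has units of energy per transition.

```python
from typing import List, Sequence, Tuple


def trapezoid(x: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal integral of y over x."""
    return sum(0.5 * (y[k] + y[k - 1]) * (x[k] - x[k - 1]) for k in range(1, len(x)))


def internal_power(times, current, vdd: float, c_load: float) -> float:
    """Integral of I(t)*Vdd over the window minus (1/2)*C_L*Vdd^2."""
    return trapezoid(times, [i * vdd for i in current]) - 0.5 * c_load * vdd ** 2


def triangle_pulse(charge: float, width: float) -> Tuple[List[float], List[float]]:
    """Hypothetical supply-current pulse that delivers `charge` coulombs."""
    peak = 2.0 * charge / width
    return [0.0, width / 2, width], [0.0, peak, 0.0]


if __name__ == "__main__":
    VDD, C_L, E_INT, WIDTH = 1.8, 10e-15, 2e-15, 100e-12  # hypothetical values
    # Rising output: the supply also delivers the load energy C_L*Vdd^2.
    t, i = triangle_pulse((C_L * VDD ** 2 + E_INT) / VDD, WIDTH)
    p_rise = internal_power(t, i, VDD, C_L)  # E_INT + C_L*Vdd^2/2: overestimate
    # Falling output: the load discharges to ground, so only E_INT comes from the supply.
    t, i = triangle_pulse(E_INT / VDD, WIDTH)
    p_fall = internal_power(t, i, VDD, C_L)  # E_INT - C_L*Vdd^2/2: negative here
    print(p_rise, p_fall, 0.5 * (p_rise + p_fall))  # the average recovers E_INT
```

With these numbers the rising value is 18.2 fJ and the falling value is -14.2 fJ, yet their average is the assumed 2 fJ, matching the observation that the values are acceptable over a period containing both transitions.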
2.3.3 Input capacitance and cell area
Input capacitance is used as part of the output load of the driving cell when switching power is calculated by a gate-level power analysis tool or when path delay is calculated by a timing analysis tool. The input capacitance value is extracted from the parasitic file of each cell created during the layout phase.
The cell area attribute is used to estimate total cell area. The cell area value is calculated from the layout during the layout phase and stored in temporary files. In the characterization phase, that value is inserted into the cell area table of the cell library.
2.4 PHYSICAL ABSTRACTION AND OTHER TASKS
The individual cells of the cell library are described in the GDSII layout format. For place & route tools to refer to the cells in a cell library, physical abstracts of the cell layouts are needed. The physical abstracts contain information about blockage layers, pin locations, and cell symmetry. Envisia Abstract Generator from Cadence is used to create physical abstract views for all cells. The physical abstracts are exported in Library Exchange Format (*.lef) and used in Cadence place & route tools.
Once the characterization step is completed, a '*.lib' file in ASCII text format is created. This file is compiled into a '*.db' file in Synopsys database format using Library Compiler from Synopsys. The '*.lib' file is also compiled into a '*.tlf' file in Timing Library Format using syn2tlf from Cadence. The '*.tlf' file is used for timing-driven placement and routing in the Cadence place & route tool.
3. RELATED STANDARD CELL
LIBRARY STUDY
In the standard cell design style, the target cell libraries as well as the design flow affect the quality of the final circuit. Typically, synthesis techniques optimize a circuit in two phases, logic minimization and library mapping [15, 16, 17]. During the library-mapping phase, the synthesis tool chooses the structure and size of each gate from the target cell libraries, so the limited set of sizes available for each gate clearly prevents good solutions. To provide finer granularity of gate sizes, transistor-level resynthesis [2] and the fluid cell library [3] were suggested, and both methods achieved performance improvements.
Another possible approach to improving the quality of the final circuit is to choose the logic functions in the target cell library wisely. To examine how each logic function group described in section 2.1 affects the quality of the circuit and how the synthesis tool selects cells to satisfy constraints, a set of benchmark circuits is synthesized, placed & routed, and resynthesized using the libraries shown in Table 1. These libraries are formed by selectively choosing logic function groups from the Artisan standard cell library in TSMC 0.18-micron technology.
Libraries: Lib 1 to Lib 8 (each x marks a library that includes the cell group)
Tri-state: x x x x x x x x
Sequential: x x x x x x x x
Negative unate: x x x x x x x x
Positive unate: x x x x x x
Inverted input: x x x x
Arithmetic: x x x
Mux: x x x
Low skew: x x x
Table. 1 Overview of cell libraries
3.1 Design Flow
Fig. 15 Standard cell design flow
As shown in Fig. 15, initial synthesis and place & route are carried out to obtain the initial layout. A realistic wire load model is then extracted from the initial layout, and the cell library is updated with it. With the updated cell library, synthesis and place & route are performed again and a new layout is generated. Parasitics extracted from this layout are used as input for re-optimizing cell sizes. During the re-optimization, resized cells replace the cells chosen at the synthesis step. To provide input for timing and power analysis, the resynthesized circuit is placed & routed again and parasitics are extracted from the final layout.
Design Compiler from Synopsys is used for synthesis. Silicon Ensemble from Cadence is used for place & route and clock tree generation. HyperExtract from Cadence is used for wire load model extraction and parasitic extraction. Library Compiler from Synopsys is used to update the cell library with the extracted wire load model. PrimeTime from Synopsys is used for timing analysis, and NanoSim from Synopsys is used for power analysis.
3.2 Benchmark circuits
Four benchmark circuits were used for the experiments. To observe how each library affects datapath-dominated and controller-dominated circuits, two of them (VP2 and CMUDSP) are digital signal processor (DSP) designs and the other two (GPIO and CAN) are controller designs. Arithmetic functions are heavily used in DSPs but less frequently in controllers. Using the standard cell design flow described in section 3.1, each benchmark circuit is designed with various target clocks and different libraries.
3.3 Results
Power (or area) vs. delay plots are an effective way to compare cell libraries, since the efficiency of achieving a particular delay is important [4]. To make these plots, a set of benchmark circuits is designed with different libraries over a range of target clocks, and power (or area) vs. delay plots are obtained through timing analysis and power analysis. This study revealed several interesting phenomena. The largest library, lib 8, does not always produce the best result. Instead, the best library depends on the operating speed of the circuit and the kind of circuit being designed.
3.3.1 DSP circuits
Within the target clock range (4~9ns) used in the examination, the target clock vs. delay curves of the VP2 benchmark show a low plateau region in Fig. 16. In this region the area of the circuits still increases while performance no longer improves. Consequently, an overly tight target clock degrades the quality of the design. The lowest delay is obtained in the low plateau region. Therefore, the power (or area) vs. delay plots of the VP2 benchmark show the characteristics at the low or middle delay regions. On the other hand, the target clock vs. delay curves of the CMUDSP benchmark decrease gradually without a low plateau region in Fig. 17. This means that there is still room to reduce the delay of the CMUDSP benchmark. Therefore, the power (or area) vs. delay plots of CMUDSP show the characteristics at the middle or high delay regions.
Fig. 16 Target clock vs. delay for VP2 (delay [ns] vs. target clock [ns], Library 1 to Library 8)
Fig. 17 Target clock vs. delay for CMUDSP (delay [ns] vs. target clock [ns], Library 1 to Library 8)
As shown in Fig. 18 and 19, lib 4, which includes complex cells such as arithmetic and mux cells, shows good area efficiency in the high delay region, while adding inverted-input cells to it (lib 5 and lib 8) somewhat degrades area efficiency in that region. However, as the delay decreases, the area of designs built with lib 4 tends to grow quickly. Consequently, lib 4 yields worse area efficiency than lib 5 or lib 8 in the middle and low delay regions. This means that inverted-input cells are needed for good area efficiency at high speed.
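Comparing libraries "at the same delay", as in the discussion above, requires reading each library's area at a common delay value even though each sweep produces its own delay points. One way to do this, sketched below under the assumption that each curve is piecewise linear between measured points, is to interpolate every library's (delay, area) points and take the minimum. The names `area_at` and `best_library_at` and the data layout are hypothetical; the report does not describe how its curves were compared.

```python
from typing import Dict, List, Optional, Tuple

Curve = List[Tuple[float, float]]  # (delay_ns, cell_area) points for one library


def area_at(curve: Curve, delay_ns: float) -> Optional[float]:
    """Linearly interpolate a library's area at delay_ns; None if out of range."""
    pts = sorted(curve)
    for (d0, a0), (d1, a1) in zip(pts, pts[1:]):
        if d0 <= delay_ns <= d1:
            if d1 == d0:
                return min(a0, a1)
            t = (delay_ns - d0) / (d1 - d0)
            return a0 + t * (a1 - a0)
    # A single-point curve only answers for its own delay.
    if len(pts) == 1 and pts[0][0] == delay_ns:
        return pts[0][1]
    return None


def best_library_at(curves: Dict[str, Curve], delay_ns: float) -> Optional[str]:
    """Name of the library with the smallest interpolated area at delay_ns."""
    candidates = {}
    for name, curve in curves.items():
        area = area_at(curve, delay_ns)
        if area is not None:
            candidates[name] = area
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

Evaluating `best_library_at` at a few delays across the range shows where the winning library changes, which is the kind of crossover described for lib 4 versus lib 5 and lib 8.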
Fig. 18 Delay vs. area curves for VP2 (delay [ns] vs. cell area, Libraries 1-8)
Fig. 19 Delay vs. area curves for CMUDSP (delay [ns] vs. cell area, Libraries 1-8)
As the delay approaches the lowest achievable delay, all libraries show similar area efficiency in Fig. 18. This can be explained by the decomposition of complex cells. As expected, the synthesis tool first increases the ratio of cells with higher drive strength in the circuit as the target clock frequency increases. When this simple upsizing reaches its limit, complex cells such as positive unate and arithmetic cells are additionally decomposed into relatively simple cells. These decompositions cause a sudden increase in the total cell count as well as in the total cell area, so the area benefit of complex cells shrinks. Such decompositions are observed in the VP2 benchmark: as the target clock changes from 9ns to 8ns, the total cell count increases from 4208 to 7148. Fig. 20 shows the complex cell count and Fig. 21 shows the relatively simple cell count at both target clocks (8ns and 9ns).
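The size of this jump can be quantified with simple arithmetic on the two VP2 cell counts given above. The snippet is only a worked calculation; the variable names are illustrative.

```python
# VP2 total cell counts reported in the text for two target clocks.
cells_9ns = 4208
cells_8ns = 7148

# Tightening the target from 9 ns to 8 ns (about 11% tighter) ...
tighten_pct = 100.0 * (9.0 - 8.0) / 9.0
# ... adds this many cells, a much larger relative increase.
added = cells_8ns - cells_9ns
growth_pct = 100.0 * added / cells_9ns

print(f"target {tighten_pct:.1f}% tighter -> {added} extra cells (+{growth_pct:.1f}%)")
```

This prints `target 11.1% tighter -> 2940 extra cells (+69.9%)`: an 11% tighter clock costs almost 70% more cells, consistent with complex cells being decomposed rather than merely upsized.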
Fig. 20 Complex cell count of VP2 where lib 4 is used (AND, ADD, XOR and MX cells at 8ns and 9ns target clocks)
Fig. 21 Simple cell count of VP2 where lib 4 is used (INV, NOR, NAND, AOI, OAI, MXI, XNOR and OR cells at 8ns and 9ns target clocks)
In the low delay region, lib 1 produces the most power-efficient circuits, as shown in Fig. 22, while all libraries yield circuits of similar area efficiency. In the high delay region, on the other hand, power efficiency is similar across all libraries, as shown in Fig. 23. From this result, it is apparent that the power density of complex cells such as arithmetic and mux cells is higher than that of negative unate cells. Therefore, power efficiency should guide library selection for high-speed designs, while area efficiency should guide it for low-speed designs.
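The selection rule stated above can be written as a small decision function. This is an illustrative sketch: the `LibResult` record, the `high_speed` flag (which the caller must derive, for example from whether the target falls in the low delay region) and tie-breaking on the secondary metric are assumptions, not details from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LibResult:
    """Hypothetical per-library result measured at one target delay."""
    name: str
    area: float          # total cell area
    avg_power_uw: float  # average power in uW


def choose_library(results: List[LibResult], high_speed: bool) -> str:
    """Power-first for high-speed designs, area-first for low-speed designs.

    The other metric is only used to break ties.
    """
    if not results:
        raise ValueError("no library results to choose from")

    def power_first(r: LibResult):
        return (r.avg_power_uw, r.area)

    def area_first(r: LibResult):
        return (r.area, r.avg_power_uw)

    return min(results, key=power_first if high_speed else area_first).name
```

Using the secondary metric only as a tie-breaker reflects the observation that, in each regime, the primary metric is the one that actually separates the libraries.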
Fig. 22 Delay vs. power curves for VP2 (delay [ns] vs. avg. power [uW], Libraries 1-8)
Fig. 23 Delay vs. power curves for CMUDSP (delay [ns] vs. avg. power [uW], Libraries 1-8)
Another interesting observation is that the ratio of cells with the lowest drive strength (xL) increases as the operating frequency of the circuit increases, as shown in Fig. 24. This result indicates that the lowest-drive-strength cells are needed to form longer buffer trees, even though they have higher capacitance, due to a larger active area, than cells with unit drive strength (x1).
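Measuring the drive-strength mix behind a plot like Fig. 24 amounts to counting netlist instances by size. The sketch below assumes a hypothetical naming convention in which the drive strength is a trailing `X<size>` suffix (for example `NAND2XL` or `INVX1`); the actual cell names of the generated libraries are not given in the report, so the regular expression would need adapting.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical naming convention: drive strength is a trailing "X<size>"
# suffix, e.g. "NAND2XL" -> "L", "INVX4" -> "4".
_DRIVE_SUFFIX = re.compile(r"X(L|\d+)$")


def drive_strength_mix(cell_names: Iterable[str]) -> Dict[str, float]:
    """Percentage of instances per drive strength.

    Names without a recognizable suffix (e.g. tie cells) are skipped.
    """
    counts = Counter()
    for name in cell_names:
        m = _DRIVE_SUFFIX.search(name)
        if m:
            counts[m.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {size: 100.0 * n / total for size, n in counts.items()}
```

Running this on the cell instances of each synthesized netlist and plotting the `"L"` entry against achieved delay would reproduce the kind of trend shown in Fig. 24.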
Fig. 24 % of cells with xL size for CMUDSP (delay [ns] vs. percentage of xL cells, Lib 1-8)
3.3.2 Controller circuits
Fig. 25 Target clock vs. delay curves of CAN (target clock [ns] vs. delay [ns], Libraries 1-8)
Over the target clock range used in the experiments (0.5~6ns), the target clock vs. delay curves of the CAN benchmark show a high plateau region as well as a low plateau region, as shown in Fig. 25. Unlike in the DSP circuits, the available delay range is narrow. The target clock vs. delay curves of the GPIO benchmark also show a narrow delay range. Therefore, it is hard to distinguish delay regions in controller circuits. As in the DSP circuits, the ratios of cells with higher drive strength and of cells with the lowest drive strength increase as the clock becomes faster, and the decomposition of positive unate cells is observed. However, this decomposition does not increase the total cell count and area as much as the decomposition of arithmetic cells does. Within the narrow delay range, the power (or area) vs. delay curves are irregular and scattered, so it is hard to determine which library produces the best circuits.
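When per-library curves are irregular and scattered, one way to see which libraries are worth keeping is to pool all (delay, power) points and extract the Pareto front, i.e. the points that no other point beats in both delay and power. This technique is not used in the report; it is offered as a sketch, and the tuple layout and the function name `pareto_front` are assumptions.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (library, delay_ns, avg_power_uw)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points not dominated in (delay, power), lowest delay first.

    Sorting by delay, then power, means every earlier point has delay <= the
    current one, so the current point is on the front only if its power is
    strictly below every power seen so far. Exact duplicates are kept once.
    """
    front: List[Point] = []
    best_power = float("inf")
    for lib, delay, power in sorted(points, key=lambda p: (p[1], p[2])):
        if power < best_power:
            front.append((lib, delay, power))
            best_power = power
    return front
```

The set of library names appearing in the returned list shows which libraries ever offer the best trade-off; with scattered controller data that set may contain many libraries, which is consistent with no single library standing out.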
4. CONCLUSIONS AND FUTURE WORK
An automated cell library generation flow has been developed to generate new cell libraries easily, without the inadvertent errors that are introduced when a cell library is generated manually. The parameterized cell methodology in the automated flow makes it possible to generate a fluid cell library. Since a fluid cell library consists of custom cells with optimal sizes, which can be generated once the actual wire load parasitics are known after placement, it is expected to attain power and performance close to those of custom designs.
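The benefit of a fluid library over a fixed one can be illustrated by sizing a single gate. As a stand-in for whatever sizing the flow actually uses (the report does not specify it), the sketch below uses logical-effort style sizing: the continuous size makes C_load / C_in equal a target electrical effort, a fixed library must round up to the next available size, and a fluid library could use the continuous size directly. All names and numbers are hypothetical.

```python
import bisect
from typing import Sequence


def required_size(c_load_ff: float, c_in_unit_ff: float, target_effort: float) -> float:
    """Continuous drive size giving C_load / C_in == target_effort,
    where C_in = size * c_in_unit_ff (logical-effort style sizing)."""
    return c_load_ff / (target_effort * c_in_unit_ff)


def fixed_library_size(required: float, available: Sequence[float]) -> float:
    """Smallest available size >= required; the largest size if none is big enough."""
    if not available:
        raise ValueError("fixed library offers no sizes")
    sizes = sorted(available)
    i = bisect.bisect_left(sizes, required)
    return sizes[i] if i < len(sizes) else sizes[-1]
```

For example, a 30 fF load, a 2 fF unit input capacitance and a target effort of 4 give a required size of 3.75; a fixed library offering sizes 1, 2, 4 and 8 must use size 4, while a fluid library could provide 3.75 directly.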
This work can be extended to generate cell libraries for advanced processes. Although high-performance integrated circuits often use advanced process technology, standard-cell-based designs use processes that lag by several generations because cell libraries for advanced processes are unavailable. A cell library in a Silicon on Insulator (SOI) process will be generated to support designs in that process, and it will then be examined how the SOI cell library improves the power and performance of circuits.
The study of cell library functional content shows that the largest library does not always produce the best result. For good overall quality of a datapath-dominated circuit, the best cell library should be chosen based on area efficiency for low-speed circuits and on power efficiency for high-speed circuits. For controller-dominated circuits, it is hard to determine which library produces the best circuits.
REFERENCES
[1] Ken Scott and Kurt Keutzer, "Improving Cell Library for Synthesis", Proc. of Custom Integrated Circuit Conference (CICC), pp. 128-131, 1994
[2] S. Gavrilov, A. Glebov, S. Pullela, S. C. Moore, A. Dharchoudhury, R. Panda, G. Vijayan, and D. T. Blaauw, "Library-Less Synthesis for Static CMOS Combinational Logic Circuits", Proc. IEEE Int. Conf. on Computer-Aided Design (ICCAD), pp. 658-662, 1997
[3] Gregory A. Northrop and Pong-Fei Lu, "A Semi-Custom Design Flow in High-Performance Microprocessor Design", Proc. of Design Automation Conference (DAC), pp. 426-431, 2001
[4] Miodrag Vujkovic and Carl Sechen, "Optimized Power-Delay Curve Generation for Standard Cell ICs", Proc. IEEE Int. Conf. on Computer-Aided Design (ICCAD), pp. 387-394, 2002
[5] K. Keutzer and E. Girczyc, "Panel: Cell libraries - build vs. buy; static vs. dynamic", Proc. of Design Automation Conference (DAC), pp. 341-342, 1999
[6] Teri Hike, McFaul and Karl Perrey, "Characterizing a Cell Library using ICCS", Proc. of ASIC Seminar and Exhibit, pp. P12/5.1-P12/5.4, 1990
[7] David S. Kung and Ruchir Puri, "Optimal P/N Width Ratio Selection for Standard Cell Libraries", Proc. IEEE Int. Conf. on Computer-Aided Design (ICCAD), pp. 178-184, 1999
[8] Prolific User Guide
[9] Binay Ackalloor and Dinesh Caitonde, "An overview of Library Characterization in Semi-Custom Design", Proc. of Custom Integrated Circuit Conference (CICC), pp. 305-312, 1998
[10] Anantha Chandrakasan, William J. Bowhill, and Frank Fox, Ed., "Design of High-Performance Microprocessor Circuits", IEEE Press, New York, pp. 215-218, 2001
[11] Neil H. E. Weste and Kamran Eshraghian, Ed., "Principles of CMOS VLSI Design", Addison-Wesley, pp. 317-325, 1994
[12] Design Compiler User Manual
[13] Star-Hspice Manual
[14] PowerArc User Guide
[15] Eric Lehman, Yosinori Watanabe, Joel Grodstein, and Heather Harkness, "Logic Decomposition during Technology Mapping", Proc. IEEE Int. Conf. on Computer-Aided Design (ICCAD), pp. 242-245, 1995
[16] Chi-Ying Tsui, Massoud Pedram, and Alvin M. Despain, "Technology Decomposition and Mapping Targeting Low Power Dissipation", Proc. of Design Automation Conference (DAC), pp. 68-73, 1993
[17] Randal E. Bryant, "Graph-Based Algorithms for Boolean Function Manipulation", IEEE Transactions on Computers, pp. 677-699, 1985