multi supply digital layout
TRANSCRIPT
SAME 2001, November 15th 2001
Abstract
In this paper, the principle of a technique called
"multi-supply digital layout" is described. This
technique allows reliable back-annotation
between digital blocks that are not powered off
the same supplies, within an analog top cell. The
supplies do not need to have the same voltage
levels, thanks to the integration of level shifters for
voltage adaptation within the digital layout. The
technique is also applicable in systems where one
supply can be turned off while another stays alive.
It also optimizes the die size with no extra effort,
shortens the layout phase and improves scan
insertion and ATPG.
Index Terms – Layout, level shifter, back-
annotation, scan, low power, multiple supplies,
standard cells.
I. Introduction
The goal of this paper is to present why and how to
make a multiple supply digital layout.
We will present a flow which covers all the steps
from the RTL design down to the layout, using only
standard CAD tools. We will also compare this
technique with the existing literature on the subject,
and explain why it is best suited to our needs
in terms of resulting area, layout development time,
and scan test.
1. What is post-layout back-annotation?
It is the process of calculating the cell delays based
on the final routing, and putting these delays into
the cell models for simulation or static timing
analysis.
Back-annotation is needed in order to ensure that
the functionality is kept from RTL design down to
silicon. Back-annotated simulations and static
timing analysis allow the designer to ensure that all
the timing constraints of the design are met.
2. Example of timing constraints: the
setup time
Usually, there are two levels of complexity for
calculating the timing constraints of a flip-flop:
a) Before layout, when the clock is considered
perfect (no skew),
b) After layout, when a clock skew shows up.
Dealing with a multiple supply layout adds another
level of complexity because we need level shifters
on some data and clock paths. The diagram below
summarizes the situation.
[Figure: setup-time diagram; label: level shifters]
where δclk(i) is the delay from the clock root driver to the clock pin of flip-flop i,
δck2q is the clock-to-output delay of the flip-flop, δd is the data path delay and Tclk
is the clock period.
In order for the layout tool to generate a balanced
clock tree, one needs to have a logical and a timing
model for the level shifters. The level shifters are
presented in section III.4.
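As a rough illustration of the setup-time check discussed in this section, the sketch below computes the setup slack from the quantities defined above. The inequality used is the textbook setup constraint, not a formula taken from the original diagram. The setup time `t_setup` and all function and parameter names are assumptions introduced for the example, and the level shifter delays are assumed to be already included in the clock and data path delays.

```python
def setup_slack(t_clk, d_clk_launch, d_clk_capture, d_ck2q, d_data, t_setup):
    """Return the setup slack; a negative value means a setup violation.

    All arguments share one time unit (picoseconds in the example below).
    t_setup is an assumed flip-flop setup time, not a symbol from the paper.
    """
    # Data arrival at the capturing flip-flop, measured from the clock root.
    arrival = d_clk_launch + d_ck2q + d_data
    # Latest allowed arrival: next clock edge at the capturing flip-flop,
    # minus the setup time.
    required = t_clk + d_clk_capture - t_setup
    return required - arrival


# Hypothetical numbers: 10 ns period, level shifter delay folded into d_data.
slack_ps = setup_slack(t_clk=10_000, d_clk_launch=500, d_clk_capture=700,
                       d_ck2q=300, d_data=6_000, t_setup=200)
print(slack_ps)  # 3700
```

A clock skew shows up here as the difference between `d_clk_capture` and `d_clk_launch`, which is why the level shifters on the clock path need a timing model.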
3. Why do we need several power
supplies in a design?
There are two reasons for using several power
supplies, both of which are essential in power
management chips. This kind of circuit is very
common in mobile phones, where it is used for
regulating and distributing the power supplies to the
other chips in the telephone.
SAME 2001
Session 2: DEA METHODOLOGY
MULTI-SUPPLY DIGITAL LAYOUT
Regis Santonja, Motorola
Volker Wahl, Motorola
Toulouse
a) Reducing Power Consumption
Reducing the power consumption of portable
devices such as mobile phones, PDAs or portable
PCs has become one of the most important goals of
the semiconductor industry. As explained in section
II.3 of this paper, using several power supplies is
one of the most effective techniques for reducing
power consumption.
b) Interfacing circuits that operate at
different voltage levels.
The second reason for using several power supplies
is to interface circuits that operate at several
voltages. Power management chips include a variety
of programmable functions (such as an audio
codec, an ADC used to monitor the supply levels, a
touch screen interface, a USB, an RS232 port,
etc.). The most effective approach is to have each
of these functions controlled by logic powered off
the same voltage as the one required by the
function’s interface.
A simplified example of how a power management
chip can be at the heart of a multiple supply system
is presented below: we have a processor with inputs
and outputs operating at 1.8V and a core at 2.5V.
The power management chip communicates through
its serial interface (SPI) operating at 1.8V with an
embedded real time clock powered off an external
Lithium cell at 3.2V.
The organization of this paper is the following. In
section II, we present the prior art in multiple
supply layout and show why it is not adapted to our
needs. In section III, we present our layout solution.
In section IV we present the design flow and how to
integrate the analog level shifters in the digital flow.
In section V, we present a program which generates
scripts for Silicon Ensemble. In section VI, we
present our multiple voltage clock tree solution.
Finally, section VII presents a possibility for
enhancing the flow in the future.
II. Prior Art
1. Interfacing circuits that operate at
different voltage levels.
On analog-oriented chips where several digital
blocks powered off different supplies have to be
laid out on the same silicon, the traditional way to
do this was to design and lay out the digital blocks
separately, place them as macro cells in the analog
top cell of the chip, then use an analog router such
as IC Craftsman to interconnect the blocks.
This method had the following disadvantages:
a) There was no way to use the inter-block
connections' parasitics and generate a standard
SDF file for back-annotation.
b) Three digital layouts had to be done separately
with no way to globally re-order the scan chain.
c) Three tools and environments had to be used:
Silicon Ensemble, Cadence Framework II
(Virtuoso) and IC Craftsman.
d) Tools such as IC Craftsman and Virtuoso from
Cadence are analog tools and not familiar to
most of the digital designers.
2. Sophisticated layout techniques found
in the literature.
The authors of [1], [2], [3], [4] and [5] have already
proposed techniques to lay out multiple supply
circuits. However, they start from a different
situation: they have a single supply circuit and want
to save power by increasing the number of its
supplies. To do this, they split the circuit at the
gate level and assign to each gate the power supply
which best matches its timing requirements, with no
regard for the function implemented, so
that a given function can be spread over several
supplies. As the number of connections within a
function is statistically much larger than the number
of connections between functions, this method
(called gate-level voltage scaling) generates a lot of
routing between the supplies. Because of this, these
authors have developed sophisticated techniques in
order to minimise the routing. The
drawback is that the placement algorithm has to be
modified. For example, Chingwei Yeh and Yin-
Shuin Kang in [1] and [4] have proposed a
modification of simulated annealing that
introduces a new cost function associated with
voltage clustering.
These methods cannot be used for our designs, as
we are required to use standard CAD tools.
3. How can we reduce power by using
multiple supplies?
This technique, called gate-level voltage scaling,
consists of using a low supply voltage for the parts
of the circuit that can tolerate the resulting
transistor performance degradation, and keeping a
higher voltage level for the critical paths of the
circuit. Lowering the voltage is the most
effective technique for reducing CMOS dynamic power
consumption because the latter is proportional to
the square of the supply voltage.
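The quadratic dependence can be made concrete with the usual first-order expression for CMOS switching power, P = activity x C x Vdd^2 x f. The sketch below is only an illustration under that model: the function names are ours, and the 1.8 V and 2.5 V values are borrowed from the example in section I.3 rather than from a measured design.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """First-order CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * capacitance * vdd ** 2 * frequency


def relative_dynamic_power(v_low, v_high):
    """Ratio of switching power at v_low to that at v_high, all else equal."""
    return (v_low / v_high) ** 2


# Moving a block from 2.5 V to 1.8 V keeps roughly half of its switching power.
ratio = relative_dynamic_power(1.8, 2.5)
print(f"{ratio:.2f}")  # 0.52
```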
4. What about clock distribution?
Many papers have been published since 1990 about
generating a zero skew clock tree [7]. Various
algorithms have been proposed for single supply, as
well as for dual supply circuits [2] [8]. However, in
[2], Usami et al. propose a clock tree structure
where the leaves have to be in the low voltage
region: the tree does not reach the flip-flops in the
other region.
We will see in section VI that we propose a
technique allowing a given clock tree to drive flip-
flops in both low and high voltage regions.
III. Our layout solution
1. Supplies isolation within the epi
Because we are in a mixed-signal environment, we
have to pay attention to the transitions in the digital
domain that might generate switching noise on
sensitive analog blocks. For this reason, the digital
part has to be surrounded by an isolation ring. In the
same manner, we isolate the digital blocks operating
at different voltages from each other, especially if
they do not have the same ground. The picture
below represents two inverters. We can see that
without the isolation, vss1 and vss2 would be shorted
together.
Note that there is a minimum ring width and a minimum
distance required between the rings.
2. Layout style
In contrast to the prior art, our starting point is to
develop a chip which is already, by nature, a
multiple supply circuit. In fact, we could say that
another type of voltage scaling technique
(architecture voltage scaling) was used at the
system level, resulting in the definition of a chip in
which all the functions (control, real time clock, SPI
interface, etc.) have been assigned to a voltage
supply. For this reason, we do not encounter the
same routing issues as these authors.
Thus, our layout solution has the following
advantages:
• it is the simplest,
• it works fine with standard cell-based layout
tools (no need to modify the placement
algorithm),
• it includes all the necessary level shifters,
• it makes it easy to isolate the voltage regions
from each other with a negligible impact on the
overall area,
• cells can be abutted in each voltage region as in
usual single-supply layouts.
These last two points can result in significant area
savings compared to the prior art. And if we
compare to section II.1, the listed disadvantages
have disappeared:
a) We can now generate a single standard SDF
file for back-annotation. All the inter-region
connections are taken into account.
b) Only one digital layout has to be done, with the
possibility to globally re-order the scan chain.
c) Only one layout tool is used: Silicon Ensemble,
and no analog tool.
d) Silicon Ensemble is familiar to most of the
digital designers.
In practice, we grouped the cells powered by the
same voltage in 3 voltage regions, as presented
below. Note that the three regions are separated by
the necessary isolation ring.
Two issues have to be taken into account when a
signal goes from one voltage to another one:
3. The signal goes from a high voltage to
a low voltage
The first issue that can show up is associated with
antenna diodes that can allow a static current to
flow from the high to the low voltage region.
Indeed, charge-collecting antennas are formed
during wafer processing when an interconnect (field
poly or metal) is connected to a poly gate that does
not yet have an electrical connection to diffusion. A
connection to diffusion is typically completed at the
top level of metal, so conductors below the top level
of metal are generally considered responsible for
damage from collecting charge during plasma
processing. Therefore, antenna area ratio design
rules are commonly used in the semiconductor
industry to ensure that the remaining charges do not
damage circuits [6].
Many companies in the industry systematically add
antenna diodes to their standard cells,
connected to all input pins of the gates. These
antenna diodes are connected either to the supply
(P-type diode) or to the ground (N-type diode),
depending on the area cost for the cell.
To avoid a static leakage current through a P-type
antenna diode, we can take
advantage of the cells which happen to have only
N-type antenna diodes, such as all the simple
buffers in the technology we used. The inserted cell
has to be powered off the low supply, as presented
in the figure below.
4. The signal goes from a low voltage to
a high voltage
Whenever a gate has to drive the input of another
gate operating at a higher voltage, a voltage
conversion is needed at the interface. Connecting
the low voltage signal directly to the high voltage
gate is not acceptable, even though it would be the
simplest solution. The simulation plot below shows
this situation with two inverters, the first one
operating at a lower supply than the second one.
When a falling edge is presented at the input of the
first inverter, there is a static current consumption in
the second inverter because its PMOS transistor is not
fully turned off.
[Simulation plot: input, output and current in the second inverter; annotated values: 130 mV, 50 µA]
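A first-order way to see why this static current appears is to compare the supply difference with the PMOS threshold voltage. The sketch below is a simplification (it ignores subthreshold conduction), and the threshold value used is a hypothetical figure, not one taken from the technology used in this work.

```python
def pmos_source_gate_voltage(v_low, v_high):
    """Source-gate voltage of the receiving PMOS when its input sits at v_low."""
    return v_high - v_low


def pmos_is_cut_off(v_low, v_high, vtp_abs):
    """True if the PMOS of the high-voltage gate is off in this simple model.

    vtp_abs is the magnitude of the PMOS threshold voltage (assumed value).
    """
    return pmos_source_gate_voltage(v_low, v_high) < vtp_abs


# A 1.8 V signal driving a 2.5 V inverter, with an assumed |Vtp| of 0.5 V:
print(pmos_is_cut_off(1.8, 2.5, vtp_abs=0.5))  # False: static current flows
# Same supply on both sides: the PMOS is properly turned off.
print(pmos_is_cut_off(2.5, 2.5, vtp_abs=0.5))  # True
```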
The solution we adopted is to use a differential cascode
voltage switch (DCVS), which we call a “level
shifter” in this paper. However, a usual level shifter
as presented in [3] has its output undefined
whenever the input supply is turned off. For this
reason, we have added a 2-input AND gate in order
to force the output low and an NMOS transistor in order
to cut any current which could flow to the ground, as
shown below. The NMOS transistor and the AND gate are
controlled by a signal which is low when the input
voltage supply is switched off.
Regarding the antenna diodes of section III.3: because
their type depends on the area cost of each cell, the
type of the diodes appears to be random, leading to the
risk of a static current flowing from the higher to the
lower voltage through a P-type diode, as presented in
the accompanying figure.
IV. Design Flow and Libraries
The principle of the technique presented here is to
avoid the need for analog tools and tool
environments from RTL down to the layout. All CAD
tools have to be digital and standard. In order to
stay in a purely digital environment, we had to write
all the digital libraries for the level shifters, just like
those that are used for normal standard cells:
1. Verilog (HDL description)
The Verilog model of a standard level shifter is
similar to that of a buffer. In our case, the model
we used is similar to a 2-input AND gate. RTL
design is performed as usual, without any reference
to the power supplies. The level shifters are
instantiated within the RTL code.
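The logical behaviour described here (a buffer combined with a 2-input AND gate) can be sketched as follows. The sketch is written in Python for illustration only and is not the Verilog model used in the flow. The names `data_in` and `supply_ok`, and the use of `None` for an undefined value, are assumptions.

```python
def level_shifter(data_in, supply_ok):
    """Behavioural sketch of the level shifter: out = data_in AND supply_ok.

    data_in   -- 0, 1, or None when the driving (input) supply is off
    supply_ok -- 1 while the input supply is on, 0 when it is switched off
    """
    if supply_ok == 0:
        # The AND gate forces the output low, whatever the input is.
        return 0
    # Input supply present: the cell behaves like a plain buffer.
    return data_in


for data_in, supply_ok in [(0, 1), (1, 1), (None, 0), (1, 0)]:
    print(data_in, supply_ok, "->", level_shifter(data_in, supply_ok))
```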
2. Design Compiler (Synthesis)
The level shifter’s timing parameters (fall/rise slew
rate and fall/rise transition delays) under all the
necessary PVT (process, voltage and temperature)
corners have been extracted from Spice simulations.
A Design Compiler .lib file has been generated and
compiled to a .db file so that the synthesis will treat
the level shifter as a standard cell.
3. Fastscan (ATPG)
A Fastscan model of the level shifters has been
generated, too, so that we can automatically
generate scan patterns for the production test.
Fastscan does not need any timing information. The
logical function is a 2-input AND gate, as in the
Verilog model. From there, Fastscan treats the level shifter
as if it were a digital cell. Running ATPG is easier
because we can read the complete design in
Fastscan, rather than generating a set of scan
vectors for each region. In addition, the fault
coverage is most probably higher.
4. Silicon Ensemble (Place&Route)
Silicon Ensemble needs 2 library files for the level
shifter. The first one is the LEF file, which is a view
of the layout of the cell. The second file is the TLF
(Timing Library Format) file. It can be automatically
derived from the Design Compiler’s library using
the syn2tlf program provided by Cadence. The TLF
file is needed for CT-Gen (the Clock-Tree
Generator) in order to estimate the clock skew and
the insertion delay of the clock tree.
Silicon Ensemble generates a post-layout netlist
which includes the level shifters, and an RC file
which contains the list of all the capacitances and
resistances of the routed nets. These two files can
then be read by the delay calculator, which generates
an SDF file used for the back-annotation. The delay
calculator can be Design Compiler or PrimeTime
from Synopsys, or any internal tool (quite often
foundries have their own golden delay calculator).
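To give an idea of how net delays can be derived from the extracted resistances and capacitances, the sketch below applies the Elmore approximation to a single unbranched net. This is a swapped-in textbook model chosen for illustration: the delay calculators named above use their own, more elaborate models, and the data layout (a list of segments) is hypothetical rather than the RC file format.

```python
def elmore_delay(segments):
    """Elmore delay at the far end of an unbranched RC ladder.

    segments -- list of (resistance, capacitance) pairs ordered from the
                driver to the sink; each capacitance sits after its resistance.
    With resistances in ohms and capacitances in femtofarads, the result is
    in femtoseconds.
    """
    delay = 0
    downstream = sum(cap for _, cap in segments)
    for res, cap in segments:
        # Each resistance charges all the capacitance located after it.
        delay += res * downstream
        downstream -= cap
    return delay


# Hypothetical net: 100 ohm / 2 fF followed by 50 ohm / 3 fF.
print(elmore_delay([(100, 2), (50, 3)]))  # 650
```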
V. Automated floorplan and
placement
A small program has been developed in order to
ease the floorplan generation. Based on the number
of level shifters and the desired utilization
percentage of each voltage region, it proposes a
selection of floorplans with different aspect ratios
for which it generates Silicon Ensemble scripts that
will initialize the floorplan, place the level shifters
automatically and route the horizontal and vertical
power stripes as represented below.
Finally, the cells are gathered in groups, and each
group is assigned to a region, so that the placement
tool will locate each cell in the correct region.
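The sizing step of such a program can be sketched as below. This is not the program described above: the function name, the inputs (total cell area per region, target utilization, candidate aspect ratios) and the units are assumptions, and the generation of Silicon Ensemble scripts is deliberately left out.

```python
import math


def region_candidates(cell_area, utilization, aspect_ratios):
    """Propose (aspect_ratio, width, height) tuples for one voltage region.

    cell_area     -- total area of the cells assigned to the region
    utilization   -- desired fraction of the region occupied by cells (0..1]
    aspect_ratios -- candidate width/height ratios
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    region_area = cell_area / utilization
    candidates = []
    for ratio in aspect_ratios:
        height = math.sqrt(region_area / ratio)
        width = ratio * height
        candidates.append((ratio, width, height))
    return candidates


# Hypothetical region: 8000 area units of cells at 80 % utilization.
for ratio, width, height in region_candidates(8000, 0.8, [0.5, 1.0, 2.0]):
    print(f"ratio {ratio}: {width:.1f} x {height:.1f}")
```

In a fuller version, the area taken by the level shifters and the isolation rings would be added to each region before the dimensions are computed.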
VI. Clock tree synthesis
The clock tree structure with dual supply voltages
presented in [2] handles clock domains in which all
the flip-flops are only allowed to operate at the low
voltage while meeting the timing constraints.
We propose here a technique allowing a given clock
tree to drive flip-flops in both low and high voltage
regions. However, the clock tree generator is not
allowed to place clock buffers in a voltage region
which is different from the clock’s root driver.
The level shifter’s layout has been done in such a way
that it looks like a standard cell’s layout, except
that it is “dual-rail”, as shown on the right.
Indeed, we have to prevent a clock buffer from
being placed in a voltage region that is turned off if
the corresponding branch is supposed to drive
functions that are in use (powered on). The dashed
line on the diagram below symbolizes a dead branch
of the clock tree, which makes some functions in
voltage regions 1 and 3 fail if voltage 2 is turned
off.
The correct placement of the clock tree buffers is
handled in several steps, automated in a Unix shell
script. There are as many CT-Gen runs as voltage
regions. The diagram below presents an example of
a clock tree generation in voltage region 3: all
possible “holes” in the rows of regions 1 and 2 are
filled with dummy filler cells. Then all cells in these
regions are assigned the FIXED property in the
DEF file (Silicon Ensemble ASCII database).
Finally, CT-Gen is launched.
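The sequence of steps above can be summarised by the following sketch, which only computes, for each run, which cells would be marked FIXED. It does not call Silicon Ensemble or CT-Gen, and the cell names and the dictionary layout are hypothetical.

```python
def plan_clock_tree_runs(cell_regions):
    """Return one (target_region, cells_to_fix) entry per voltage region.

    cell_regions maps a cell instance name to its voltage region. For the run
    that targets one region, every cell of the other regions (including the
    filler cells added there) is listed as FIXED so that no clock buffer can
    be placed outside the target region.
    """
    plan = []
    for target in sorted(set(cell_regions.values())):
        fixed = sorted(name for name, region in cell_regions.items()
                       if region != target)
        plan.append((target, fixed))
    return plan


# Hypothetical design with three voltage regions.
cells = {"ff_a": 1, "filler_1": 1, "ff_b": 2, "ff_c": 3}
for target, fixed in plan_clock_tree_runs(cells):
    print(f"region {target}: FIXED = {fixed}")
```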
Once all the clock trees have been generated, the
routing can be launched as for a usual layout, and
the RC parasitics file can be generated in the
standard way.
VII. Future enhancements
With the chosen flow, all voltage regions are
back-annotated using the same PVT conditions,
because only one SDF file is generated. A region
could thus impose its own voltage range (best case,
worst case) on the others, even if the latter have
weaker voltage constraints. This problem could be
eliminated by splitting the RC file, generating an
SDF file for each region, and merging them together
with a simple Perl script.
VIII. References
[1] Chingwei Yeh and Yin-Shuin Kang, Cell-Based
Layout Techniques Supporting Gate-Level Voltage
Scaling for Low Power. IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, Vol. 8 No.
5, October 2000.
[2] Kimiyoshi Usami, Mitsunori Igarashi, Fumihiro
Minami, Takashi Ishikawa, Masahiro Kanazawa,
Makoto Ichida, and Kazutaka Nogami, Automated
Low-Power Technique Exploiting Multiple Supply
Voltages Applied to a Media Processor. IEEE
Journal of Solid-State Circuits, Vol. 33, No. 3, March
1998.
[3] C.Yeh and M.-C. Chang, Gate-level voltage
scaling for low-power design using multiple supply
voltages. IEE Proc. Circuits Devices Syst., Vol.
146, No. 6, December 1999.
[4] Chingwei Yeh, Yin-Shuin Kang, Shan-Jih Shieh,
Jinn-Shyan Wang, Layout Techniques Supporting
the Use of Dual Supply Voltages for Cell-Based
Designs. Proceedings of the 36th Design Automation
Conference, 1999.
[5] Yi-Jong Yeh and Sy-Yen Kuo, An Optimization-
based low-power voltage scaling technique using
multiple supply voltages. Proceedings of the 2001
IEEE International Symposium on Circuits and
Systems (ISCAS 2001), Vol. 5, 2001.
[6] Martin Polzl, A Strategy to Detect Charge
Damaging Process Steps within a Multilayer
Metallization Technology. 1997 2nd International
Symposium on Plasma Process-Induced Damage.
[7] G. E. Tellez and M. Sarrafzadeh, Clock period
constrained minimal buffer insertion in clock trees.
In Proceedings of the IEEE/ACM International
Conference on Computer-Aided Design, 1994.
[8] Jatuchai Pangjun and Sachin S. Sapatnekar,
Clock Distribution Using Multiple Voltages.
Proceedings of the 1999 International Symposium on
Low Power Electronics and Design, 1999.
[9] Alain Guyot and Sélim Abou-Samra, Low
Power CMOS Digital Design in ICM’98, December
14-16 1998.
[10] Anantha P. Chandrakasan, Samuel Sheng, and
Robert W. Brodersen, Low-Power CMOS Digital
Design in IEEE Journal of Solid-State Circuits. Vol.
27, No. 4, April 1992.