-
8/7/2019 Design and optimization techniques of high-speed VLSI circuits
1/310
Design and optimization techniques of
high-speed VLSI circuits
Marco Delaurenti
Politecnico di Torino
Design and optimization techniques of
high-speed VLSI circuits
Marco Delaurenti
PhD Dissertation
December 1999
Politecnico di Torino
Advisor
Prof. Maurizio Zamboni
Coordinator
Prof. Ivo Montrosset
Copyright © 1999 Marco Delaurenti
Writing comes more easily if you have something to say.
(Sholem Asch)
"When I use a word," Humpty Dumpty said in rather a scornful tone,
"it means just what I choose it to mean - neither more nor less."
(Lewis Carroll)
Acknowledgments
First of all I would like to thank my advisor, Prof. M. Zamboni, as well as
Prof. G. Piccinini and Prof. G. Masera, for their invaluable help, and
Prof. P. Civera for being a bridge toward the real world. Many thanks also
to the members of the VLSI LAB at the Politecnico di Torino, Italy: Mario
for his input about the critical paths (no, I do not thank you for the jazz
songs that you play all day long), Luca for the long discussions about books
and movies (no, I haven't seen Kubrick's last movie), Andrea for his very
good cocktails (especially the Negroni) and Danilo, because I forgot him
every time we went to lunch. Thanks also to Max (for giving me the root
password), and to Yuan & Svensson for the invention of the TSPC.
Special thanks, finally, to Mg, for her support and for having put up with
me until now.
CONTENTS
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Part I CMOS Logic 1
1. Introduction to CMOS logic . . . . . . . . . . . . . . . . . . . . . 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 CMOS logic families . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Static logic families . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Dynamic logic families . . . . . . . . . . . . . . . . . . 6
1.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Part II Circuit Modeling 13
2. A simple model . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 The Elmore model . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3. A complex model . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 The FAST model . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1 MOS equations . . . . . . . . . . . . . . . . . . . . . . 23
3.1.2 Internal nodes approximation . . . . . . . . . . . . . . 24
3.1.3 Body effect . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Delay estimation . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Equation solving . . . . . . . . . . . . . . . . . . . . . 32
3.3 Power estimation . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.1 Switching energy . . . . . . . . . . . . . . . . . . . . . 36
3.3.2 Short-circuit energy . . . . . . . . . . . . . . . . . . . 39
3.3.3 Sub-threshold energy . . . . . . . . . . . . . . . . . . 39
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Part III Optimization 45
4. Mathematic Optimization . . . . . . . . . . . . . . . . . . . . . 47
4.1 Optimization theory . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.1 Mono-objective optimization . . . . . . . . . . . . . . 49
4.1.1.1 Unconstrained problem . . . . . . . . . . . . 51
4.1.1.2 Constrained problem . . . . . . . . . . . . . 52
Lagrange multiplier and Penalty functions . . 52
4.1.2 Multi-objective optimization . . . . . . . . . . . . . . 54
4.1.2.1 Unconstrained . . . . . . . . . . . . . . . . . 56
4.1.2.2 Constrained . . . . . . . . . . . . . . . . . . 57
Compromise solution . . . . . . . . . . . . . . 57
4.2 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . 58
4.2.1 One-dimensional search techniques . . . . . . . . . . 59
4.2.1.1 The section search . . . . . . . . . . . . . . . 59
Dichotomic search . . . . . . . . . . . . . . . . 59
Fibonacci Search . . . . . . . . . . . . . . . . . 60
The golden section search . . . . . . . . . . . . 60
Convergence considerations . . . . . . . . . . . 61
4.2.1.2 Parabolic interpolation . . . . . . . . . . . . 62
Brent's rule . . . . . . . . . . . . . . . . . . . 62
4.2.2 Multi-dimensional search . . . . . . . . . . . . . . . . 63
4.2.2.1 The gradient direction: steepest (maximum)
descent . . . . . . . . . . . . . . . . . . . . . 63
4.2.2.2 The optimal gradient . . . . . . . . . . . . . 65
Convergence considerations . . . . . . . . . . . 66
4.2.3 The conjugate direction method . . . . . . . . . . . . 67
4.2.3.1 The Fletcher-Reeves conjugate gradient algorithm . . 68
4.2.3.2 The Powell conjugate gradient algorithm . . 69
4.2.4 The SLOP algorithm . . . . . . . . . . . . . . . . . . 70
4.2.5 The simulated-annealing algorithm . . . . . . . . . . 72
4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5. Circuit Optimization . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1 Optimization targets . . . . . . . . . . . . . . . . . . . . . . . 78
5.1.1 Circuit delay . . . . . . . . . . . . . . . . . . . . . . . . 79
Critical Paths . . . . . . . . . . . . . . . . . . . 80
5.1.1.1 Delay formula obtained by the Elmore model 84
5.1.1.2 Delay measurement obtained by the FAST
model and by HSPICE . . . . . . . . . . . . . 86
5.1.2 Power consumption . . . . . . . . . . . . . . . . . . . 87
5.1.3 Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Optimization examples . . . . . . . . . . . . . . . . . . . . . . 91
5.2.1 Algorithm choice . . . . . . . . . . . . . . . . . . . . . 94
5.2.2 Mono-objective optimizations . . . . . . . . . . . . . . 95
5.2.2.1 Area . . . . . . . . . . . . . . . . . . . . . . . 95
5.2.2.2 Power . . . . . . . . . . . . . . . . . . . . . . 96
5.2.2.3 Delay . . . . . . . . . . . . . . . . . . . . . . 97
5.2.3 Multi-objective optimizations . . . . . . . . . . . . . . 102
5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6. A CAD tool for optimization . . . . . . . . . . . . . . . . . . . 107
6.1 Logical description . . . . . . . . . . . . . . . . . . . . . . . . 107
6.1.1 The optimization algorithm module (OAM) . . . . . . 107
6.1.2 The function evaluation module (FEM) . . . . . . . . . 109
6.1.3 Core engine . . . . . . . . . . . . . . . . . . . . . . . . 109
6.2 Code implementation . . . . . . . . . . . . . . . . . . . . . . . 110
6.2.1 The classes CircuitNetlist and Circuit . . . . . . . . . 110
6.2.2 The class EvaluationAlgorithm . . . . . . . . . . . . . 112
6.2.3 The class OptimizationAlgorithm . . . . . . . . . . . 113
6.2.4 The critical path retrieving . . . . . . . . . . . . . . . 115
6.2.5 The derived classes . . . . . . . . . . . . . . . . . . . . 116
6.3 Program flows . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7. Results and conclusions . . . . . . . . . . . . . . . . . . . . . . 121
7.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.1.1 Mono-objective vs. Multi-objective . . . . . . . . . . 122
7.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Appendix 143
A. Class graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
B. Source code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
B.1 Main functions . . . . . . . . . . . . . . . . . . . . . . . . . . 149
B.2 Optimization algorithms . . . . . . . . . . . . . . . . . . . . . 208
B.3 Simulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
LIST OF FIGURES
1.1 Static and . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Pass-transistor logic xor . . . . . . . . . . . . . . . . . . . . . 6
1.3 Domino typical gate . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 CVSL typical gate . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 C2MOS typical gate . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 TSPC Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 RC MOS equivalence . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 RC chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 RC single cell . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Elmore impulse response . . . . . . . . . . . . . . . . . . . . . 18
3.1 Inverter voltages waveform . . . . . . . . . . . . . . . . . . . 23
3.2 Mos chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 Node voltages . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Voltages wave forms in the nMOS chain . . . . . . . . . . . . 27
3.5 Voltages wave forms in the pMOS chain . . . . . . . . . . . . 28
3.6 VDS and VGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.7 MOSFET chain with static voltages . . . . . . . . . . . . . . . 30
3.8 Threshold variation . . . . . . . . . . . . . . . . . . . . . . . . 31
3.9 Delay comparison . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.10 Energy comparison . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1 Section search . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2 Minimization by Powell algorithm . . . . . . . . . . . . . . . 70
4.3 Minimization by Powell algorithm . . . . . . . . . . . . . . . 71
4.4 Minimization by SLOP algorithm . . . . . . . . . . . . . . . . 72
4.5 Minimization by Simulated-annealing algorithm . . . . . . . 73
4.6 Minimization by Simulated-annealing algorithm . . . . . . . 74
5.1 Design flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Delay definition . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3 Critical paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4 Critical path tree . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5 Elmore delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.6 Elmore delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.7 HSPICE delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.8 FAST delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.9 HSPICE Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.10 CMOS Inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.11 TSPC Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.12 TSPC And gates . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.13 TSPC Or gates . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.14 Static and-or gate . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.15 Static parity gate . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.16 Static full-adder . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.17 TSPC full-adder (one-stage) . . . . . . . . . . . . . . . . . . 101
6.1 Tool block diagram . . . . . . . . . . . . . . . . . . . . . . . . 108
7.1 Comparison of 0.7 µm and 0.25 µm gates @ minimum technology
width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.2 Delay optimization of 0.7 µm gates. . . . . . . . . . . . . . . 125
7.3 Delay optimization of 0.25 µm gates. . . . . . . . . . . . . . 126
7.4 Technology comparison of delay optimization. . . . . . . . . 127
7.5 Several delay-power optimization policies of 0.7 µm gates. . 132
7.6 Energy-dissipation variation (zoom of figure 7.5(b)) . . . . . 133
7.7 Several delay-power optimization policies of 0.25 µm gates. 134
7.8 Energy-dissipation variation (zoom of figure 7.7(b)) . . . . . 135
7.9 Delay-power optimization (50%-50%) comparison of 0.7 µm
and 0.25 µm gates. . . . . . . . . . . . . . . . . . . . . . . . . 136
7.10 Delay and power trajectory during 4 different multi-objective
optimizations for the and-or gate . . . . . . . . . . . . . . . 137
7.11 Delay and power trajectory during 4 different multi-objective
optimizations for the parity gate . . . . . . . . . . . . . . . . 138
7.12 Delay and power trajectory during 4 different multi-objective
optimizations for the static full-adder . . . . . . . . . . . . . 139
7.13 Delay and power trajectory during 4 different multi-objective
optimizations for the dynamic full-adder . . . . . . . . . . . 140
LIST OF TABLES
3.1 Mean Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Execution time . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1 Optimization algorithms . . . . . . . . . . . . . . . . . . . . . 75
5.1 Basic gates: complexity . . . . . . . . . . . . . . . . . . . . . . 92
5.2 Basic gates: pre-optimization delay, power consumption and
area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3 Full-adder: delay optimization . . . . . . . . . . . . . . . . . 99
5.4 Agreements of targets . . . . . . . . . . . . . . . . . . . . . . 103
5.5 Full-adder: delay and power optimization . . . . . . . . . . 105
5.6 Full-adder: optimizations comparison . . . . . . . . . . . . . 105
7.1 Library gates list . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.2 Delay and energy dissipation @ minimum width (HSPICE) . 123
7.3 Delay decreasing and energy increasing (both relative) in a
delay optimization. . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4 Elapsed time and total number of function evaluations for a
full-delay optimization with HSPICE on an ULTRA-sparc 5 . . . . 129
7.5 Constrained delay optimization of a few 0.25 µm gates. . . . 130
7.6 Delay worsening and energy improvement between a full
delay optimization and delay-power optimization . . . . . . 133
Preface
The design of high-speed integrated circuits is a long and complex
operation; nonetheless, the total time-to-market required to go from the
idea to the silicon masks keeps shrinking.
To help the designer along this long and winding road, several CAD tools
are available. In the first step the only thing that exists is the
description of the circuit behaviour (the idea); in the central step of the
design flow the designer knows only the logic function of each block
composing the circuit, but not the technology realization of these blocks;
in the last steps, finally, the designer knows exactly the technology
implementation of every single gate of the circuit, and can compose the
final layout from these gates. Ça va sans dire that CAD tools are nowadays
of vital importance in the design flow, and moreover that the quality of
such tools strongly influences the quality of the final design.
Among all the possible instruments, the optimization tools have a primary
role in all the phases of a project, starting from the optimization at the
higher levels and descending to the optimization made at the electrical
level.
This thesis focuses on developing new strategies and new techniques for
the optimization made at the transistor dimension level, that is, the one
done by the cell library engineer, and on developing a CAD instrument that
makes this work as painless as possible.
Part I
CMOS LOGIC
Chapter 1
INTRODUCTION TO CMOS LOGIC
The optimization of VLSI circuits involves the optimization of the single
CMOS cells. This chapter briefly reviews the basic CMOS logic families,
with their pros and cons. The goal is simply to pick, among the static and
dynamic logic families, those most appealing for use in VLSI circuits and,
to some extent, those most widely used in practice, and then to apply to
them the optimization techniques shown in the next chapters.
1.1 Introduction
We might ask: why optimize a single cell in a VLSI circuit, when design is
nowadays shifting toward higher and higher levels?
Some answers could be:
• The need for re-usable library cells. This makes it easier to reuse the
  same library for different projects. It is a must nowadays, in order to
  reduce the total time to target/market.
• An optimized library makes design at the higher levels easier:
  floorplanning and routing can have relaxed constraints, since the gates
  behave better. It is possible to reduce the time spent repeating critical
  steps such as floorplanning or routing until all the specifications are
  met: these specifications are met earlier, since the cells globally
  behave better.
• The need for several equivalent libraries with different kinds of
  optimization. It is possible to have different libraries that have different
  specifications, but are functionally equivalent, so that it is possible to
  create different versions of a project simply by substituting the basic
  library. It would be possible, for example, to have, for the same project,
  a version that runs at full speed and a version optimized for low power
  dissipation.
  This swapping of libraries does not involve the higher levels of design,
  for it is totally transparent to the designer during floorplanning or
  routing. Just before the layout production, during the cell mapping,
  it is possible to choose the library onto which the project will be
  mapped.
These answers have led us to consider producing a tool able to perform the
optimization of a cell library in a way convenient for the designer. The
goal is to produce results showing that this optimization is worthwhile
during a design cycle, and also to make the insertion of the tool in a
design cycle as smooth as possible.
In order to obtain results that relate to a real production cycle, we
have to choose cells that are likely to be present in a real library.
For this purpose we introduce a very brief description of the most used
CMOS logic families, and among them we choose the cells used to develop
and test the optimization framework.
1.2 CMOS logic families
The first basic distinction among the CMOS logic families is between
static logics and dynamic logics ([1]).
Static logic: A static logic is a logic in which the functioning of the
circuit is not synchronized by a global signal, namely the clock of the
circuit. The output is solely a function of the inputs of the circuit, and
it is asynchronous with respect to them. The timing of the circuit is
defined exclusively by its internal delay.
Dynamic logic: A dynamic logic is a logic in which the output is
synchronized by a global signal, viz. the clock. The output is, then, a
function both of the inputs of the circuit and of the clock signal; and the
timing of the circuit is defined both by its internal delay and by the
timing of the clock.
Both the static and the dynamic logics comprise several logic families.
1.2.1 Static logic families
The principal static families are:
Conventional static logic It is the logic normally referred to when speaking
of static logic. A static circuit has the same number of NMOS and PMOS
transistors, and the n and p branches are each the dual of the other. As an
example see figure 1.1, which represents a static and gate. It has two NMOS
transistors connected in series and two PMOS transistors connected in
parallel.
Fig. 1.1: Static and (inputs A and B, OUT = A and B)
The static logic is quite fast, does not dissipate power in steady state
and has a very good noise margin.
Pseudo-NMOS It is an evolution of the now superseded NMOS logic. It is
obtained by substituting the whole PMOS branch of a static gate with
a single PMOS transistor with its gate connected to ground. So this
PMOS is always conducting and pulls the output node to the high state.
When the NMOS branch also conducts, the output discharges, provided that
the ratio between the NMOS and PMOS transistors is well designed.
This logic is cited here only for historical reasons, since it is not very
fast, it dissipates static power in steady state (when the output is in
the low state) and it is sensitive to noise.
Pass-logic The pass-logic is a relatively new logic and, for many digital
designs, implementation in pass-transistor logic (PTL) has been shown to
be superior to static CMOS in terms of area, timing, and power
characteristics.
As an example see figure 1.2.
Fig. 1.2: Pass-transistor logic xor (inputs A and B, OUT = A xor B)
1.2.2 Dynamic logic families
The principal dynamic families have a characteristic in common: every
dynamic logic needs a pre-charge (or pre-discharge) transistor to bring
some pre-charged nodes to a known state. This is done during the working
phase known as pre-charge phase or memory phase; during the other working
phase, the evaluation phase, the output has a stable value¹.
¹ This brief introduction is limited to systems that have a single global
clock, or one phase, the word phase being used here as a synonym of clock,
and not, as above, as a synonym of working period. There are systems that
have two, or even four, phases, but they are not introduced here. The basic
functioning, however, remains the same.
The principal dynamic logics are further divided into two sub-families,
pipelined and non-pipelined. The first two of the following are
non-pipelined, while the others are pipelined:
Domino logic and NP Domino logic The typical domino gate is depicted
in figure 1.3.
Fig. 1.3: Domino typical gate (NMOS block driven by INPUTs; CLOCK; OUT)
During the pre-charge phase the clock is in its low state, so that the
pre-charged node before the static inverter is high, and the output is
low. During the evaluation phase the clock is high, so that the inputs
of the n-block (which can perform any logical function) can discharge
the pre-charged node and bring the output to the high state.
We can cascade several of these gates, given that each gate has its
own output inverter, and we can drive every gate with the same clock
signal, provided that the evaluation phase lasts long enough for all
the gates to finish evaluating their inputs. This last fact explains why
this is a non-pipelined logic: the output of every cell is available when
the cell has finished its evaluation phase.
Moreover this logic has a limited area occupancy, since it has a low
number of PMOS transistors. On the other hand it is not possible to
implement inverting structures and, like all the other dynamic logics,
this logic is subject to the charge-sharing problem².
2 The charge-sharing problem, or charge-redistribution, is a problem that affects the dy-
A natural evolution of the domino logic is the N-P domino logic, or
zipper logic. It consists of two typical cells: the one depicted in
figure 1.3, and its dual, obtained by simply swapping the n-block with a
p-block and the PMOS pre-charge transistor with an NMOS pre-discharge
transistor, driven by the negated clock.
This logic has a lower area occupancy, since there is no need for a static
inverter, but also a lower speed, due to the presence of PMOS
transistors.
Cascode voltage switch logic (CVSL) The CVSL is part of the large family
of differential logics. It needs both the inputs and the negated inputs,
and two complementary n-blocks that perform the logic function, as can be
seen in figure 1.4.
Fig. 1.4: CVSL typical gate (two blocks driven by INPUTs; two outputs OUT)
It has the advantage of being quite fast, since the positive feedback of
the two PMOS transistors accelerates the switching of the gate, and it
also has very good noise margins. Moreover it produces both the outputs and
namic logics. Basically, the charge stored in a precharged node during the
memory phase does not remain fully stored in it. Let's think of a domino
gate during the pre-charge phase, when the clock is low. If one input of
the n-block is high, then its corresponding transistor is conducting. The
n-branch is still not conducting, since the clocked NMOS transistor is off,
but some charge can flow from the precharged node to other nodes via the
conducting transistors in the n-block. This redistribution of charge is
simply a capacitive charge division, and it leaves the precharged node at a
voltage lower than the high state.
This problem can produce logic errors, and surely diminishes the noise margins of
negated outputs without needing an inverter. As a drawback, it has
a large area occupancy.
C²MOS logic The typical C²MOS gate is shown in figure 1.5. It is basically
a three-state gate, since when the clock is in the low state the output
floats in the high-impedance state.
Fig. 1.5: C²MOS typical gate (PMOS block and NMOS block driven by INPUTs; CLOCK; OUT)
It is principally used as a dynamic latch, as an interface between static
logics and dynamic pipelined logics.
NO RAce logic (NORA) The NORA logic, whose name is an acronym of no race,
is an evolution of the N-P domino logic. The static inverter of the domino
logic is replaced by a C²MOS inverter. This is the first of the pipelined
logics, since the output of every gate is available only when the clock
switches its state, and not before.
Since the output stage of every cell is also dynamic (a C²MOS inverter),
this logic is more subject to the charge-sharing problem than the domino
logic is.
True Single Phase Clock logic (TSPC) The final evolution of the NORA is
the TSPC logic, or true single phase clock logic ([2]). The TSPC logic is
an n-p logic, since for each gate there exist an n-version and a
p-version. For example the n-latch and the p-latch are shown in figure 1.6.
Fig. 1.6: TSPC Latches: (a) Type n, (b) Type p (each with input A, CLK and OUT)
The ultimate advantage of the TSPC logic is the presence of a single
clock: owing to its internal structure, the negated clock is not needed.
The TSPC logic is among the fastest dynamic families, and it is surely
very appealing for the very low number of transistors it employs.
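The charge-sharing problem mentioned for the dynamic families (footnote 2) can be illustrated with a short numerical sketch. It assumes the simplest possible picture, which is not taken from the thesis: one precharged node of capacitance `c_pre` sharing its charge with one internal node of capacitance `c_int`, with charge conserved. The function name and the capacitance and supply values are hypothetical.

```python
def shared_voltage(vdd, c_pre, c_int, v_int_initial=0.0):
    """Common voltage of two capacitors after they are connected.

    Charge conservation: (c_pre + c_int) * v_final equals the total charge
    stored before the connection. All values are illustrative assumptions.
    """
    total_charge = c_pre * vdd + c_int * v_int_initial
    return total_charge / (c_pre + c_int)


# Hypothetical example: 20 fF precharged node, 5 fF internal node, 3.3 V supply.
v_final = shared_voltage(vdd=3.3, c_pre=20e-15, c_int=5e-15)
print(f"precharged node after sharing: {v_final:.2f} V")
```

With these assumed values the precharged node settles below the supply voltage, which is the loss of noise margin the footnote describes; the larger the internal capacitance relative to the precharged one, the larger the drop.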
1.3 Conclusion
After this very brief introduction to several CMOS families, we chose
two different logics to which to apply the optimization techniques that
are the object of this thesis. The criteria that drove us in choosing these
families were both their diffusion in VLSI circuits and the presence of
very good qualities, perhaps not yet fully exploited in the real production
of circuits.
For these reasons we have chosen to include in our library a few static
gates (an and gate, an or gate, and a few more) and a few dynamic
gates, in particular gates from the TSPC family. This family has shown
good characteristics in terms of speed, area occupancy and power
dissipation; it also has the very important feature of needing only a
single clock.
The complete list of the gates comprising the library can be found in
table 7.1 (page 122), with the schematic diagrams of their CMOS
implementation.
Part II
CIRCUIT MODELING
Chapter 2
A SIMPLE MODEL
The first model applied to the calculation of the delay in MOS circuits is
Elmore's model ([3]). It is a simple RC delay model, and it is the
basis of a switch-level MOS model (figure 2.1): the generic MOS transistor
is represented, during the ON state, by its dynamic resistance between the
drain pin and the source pin, and by the parasitic capacitances and
resistances at the drain and source pins.
Fig. 2.1: RC MOS equivalence (terminals G, D, S; in the ON state, elements Rd, Rg, R0, RL, CG, CL and the drain and source capacitances)
If this simple MOS model is valid, then Elmore's delay formula can
be used in every structure containing MOS transistors. Elmore's formula is
appealing for its simplicity and its ease of use; however, the accuracy of
the formula can worsen in the deep submicron domain, since the modeling of
a MOS transistor through its resistance is no longer valid.
Since the use of Elmore's model is mostly limited to comparisons with
other models, or to introductions to delay modelling, section 2.1
presents only the very basics of Elmore's model and section 2.2
draws the conclusions about the use of this model for VLSI modeling.
2.1 The Elmore model
Elmore's model, or Elmore's delay formula, can predict the delay
of an RC chain such as the one shown in figure 2.2.
Fig. 2.2: RC chain (input V0; resistances R_{i-1}, R_i, R_{i+1}; capacitances C_{i-1}, C_i, C_{i+1}; node voltages V_{i-1}, V_i, V_{i+1})
In order to obtain the formula, let's start with a single RC cell, as shown
in figure 2.3. We can express the voltage V1(t) by means of a differential
equation such as:
\[ C_0 \frac{dV_1}{dt} = -\frac{V_1(t) - V_0(t)}{R_0} \qquad (2.1) \]
Integrating equation (2.1), we can write
\[ V_1 = V_0(t) \left( 1 - e^{-\frac{t}{R_0 C_0}} \right). \]
The time constant is τ = R0C0, and with t = τ we obtain:
\[ V_1 = 0.63\, V_0(t). \]
Fig. 2.3: RC single cell (input V0, resistance R0, capacitance C0, output V1)
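As a quick numerical check of the single-cell result, the sketch below evaluates the closed-form step response at t = R0C0 and cross-checks it with a forward-Euler integration of equation (2.1). It assumes a constant input step V0 and an initially discharged capacitor; the component values and function names are hypothetical and serve only as an illustration.

```python
import math


def rc_step_response(t, v0, r0, c0):
    """Closed-form V1(t) of a single RC cell driven by a step of height v0."""
    return v0 * (1.0 - math.exp(-t / (r0 * c0)))


def rc_step_euler(t_end, v0, r0, c0, steps=100_000):
    """Integrate C0 * dV1/dt = (V0 - V1) / R0 with forward Euler, from V1 = 0."""
    dt = t_end / steps
    v1 = 0.0
    for _ in range(steps):
        v1 += dt * (v0 - v1) / (r0 * c0)
    return v1


# Hypothetical values: 1 kOhm, 1 pF, 1 V input step.
R0, C0, V0 = 1e3, 1e-12, 1.0
tau = R0 * C0
print(rc_step_response(tau, V0, R0, C0))  # close to 0.632
print(rc_step_euler(tau, V0, R0, C0))     # numerical cross-check
```

Both values agree to within the integration error, which supports reading the time constant as the 63% delay of the cell.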
So the time tD = τ represents the 63% delay from V0(t) to V1(t). Extending
the formula of the time constant to the chain of figure 2.2, we obtain:
\[ t_D = \sum_{i=0}^{N} \sum_{j=0}^{i} R_j C_i. \]
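The double sum for the RC chain can be evaluated in a single pass by accumulating the upstream resistance seen by each capacitor. The following is a minimal sketch under that reading of the formula; the function name and the example values are hypothetical.

```python
def elmore_chain_delay(resistances, capacitances):
    """Elmore delay of an RC chain: sum over i of C_i * (R_0 + ... + R_i).

    resistances[i] and capacitances[i] are the R and C of cell i,
    counted from the input side.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("each cell needs one resistance and one capacitance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r          # R_0 + ... + R_i
        delay += upstream_resistance * c  # contribution of C_i
    return delay


# Hypothetical two-cell chain: R0 = 1 kOhm, C0 = 1 pF, R1 = 2 kOhm, C1 = 3 pF.
print(elmore_chain_delay([1e3, 2e3], [1e-12, 3e-12]))  # R0*C0 + (R0+R1)*C1
```

For a single cell the function reduces to R0C0, the time constant obtained above.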
This delay is the input-output delay. When there is the need to know
the delay between the input and one of the inner nodes, a more complex
(semi-empirical) formula can be used; for example, with N = 2:
\[ t_1 = R_0 C_0 + q R_1 C_1 \quad \text{(delay from the input node to the first node)} \]
\[ t_2 = R_0 C_0 + (R_0 + R_1) C_1 \quad \text{(delay from the input node to the output node)} \]
where q is:
\[ q = \begin{cases} \dfrac{R_0}{R_0 + R_1} & \text{if } R_1 \le 2R_0, \\[2ex] \dfrac{R_0 C_0}{R_0 C_0 + R_1 C_1} & \text{if } R_1 > 2R_0. \end{cases} \]
The first case (with $R_1 \le 2R_0$) is named strong coupling, while the second one is named weak coupling.
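The formulas of this section can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names and the component values are hypothetical, and the chain is described by two lists `R` and `C` holding one resistance and one capacitance per cell.

```python
def elmore_chain_delay(R, C):
    """Input-output Elmore delay of an RC chain.

    Each capacitance C[i] is multiplied by the total resistance
    between the input and node i (R[0] + ... + R[i]).
    """
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(R, C):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


def coupling_factor(R0, C0, R1, C1):
    """Semi-empirical factor q: strong coupling if R1 <= 2*R0, weak otherwise."""
    if R1 <= 2.0 * R0:
        return R0 / (R0 + R1)
    return (R0 * C0) / (R0 * C0 + R1 * C1)


def two_cell_delays(R0, C0, R1, C1):
    """Delays t1 (input to first node) and t2 (input to output) for N = 2."""
    q = coupling_factor(R0, C0, R1, C1)
    t1 = R0 * C0 + q * R1 * C1
    t2 = R0 * C0 + (R0 + R1) * C1
    return t1, t2


# Illustrative (normalized) values, not taken from the text.
t1, t2 = two_cell_delays(1.0, 1.0, 2.0, 1.0)
chain = elmore_chain_delay([1.0, 2.0], [1.0, 1.0])
```

For a two-cell chain the input-output delay `t2` coincides with the double-sum formula, while `t1` is smaller because only a fraction `q` of the second time constant is seen from the first node.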
Given the unit impulse response $h(t)$ (figure 2.4) of the output node of the RC tree, Elmore proposed to approximate the delay by the mean of $h(t)$, considering $h(t)$ as a distribution. The 50% delay $t_{50\%}$ is the median of this distribution, i.e. it is given by:
$$\int_0^{t_{50\%}} h(t)\,dt = 0.5$$
while the original work of Elmore proposed:
$$t_D = m = \int_0^{\infty} t\,h(t)\,dt \quad \text{with} \quad \int_0^{\infty} h(t)\,dt = 1.$$
Fig. 2.4: Elmore impulse response
This approximation is valid only when $h(t)$ is a symmetrical distribution, as in figure 2.4, while in real cases the $h(t)$ distribution is asymmetrical; however, in [4] it is proved that the Elmore approximation is an upper bound for the 50% delay, even when the impulse response is not symmetrical, and, furthermore, that the real delay asymptotically approaches the Elmore bound as the input signal rise (or fall) time increases.
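The upper-bound property can be checked on the simplest case, the single RC cell, whose impulse response is $h(t) = e^{-t/\tau}/\tau$. The sketch below is our own illustration (function names, integration range and step count are assumptions): it integrates $t\,h(t)$ numerically to obtain the Elmore delay and compares it with the closed-form 50% delay $\tau \ln 2$.

```python
import math


def impulse_response(t, tau):
    """Unit impulse response of a single RC cell with time constant tau."""
    return math.exp(-t / tau) / tau


def elmore_mean(tau, span=40.0, steps=200_000):
    """Mean of h(t), by trapezoidal integration of t*h(t) on [0, span*tau]."""
    dt = span * tau / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * dt
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * t * impulse_response(t, tau)
    return total * dt


def median_delay(tau):
    """50% delay: solution of 1 - exp(-t/tau) = 0.5."""
    return tau * math.log(2.0)


tau = 1e-9  # illustrative time constant (1 ns), not taken from the text
elmore = elmore_mean(tau)
t50 = median_delay(tau)
```

For this asymmetrical response the mean equals $\tau$ while the median is about $0.69\,\tau$, so the Elmore value overestimates the 50% delay, consistently with the bound quoted from [4].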
2.2 Conclusions
The model shown in this chapter is quite appealing for the calculation of the delay in CMOS structures, but it becomes inaccurate as we move into the submicron domain, so its use should be limited to a first validation of an optimization algorithm, and not extended to real production.
In this respect, it is important to note that the delay functions obtained by Elmore's formula satisfy some properties that are useful in the optimization realm (for example equation (4.1), page 50): the Elmore model is therefore very useful for testing optimization algorithms.
Chapter 3
A COMPLEX MODEL
THE target of the model developed here is to offer limited estimation errors with respect to physical SPICE simulations and to improve the computation speed by more than one order of magnitude. This could be useful in optimization algorithms.
Thus the aim of the model is to evaluate the delay and power dissipation of CMOS structures.
Several approaches have been used to evaluate the delays of CMOS structures: some models are derived from SPICE simulations by means of look-up tables [5]; some are analytical [6], while others approximate the evaluation of the delay with step or ramp inputs [7, 8, 9, 10, 11].
Regarding the power consumption, the main contributions are: switching power, short-circuit current and subthreshold conduction. The first one occurs during the charge and discharge of internal capacitances; the short-circuit current originates from the simultaneous conduction of the p and n networks and it is dominated by the slope of node voltages; subthreshold currents are due to the weak inversion conduction of MOSFETs and become relevant when the power supply is scaled in sub-micron technologies.
Most of the proposed power models use estimation algorithms not com-
patible with the delay analysis. The purpose of the FAST model is to com-
bine delay and power evaluations in the same estimation procedure, allow-
ing the simultaneous optimization of delay and power.
Section 3.1 reports the theory behind the FAST model; in particular, 3.1.1 shows the MOS equations used in the model, 3.1.2 shows the internal node voltage approximation made by the model, and 3.1.3 explains how the threshold voltage variations are taken into account. Section 3.2 shows how the FAST model estimates the delay, and in particular 3.2.1 shows how the equations are solved, while section 3.3 reports the method used for the calculation of the power consumption: 3.3.1 accounts for the switching power, 3.3.2 for the short-circuit power, and 3.3.3 for the subthreshold power.
Finally, section 3.4 presents some results from the comparison of the model with HSPICE, and section 3.5 draws some conclusions.
3.1 The FAST model
The low complexity and the accuracy that can be obtained by taking into account carrier velocity saturation, which is dominant in submicron technologies, suggested the use of the classical charge control analysis and the gradual-channel approximation (Hodges model), described in 3.1.1.
Estimation accuracy and low computational effort can be achieved by operating both on the waveforms of internal signals and on topology considerations: in particular, all the waveforms in the circuit are approximated with linear ramps.
By approximating the input waveform with a ramp, a strong simplification of the I(V) equations is obtained. Figure 3.1 shows the output voltage of an inverter driven by a ramp input. It can be noticed that a ramp can properly approximate the output voltage variation, especially in the central phases of the commutation. The increasing error on the tail of the switching does not significantly affect the delay and power estimation.
The voltage ramp approximation is described in 3.1.2.
Fig. 3.1: Inverter voltages waveform (Vout, Vin and Model; voltage in V versus time in ns)
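3.1.1 MOS equations
The well-known equations for the MOS transistors are (for the n-type and p-type transistors) [1]:
below saturation
$$I_{DS_{n,p}} = \beta_{n,p}\left[(V_{GS} - V_{T_{n,p}})\,V_{DS} - \frac{V_{DS}^2}{2}\right] \tag{3.1}$$
above saturation
$$I_{DS_{n,p}} = \frac{\beta_{n,p}}{2}\left(V_{DSsat_{n,p}}\right)^2 \tag{3.2}$$
where $\beta_{n,p} = \mu_{n,p} C_{ox} \frac{W}{L}$, with $\mu_{n,p}$ modified by the carrier velocity saturation effect:
$$\mu_n = \frac{\mu_{n0}}{1 + \dfrac{V_{DS}}{L E_c}} \qquad \mu_p = \frac{\mu_{p0}}{1 - \dfrac{V_{DS}}{L E_c}}$$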
The saturation voltage (drain-source), not including the carrier velocity saturation effect, is given by the well-known formula:
$$V_{DSsat_{n,p}} = V_{GS_{n,p}} - V_{T_{n,p}}$$
while, considering the above-mentioned effect:
$$V_{DSsat_{n,p}} = \pm V_c\left(\sqrt{1 \pm \frac{2\,(V_{GS_{n,p}} - V_{T_{n,p}})}{V_c}} - 1\right) \tag{3.3}$$
where the plus signs are for nMOSFETs and the minus signs are for the pMOSFETs, and $V_c = |E_c L|$.
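A minimal sketch of the two saturation voltages for an nMOSFET (plus signs in equation (3.3)) may help to see the effect of velocity saturation. The function names and the numeric values below are our own assumptions, chosen only for illustration.

```python
import math


def vdsat_long_channel(vgs, vt):
    """Saturation voltage without velocity saturation: VGS - VT (nMOS)."""
    return max(vgs - vt, 0.0)


def vdsat_velocity_saturated(vgs, vt, vc):
    """Saturation voltage of equation (3.3), nMOS case, with vc = |Ec * L|."""
    overdrive = max(vgs - vt, 0.0)
    return vc * (math.sqrt(1.0 + 2.0 * overdrive / vc) - 1.0)


# Illustrative values (volts), not taken from the text.
vgs, vt, vc = 5.0, 0.8, 2.0
v_long = vdsat_long_channel(vgs, vt)
v_short = vdsat_velocity_saturated(vgs, vt, vc)
```

With a finite $V_c$ the transistor saturates at a lower drain-source voltage than $V_{GS} - V_T$; when $V_c$ is much larger than the overdrive, the two expressions tend to coincide.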
3.1.2 Internal nodes approximation
Fig. 3.2: MOS chain with proper numbering
Let $N$ be the number of nMOSFETs in the n-chain and $P$ the number of pMOSFETs in the p-chain, and let us label the transistors in each chain
from 1 to $N$ or from 1 to $P$ (figure 3.2). Let us assume that the label 1 goes to the driving transistor (i.e. the nMOSFET with source connected to $V_{SS}$ or the pMOSFET with source connected to $V_{DD}$), as in figure 3.2. This hypothesis only serves the development of the discussion; in our model any (but only one) transistor can be a driving transistor, that is, a transistor with a changing gate voltage.
Notation 3.1. In the following equations the superscript index refers to the node number (with the variable $i$ always for the nMOSFETs and $j$ always for the pMOSFETs), and the small-letter subscript indexes $n$ and $p$ refer, respectively, to nMOSFETs and pMOSFETs, both for the voltage variables and for the time variables; for the voltage variables the capital subscript indexes $G$ and $D$ refer to the gate node and the drain node, while the small-letter index $d$ refers to the initial conditions of the drain nodes.
So, for example, $V^i_{Gn}(t)$ is the gate voltage at the node $i$ for the nMOSFETs (function of time), and $V^j_{dp}$ is the initial condition of the drain voltage at node $j$ for the pMOSFETs.
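The waveforms of the voltages are shown in figure 3.4 and figure 3.5, with the hypothesis $t^1_{0n} = t^2_{0n} = \dots = t^N_{0n}$ and $t^1_{0p} = t^2_{0p} = \dots = t^P_{0p}$; that is, we suppose that all the MOSFETs in a chain start conducting simultaneously (this hypothesis is well supported by simulations).
We can write, referring to figures 3.4 and 3.5:
$$V^1_{Gn}(t) = \begin{cases} 0 & t < 0 \\ \dfrac{V_{DD}}{\tau^1_{in}}\,t & 0 \le t < \tau^1_{in} \\ V_{DD} & \tau^1_{in} \le t \end{cases} \tag{3.4a}$$
$$V^1_{Gp}(t) = \begin{cases} V_{DD} & t < 0 \\ V_{DD} - \dfrac{V_{DD}}{\tau^1_{ip}}\,t & 0 \le t < \tau^1_{ip} \\ 0 & \tau^1_{ip} \le t \end{cases} \tag{3.4b}$$
$$V^i_{Gn}(t)\Big|_{i=2,3,\dots,N} = V_{DD} \quad \forall t \tag{3.4c}$$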
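$$V^j_{Gp}(t)\Big|_{j=2,3,\dots,P} = V_{SS} \quad \forall t \tag{3.4d}$$
$$V^i_{Dn}(t)\Big|_{i=1,2,\dots,N} = \begin{cases} V^i_{dn} & t < t^i_{0n} \\ V^i_{dn} - V^i_{dn}\,\dfrac{t - t^i_{0n}}{\tau^i_{on} - t^i_{0n}} & t^i_{0n} \le t < \tau^i_{on} \\ V_{SS} & \tau^i_{on} \le t \end{cases} \tag{3.4e}$$
$$V^j_{Dp}(t)\Big|_{j=1,2,\dots,P} = \begin{cases} V^j_{dp} & t < t^j_{0p} \\ \dfrac{V_{DD} - V^j_{dp}}{\tau^j_{op} - t^j_{0p}}\,t + \dfrac{\tau^j_{op} V^j_{dp} - t^j_{0p} V_{DD}}{\tau^j_{op} - t^j_{0p}} & t^j_{0p} \le t < \tau^j_{op} \\ V_{DD} & \tau^j_{op} \le t \end{cases} \tag{3.4f}$$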
Fig. 3.3: The $i$-th and $(i+1)$-th MOSFETs with node voltages
It is also possible to define $\tau^i_{in,p} = \tau^{i-1}_{on,p}$ and the source voltage $V^i_s = V^{i+1}_d$, as shown in figure 3.3 for the $i$-th nMOS. The same is valid for the pMOSFETs.
The starting levels $V_{dn,p}$ are determined with a static analysis, described in 3.1.3.
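The ramp approximation of the drain voltages lends itself to a very small sketch. The function below follows the three branches of equation (3.4e) for an nMOS node; its name and the sample values are hypothetical, and we assume $V_{SS} = 0$ so that the ramp lands exactly on the final level.

```python
def drain_voltage_n(t, v_init, t0, tau_o, v_ss=0.0):
    """Piecewise-linear drain voltage of an nMOS node, as in equation (3.4e).

    v_init : starting level of the node (from the static analysis)
    t0     : time at which the node starts to discharge
    tau_o  : time at which the discharge ends
    """
    if t < t0:
        return v_init
    if t < tau_o:
        return v_init - v_init * (t - t0) / (tau_o - t0)
    return v_ss


# Illustrative values (volts, nanoseconds), not taken from the text.
samples = [drain_voltage_n(t, 3.0, 1.0, 3.0) for t in (0.0, 1.0, 2.0, 3.0, 5.0)]
```

The node keeps its starting level until `t0`, falls linearly, and stays at the low level after `tau_o`; the only unknowns of the waveform are therefore the two time instants.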
3.1.3 Body effect: threshold variation and its approximation
It is known that a MOS transistor with the sourcebody voltage differ-
ent from zero has the threshold voltage modified by the body effect, that
Fig. 3.5: Voltage waveforms in the pMOS chain
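The source potential of the top transistor is
$$V_s = V_{DD} - V_{Tn},$$
and, if $V_{Tn0}$ is the threshold voltage with $V_{sb} = 0$, then $V_{Tn} = V_{Tn0} + \Delta V_{Tn}$ and we can solve for $V_{sb}$:
$$V_{sb} = -\frac{\gamma}{2}\sqrt{4\gamma\sqrt{2|\phi_p|} + 8|\phi_p| + 4V_{DD} - 4V_{Tn0} + \gamma^2} + \gamma\sqrt{2|\phi_p|} + V_{DD} - V_{Tn0} + \frac{\gamma^2}{2} \quad (> 0)$$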
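As a sanity check of the closed form, the sketch below compares it with a plain fixed-point iteration of $V_{sb} = V_{DD} - V_{Tn}(V_{sb})$. We assume here the usual body-effect law $\Delta V_{Tn} = \gamma(\sqrt{2|\phi_p| + V_{sb}} - \sqrt{2|\phi_p|})$; the function names and the parameter values are illustrative assumptions, not data from the text.

```python
import math


def vsb_closed_form(vdd, vt0, gamma, two_phi):
    """Closed-form source-body voltage; two_phi stands for 2*|phi_p|."""
    disc = (4.0 * gamma * math.sqrt(two_phi) + 4.0 * two_phi
            + 4.0 * vdd - 4.0 * vt0 + gamma ** 2)
    return (-0.5 * gamma * math.sqrt(disc) + gamma * math.sqrt(two_phi)
            + vdd - vt0 + 0.5 * gamma ** 2)


def vsb_fixed_point(vdd, vt0, gamma, two_phi, iterations=200):
    """Iterate Vsb = VDD - VT(Vsb) using the assumed body-effect law."""
    vsb = vdd - vt0
    for _ in range(iterations):
        vt = vt0 + gamma * (math.sqrt(two_phi + vsb) - math.sqrt(two_phi))
        vsb = vdd - vt
    return vsb


# Illustrative parameters: VDD = 5 V, VTn0 = 0.8 V, gamma = 0.5 V^0.5, 2|phi_p| = 0.7 V.
closed = vsb_closed_form(5.0, 0.8, 0.5, 0.7)
iterated = vsb_fixed_point(5.0, 0.8, 0.5, 0.7)
```

The two values agree to numerical precision, which suggests that the closed form is indeed the positive root of the quadratic equation obtained from the assumed law.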
We can find an analogous equation for pMOSFETs: knowing that, for the pMOS chain depicted in figure 3.7(b), the drain potential of the $P$-th transistor is $V^P_{dp} = 0$, while $V^P_{sp} = V_{DD} - V_{Tp}$; for the middle transistors $V^j_{dp} = V^j_{sp} = V_{DD} - V_{Tp}$; and for the first (top-most) transistor $V^1_{dp} = V_{DD} - V_{Tp}$ and $V^1_{sp} = V_{DD}$.
The threshold voltage variation as a function of $V_{sb}$ is again:
Fig. 3.6: Drain-source ($V_{DS}$) and gate-source ($V_{GS}$) voltages of the $i$-th nMOS
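$$\Delta V_{Tp} = -\gamma\left(\sqrt{2|\phi_p| + |V_{sb}|} - \sqrt{2|\phi_p|}\right)$$
(for pMOS transistors the threshold voltage is negative).
Again, solving:
$$V_{sb} = -V_{DD} - V_{Tp} = -V_{DD} - V_{Tp0} + \gamma\left(\sqrt{2|\phi_p| + |V_{sb}|} - \sqrt{2|\phi_p|}\right)$$
where $V_{Tp0}$ is the threshold voltage with $V_{sb} = V_{DD}$; thus we find:
$$V_{sb} = \frac{\gamma}{2}\sqrt{4\gamma\sqrt{2|\phi_p|} + 8|\phi_p| + 4V_{DD} + 4V_{Tp0} + \gamma^2} - \gamma\sqrt{2|\phi_p|} - V_{DD} - V_{Tp0} - \frac{\gamma^2}{2} \quad (< 0)$$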
The threshold variation is approximated in the model by a linear ap-
proximation given by:
Fig. 3.8: Threshold variation with $V_{sb}$ (solid line) and its linear approximation (dashed line): (a) nMOSFET, (b) pMOSFET
In figures 3.8(a) and 3.8(b) the actual threshold variation (of an nMOS transistor and of a pMOS transistor) when a $V_{sb}$ voltage is applied is compared with the linear approximation used in our model, for a 0.7 μm technology.
The maximum error due to the linear approximation is limited to 7%.
3.2 Delay estimation
The delay estimation of the structures reported in figure 3.2 implies the evaluation of $\tau^i_{on,p}$ and $t^i_{0n,p}$ for each transistor in the chains.
The currents in each transistor can be obtained from equations (3.1), (3.2) (page 23), with the voltages as functions of time defined in equations (3.4a)-(3.4f) (page 25). So we can calculate the quantity of charge at each node and thus apply the charge conservation law, i.e. at each node the total charge variation must be equal to zero:
$$\Delta Q^i_n = 0 \qquad \Delta Q^j_p = 0 \qquad i = 1, 2, \dots, N \ \text{and} \ j = 1, 2, \dots, P \tag{3.5}$$
The generic term $\Delta Q^i_n$ is the sum of three elements, $\Delta Q^i_n = Q^{i+1}_I - Q^i_I - Q^i_C$, defined below:
$Q^{i+1}_I$ is the charge due to the $(i+1)$-th MOSFET placed above the $i$-th node:
$$Q^{i+1}_I = \int_{t^{i+1}_{0n}}^{t^{i+1}_{sn}} I^{i+1}_{sat}(t)\,dt + \int_{t^{i+1}_{sn}}^{\tau^{i+1}_{on}} I^{i+1}_{lin}(t)\,dt \tag{3.6a}$$
which includes the contributions due to the currents above and below saturation; $t_s$ is the time at which the MOSFET switches from the saturation to the linear region;
$Q^i_I$ is the charge due to the $i$-th MOSFET below the $i$-th node:
$$Q^i_I = \int_{t^i_{0n}}^{t^i_{sn}} I^i_{sat}(t)\,dt + \int_{t^i_{sn}}^{\tau^i_{on}} I^i_{lin}(t)\,dt \tag{3.6b}$$
$Q^i_C$ is the charge due to the discharging of the capacitor at the $i$-th node, $C^i$:
$$Q^i_C = -C^i V^i_{dn}. \tag{3.6c}$$
Similar equations apply for the pMOSFETs.
For each circuit node, a charge conservation equation can be written.
3.2.1 Equation solving
Referring to the nMOS chain in figure 3.3, we can write at the output node $N$:
$$Q^N_I = -Q^N_C = C^N V^N_{dn} \tag{3.7}$$
because, neglecting the contribution of the pMOS chain above (if it exists), $Q^{N+1}_I = 0$.
At the node $N-1$ we can write:
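$$\Delta Q^{N-1}_n = Q^N_I - Q^{N-1}_I - Q^{N-1}_C,$$
and combining with eq. (3.7) (page 32)
$$\Delta Q^{N-1}_n = C^N V^N_{dn} - Q^{N-1}_I - Q^{N-1}_C,$$
and so on:
$$\Delta Q^{N-2}_n = C^N V^N_{dn} + C^{N-1} V^{N-1}_{dn} - Q^{N-2}_I - Q^{N-2}_C.$$
More generally:
$$\Delta Q^i_n = \sum_{k=i+1}^{N} C^k V^k_{dn} - Q^i_I - Q^i_C = \sum_{k=i}^{N} C^k V^k_{dn} - Q^i_I = 0$$
Proceeding down to the first transistor, we obtain:
$$\Delta Q^1_n = \sum_{k=1}^{N} C^k V^k_{dn} - Q^1_I = 0, \tag{3.8}$$
and the same applies for the pMOSFETs.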
In order to solve the nonlinear equation (3.8) one must substitute the definition of the current to calculate the charge $Q$, as in equations (3.6a), (3.6b) (page 32); moreover, one must substitute both the current calculated in the saturation region and the one calculated in the linear region, extending the integrals of the aforementioned equations to the proper limits.
Finally, we must distinguish among several different cases, depending on the instant of time at which the transistor switches from the saturation region to the linear region. For example, the first transistor can switch
between the two regions when the rise of the input has already finished or, on the contrary, when the input is still rising.
All the possible cases are:
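$$\begin{array}{ll}
t^1_0 \le t^1_s \le \tau^1_i \le \tau^1_o & \quad t^1_0 \le \tau^1_i \le t^1_s \le \tau^1_o \\
t^1_s \le t^1_0 \le \tau^1_i \le \tau^1_o & \quad \tau^1_i \le t^1_0 \le t^1_s \le \tau^1_o \\
t^1_s \le \tau^1_i \le t^1_0 \le \tau^1_o & \quad t^1_0 \le t^1_s \le \tau^1_o \le \tau^1_i \\
t^1_s \le t^1_0 \le \tau^1_o \le \tau^1_i &
\end{array} \tag{3.9}$$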
Evaluating all the possible cases, equation (3.8) becomes a nonlinear equation in the variables $t^1_s$, $t^1_0$, $\tau^1_o$, $\tau^1_i$, with $t^1_s$, $t^1_0$, $\tau^1_o$ as unknowns.
A further step is needed in order to eliminate all the variables but one. The real unknown is the time $\tau^1_o$, while all the other unknowns can be expressed as functions of $\tau^1_o$: in particular, the times $t^1_s$ and $t^1_0$ can be calculated together, with the equation $V_{DS} = V_{GS} - V_T$ (or, taking into account the carrier velocity saturation effect, equation (3.3), page 24) and with the equation that states the charge conservation at node 1 between the time 0 and the time $t^1_0$, similar to equation (3.5) (page 31), including the bootstrap effect due to the capacitive coupling between the gate and the drain of the first transistor.
Both these equations are functions of $t^1_s$, $t^1_0$, $\tau^1_o$, $\tau^1_i$. In this way one has three equations with three unknowns, and by means of some approximate methods (the problem is always strictly nonlinear) it is possible to evaluate the three unknowns.
This solution scheme has to be repeated for all the seven cases shown in equation (3.9). Each case gives as a solution a triple $t^1_s$, $t^1_0$, $\tau^1_o$ that is compatible with one and only one of the conditions expressed by these cases. Thus only one working condition is really selected, as can be expected.
Indeed, the whole previous solving scheme holds only if equation (3.6c) (page 32) applies, i.e. only if the capacitance at the node $i$ is not a function of the voltage at the same node. But the capacitance actually is a function of the voltage, in this manner:
$$C^i = \frac{C^i_j}{\left(1 + \dfrac{V^i}{\phi_b}\right)^{m_j}} + \frac{C^i_p}{\left(1 + \dfrac{V^i}{\phi_b}\right)^{m_p}} \tag{3.10}$$
where $C_j$ and $C_p$ are, respectively, functions of the area and of the perimeter of a junction, because the capacitance at the node $i$ is due to the parasitic capacitances of the transistors connected to this node.
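Equation (3.10) translates directly into a small function. The sketch below is only illustrative: the function name, the argument names and the numeric values (capacitances, built-in potential, grading coefficients) are our own assumptions.

```python
def node_capacitance(v, c_area, c_perimeter, phi_b, m_j, m_p):
    """Voltage-dependent junction capacitance of a node, as in equation (3.10).

    c_area, c_perimeter : zero-bias area and perimeter contributions
    phi_b               : junction built-in potential
    m_j, m_p            : area and perimeter grading coefficients
    """
    scale = 1.0 + v / phi_b
    return c_area / scale ** m_j + c_perimeter / scale ** m_p


# Illustrative values: 10 fF area term, 5 fF perimeter term.
params = dict(c_area=10e-15, c_perimeter=5e-15, phi_b=0.8, m_j=0.5, m_p=0.33)
c_at_0v = node_capacitance(0.0, **params)
c_at_3v = node_capacitance(3.0, **params)
```

The capacitance is largest at zero bias and decreases as the node voltage rises, which is why a single equation with constant $C^i$ is no longer sufficient when this dependence is taken into account.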
If the capacitances at each node are functions of the voltage at the node itself, then one equation is no longer sufficient: one must write equations like equation (3.8) (page 33), one for each node, and then solve them with a standard solving algorithm for nonlinear equations. The only difference between the equations applied at the nodes above the first and the first node equation is that not all of the cases of equation (3.9) are possible: in particular these conditions apply only when the transistor can pass from the saturation region to the linear region and, moreover, only when the input rise time $\tau^1_i$ can assume any value. The passage from saturation to linearity can be made only by the first and the last transistors of the chain, as they are the only ones that can saturate. (This is because they are the only ones that have a full voltage swing at some node, i.e. the gate node for the first and the drain node for the last. All the transistors in the middle of the chain are prevented from saturating by the body effect, which makes the saturation condition $V_{DS} = V_{GS} - V_T$, or, better, equation (3.3), page 24, impossible.) But in the last transistor the time $\tau^N_i$ is governed by $\tau^N_i = \tau^{N-1}_o$, giving thus only two possible cases:
$$t^N_0 \le t^N_s \le \tau^N_i \le \tau^N_o \qquad t^N_0 \le \tau^N_i \le t^N_s \le \tau^N_o$$
In order to make the algorithm convergent, two other fictitious cases must be included:
$$t^N_0 \le t^N_s,\ \tau^N_o \le \tau^N_i \qquad t^N_0 \ge t^N_s,\ \tau^N_o \le \tau^N_i$$
These conditions can never occur in a real circuit, since they imply that the voltages at the source node and at the drain node of the last transistor
cross, making the transistor current flow in the reverse direction (see figure 3.6 for a visual explanation of the terms $\tau_i$ and $\tau_o$ and of why the relative voltage waveforms cannot cross). Their inclusion helps in finding the real circuit conditions when solving equation (3.8) for each of these four cases: the solution of one of the fictitious cases gives only unknowns compatible with one of the real cases.
All the other transistors, which cannot saturate during the switching from off to on, have only one possible working condition, again that the voltages at source and drain nodes do not cross:
$$\tau^j_i \le \tau^j_o \qquad j = 2, \dots, N-1$$
Solving all the equations, one for each node, the unknowns $\tau^j_o$ can be evaluated, giving thus an estimate of the voltage waveform at each node of the chain. The rise/fall time of the last node of the chain also gives the delay of the chain itself.
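The selection of the working condition for the first transistor can be pictured as a consistency check: a candidate solution is accepted only if the ordering of its four time instants is the one assumed when the equations were written. The sketch below is our own illustration of this idea and not the solver of the model; the names are hypothetical and the seven orderings are those of equation (3.9) as we read them.

```python
# The seven admissible orderings of equation (3.9), earliest event first.
REAL_CASES = {
    ("t0", "ts", "tau_i", "tau_o"),
    ("t0", "tau_i", "ts", "tau_o"),
    ("ts", "t0", "tau_i", "tau_o"),
    ("tau_i", "t0", "ts", "tau_o"),
    ("ts", "tau_i", "t0", "tau_o"),
    ("t0", "ts", "tau_o", "tau_i"),
    ("ts", "t0", "tau_o", "tau_i"),
}


def ordering(t0, ts, tau_i, tau_o):
    """Return the names of the four instants sorted by increasing time."""
    events = [("t0", t0), ("ts", ts), ("tau_i", tau_i), ("tau_o", tau_o)]
    return tuple(name for name, _ in sorted(events, key=lambda e: e[1]))


def is_consistent(assumed_case, t0, ts, tau_i, tau_o):
    """True if a solution found under `assumed_case` really has that ordering."""
    return assumed_case in REAL_CASES and ordering(t0, ts, tau_i, tau_o) == assumed_case


# A candidate solution (arbitrary time units) obtained assuming the second case.
candidate = dict(t0=0.1, ts=0.5, tau_i=0.3, tau_o=1.0)
ok = is_consistent(("t0", "tau_i", "ts", "tau_o"), **candidate)
rejected = is_consistent(("t0", "ts", "tau_i", "tau_o"), **candidate)
```

Repeating the check for every case keeps only the solution whose ordering matches its own hypothesis.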
3.3 Power consumption estimation
3.3.1 Switching energy
The contribution to the power dissipation due to the charge and discharge of internal nodes for each MOSFET can be defined as the integral of the voltage across the MOSFET times the current flowing through it.
Theorem 3.2. The switching energy in generic n-networks and p-networks can be written as:
$$E_{sw_n} = \frac{1}{2}\sum_{i=1}^{N} C^i\left(V_i'^{\,2} - V_i''^{\,2}\right) \tag{3.11}$$
$$E_{sw_p} = \frac{1}{2}\sum_{j=1}^{P} C^j\left[\left(V_{DD} - V_j'\right)^2 - \left(V_{DD} - V_j''\right)^2\right] \tag{3.12}$$
where $C^i$ is the total capacitance of the $i$-th node and $V_i'$, $V_i''$ are, respectively, the initial and final values of the voltage swing at the same node.
Corollary 3.2.1. If the voltage swing of each node of the network is the full swing $\Delta V = V_{DD} - 0$, then equations (3.11), (3.12) can be written as:
$$E_{sw_n} = \frac{1}{2}\sum_{i=1}^{N} C^i \Delta V^2 \tag{3.13}$$
$$E_{sw_p} = \frac{1}{2}\sum_{j=1}^{P} C^j \Delta V^2 \tag{3.14}$$
Proof of theorem 3.2. Since the internal voltages and currents are known from the delay analysis, the energy for the nMOS network can be written by summing all the contributions of internal nodes (see figure 3.3)
$$E_{sw_n} = \sum_{i=1}^{N} \int \left(V^{i+1}_{Dn}(t) - V^i_{Dn}(t)\right) I^i_{Dn}(t)\,dt$$
where the notation of figure 3.3 is adopted.
This equation can be written in this way:
$$E_{sw_n} = \int \left[V^N_{Dn}(t)\,I^N_{Dn}(t) + \sum_{i=1}^{N-1} V^i_{Dn}(t)\left(I^i_{Dn}(t) - I^{i+1}_{Dn}(t)\right)\right] dt \tag{3.15}$$
It is possible to rewrite the previous equations by noting that in general:
$$I^{i+1}_{Dn} - I^i_{Dn} = C^i \frac{dV^i_{Dn}}{dt}$$
and, in particular, if we neglect the current of the pMOS chain above the node $N$,
$$I^N_{Dn} = -C^N \frac{dV^N_{Dn}}{dt}.$$
Thus, for the n network it is possible to define the $E_{sw_n}$ energy in the following way:
$$E_{sw_n} = -\sum_{i=1}^{N} C^i \int_{t_0'}^{t_0''} V^i_{Dn}\,\frac{dV^i_{Dn}}{dt}\,dt = -\sum_{i=1}^{N} C^i \int_{V_i'}^{V_i''} V^i_{Dn}\,dV^i_{Dn} = \frac{1}{2}\sum_{i=1}^{N} C^i\left(V_i'^{\,2} - V_i''^{\,2}\right)$$
If we integrate the equation (3.11) (page 36) only where the arguments of the integrals are nonzero, then the first integral in this equation goes from $t_0' = t^i_{0n}$ to $t_0'' = \tau^i_{on}$, so that the second integral goes from $V_i' = V^i_{Dn}(t^i_{0n})$ to $V_i'' = V^i_{Dn}(\tau^i_{on})$. Since $V^i_{Dn}(\tau^i_{on}) = 0$, we have $E_{sw_n} = \frac{1}{2}\sum_{i=1}^{N} C^i V_i'^{\,2}$, where $V_i'$ is the actual voltage swing at the node $i$.
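The energy dissipated in the p network ($E_{sw_p}$) can be calculated with similar considerations, leading to
$$E_{sw_p} = \sum_{j=1}^{P} C^j \int_{t_0'}^{t_0''} \left(V_{DD} - V^j_{Dp}\right)\frac{dV^j_{Dp}}{dt}\,dt = \sum_{j=1}^{P} C^j \int_{V_j'}^{V_j''} \left(V_{DD} - V^j_{Dp}\right) dV^j_{Dp} = \frac{1}{2}\sum_{j=1}^{P} C^j\left[\left(V_{DD} - V_j'\right)^2 - \left(V_{DD} - V_j''\right)^2\right]$$
Again, $V_j' = V^j_{Dp}(t^j_{0p})$ and $V_j'' = V^j_{Dp}(\tau^j_{op})$, and in the same way $V_j'' = V_{DD}$, so that $E_{sw_p} = \frac{1}{2}\sum_{j=1}^{P} C^j \left(V_{DD} - V_j'\right)^2$, where $(V_{DD} - V_j')$ is the voltage swing at the node $j$.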
In equations (3.11) and (3.12) (page 36) the voltage dependence of the capacitances must also be included, obtaining expressions for $E_{sw_{n,p}}$ that are slightly more complicated, but still in closed form.
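With constant capacitances, the two expressions of theorem 3.2 reduce to simple sums. The following sketch is only an illustration under that assumption; the function names and the node values are hypothetical.

```python
import math


def switching_energy_n(caps, v_initial, v_final):
    """Equation (3.11): energy dissipated while the n-network nodes discharge."""
    return 0.5 * sum(c * (vi ** 2 - vf ** 2)
                     for c, vi, vf in zip(caps, v_initial, v_final))


def switching_energy_p(caps, v_initial, v_final, vdd):
    """Equation (3.12): energy dissipated while the p-network nodes charge."""
    return 0.5 * sum(c * ((vdd - vi) ** 2 - (vdd - vf) ** 2)
                     for c, vi, vf in zip(caps, v_initial, v_final))


# Two nodes of 10 fF and 20 fF with a full 5 V swing (illustrative values).
caps = [10e-15, 20e-15]
e_n = switching_energy_n(caps, [5.0, 5.0], [0.0, 0.0])
e_p = switching_energy_p(caps, [0.0, 0.0], [5.0, 5.0], vdd=5.0)
```

With a full swing both results collapse to $\frac{1}{2}\sum C\,\Delta V^2$, as stated by corollary 3.2.1.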
3.3.2 Short-circuit energy
The short-circuit contribution (for an output falling transition) is given by:
$$E_{sc} = \int_{t_0}^{\tau_o} V_D\,I_D\,dt$$
where $I_D$ is the current flowing through the pMOSFET that has a changing gate voltage, during the output falling transition; of course all the pMOSFETs between this one and the output node must be on to have this contribution to the power dissipation. So, if we neglect the small discharge of the source voltage of this MOSFET, we can easily calculate the short-circuit energy by calculating the current flowing.
A similar equation can be written for the nMOS network.
Since voltage swings, internal currents and capacitances are known from the delay analysis, the power supply dissipation does not require additional computations.
3.3.3 Subthreshold energy
The subthreshold current in a MOSFET is given by [12]:
$$I_{DS_{subth}} = \mu_0\,\frac{W}{L}\,\frac{kT}{q}\,Q(V_S)\left(1 - e^{-\frac{qV_{DS}}{kT}}\right)$$
where
Q(VS) kTq
qsNa|p| e
q(VGVT)kT
and
= 1+1
2Cox
s Na|p| .
This current is proportional to the MOSFET width $W$ but is usually negligible. However, with the scaling down of the dimensions, and hence of the threshold voltage, this current may no longer be negligible, and with low $V_G$ and higher $V_D$ the current becomes independent of $V_G$.
Moreover, while the short-circuit current is limited by the switching times of the circuit, the subthreshold current is not limited in time, so its dissipation can be comparable to the short-circuit dissipation.
3.4 Results
The circuit in figure 3.2 with 2 nMOS and 2 pMOS transistors (in a 0.7 μm technology) has been simulated using HSPICE (level 6) and the proposed model, for each combination of MOSFET widths from 1 μm to 100 μm. Figure 3.9 shows the comparison between the delay (defined as the delay at 50% between an input rising ramp of 200 ps and an output falling ramp) calculated by the model and the delay simulated by HSPICE for each combination of widths between 5 μm and 30 μm; similarly, figure 3.10 shows the comparison between the energy dissipated by the circuit (during the output discharge) as calculated by the model and by HSPICE.
Tab. 3.1: Mean Error
                    Mean error   Max error   Min error
Delay               6.115%       12.985%     0.905%
Energy dissipated   2.1%         6.3%        0.11%
Tab. 3.2: Execution time
HSPICE execution time   FAST execution time
6384.3 s                188.91 s
The errors between the proposed model and the HSPICE simulation are reported in table 3.1, while table 3.2 shows the corresponding execution times. These results are taken from the analysis of the circuit varying the dimensions of the MOSFETs continuously from 1 μm to 100 μm.
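The speedup implied by table 3.2 follows from a one-line computation; the variable names below are ours, the two figures are those of the table.

```python
hspice_seconds = 6384.3   # HSPICE execution time from Tab. 3.2
fast_seconds = 188.91     # FAST execution time from Tab. 3.2

speedup = hspice_seconds / fast_seconds
more_than_one_order = speedup > 10.0
```

The ratio is a little below 34, which is consistent with the target of improving the computation speed by more than one order of magnitude.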
3.5 Conclusions
The model of this chapter is suitable for the optimization application of chapter 5. It is able to compute the delay and the power consumption of CMOS structures with good accuracy and a consistent speedup with respect to the HSPICE simulation taken as a reference.
In a real production design cycle, this model might be used for a first pre-optimization of some basic cells; then, in the last steps of the design flow, an optimization using a more accurate model for the delay (or power) evaluation must be used.
Fig. 3.10: Energy dissipated [fJ] by the circuit of figure 3.2 with several combinations of W1 and W2 [micron]: (a) FAST model, (b) HSPICE simulation
Part III
OPTIMIZATION
Chapter 4
MATHEMATIC OPTIMIZATION
THE very basic theory of optimization is introduced here, in order to develop some optimization schemes that will be useful later for the optimization of real circuits.
The theory of mono-objective optimization involves some properties and theorems regarding the search for the minimum of a function, hence the vanishing of the function's first derivatives. These results can be extended (with some restrictions) to the case of multivariable functions, but when more than one function has to be optimized simultaneously, a new theory has to be introduced.
The overall goal of this introduction to mathematical optimization is both the development of reliable algorithms and the justification of some assumptions made in chapter 5 (page 77), especially for the multi-objective case.
In section 4.1 some mathematical optimization foundations are reported: in particular, 4.1.1 presents the theory of mono-objective optimization (unconstrained, 4.1.1.1, and constrained, 4.1.1.2), while 4.1.2 presents the theory of multi-objective optimization (unconstrained, 4.1.2.1, and constrained, 4.1.2.2).
Section 4.2 reports the basic and most useful numerical algorithms for optimization purposes: in 4.2.1 some one-dimensional search techniques, in 4.2.2 some multi-dimensional search techniques, and in 4.2.4, 4.2.5 some special algorithms.
Some conclusions and summarized characteristics are reported in section 4.3.
4.1 Optimization theory
Notation 4.1. In the following section, the function $f$ is defined as $f: X \subseteq \mathbb{R}^p \to Y \subseteq \mathbb{R}$. $X$ is called the decision space, and $Y$ is called the criteria space.
Problem 4.2 (Unconstrained optimization). Given the function $f$ that depends on one or more variables $x \in X$, the problem of optimizing $f$, in this context, is equivalent to finding:
$$\min_{x \in X} f(x)$$
This is also known as unconstrained optimization, since there are no constraints on the values the function $f$ may assume.
Unconstrained optimization is seldom applied in the field of digital circuits, so the constrained optimization is defined as:
Problem 4.3 (Constrained optimization). Find
$$\min_{x \in X} f(x) \quad \text{subject to} \quad g_j(x) \le h_j, \quad j = 1, 2, \dots, m$$
where the $m$ inequalities $g_j(x) \le h_j$ constitute the set of constraints of the optimization.
The function $f$ is also called the objective of the optimization, or the cost function of the problem.
The above problems are classical optimization problems, or mono-objective problems. The multi-objective unconstrained optimization is defined as the problem of optimizing a vector function, so that the objective function is a vector of objective functions.
Notation 4.4. In the following (multi-objective optimization), the function $f$ is defined as $f: X \subseteq \mathbb{R}^p \to Y \subseteq \mathbb{R}^n$, or $f = (f_1, f_2, \dots, f_n) \mid f_i : X \subseteq \mathbb{R}^p \to Y \subseteq \mathbb{R}$.
Problem 4.5 (Unconstrained multi-objective optimization). Find
$$\min_{x \in X} f_i(x), \quad i = 1, 2, \dots, n$$
where there are $n$ objective functions.
Finally, the multi-objective constrained optimization is defined as:
Problem 4.6 (Constrained multi-objective optimization). Find
$$\min_{x \in X} f_i(x), \quad i = 1, 2, \dots, n \quad \text{subject to} \quad g_j(x) \le h_j, \quad j = 1, 2, \dots, m$$
where there are $n$ objective functions and $m$ constraints.
Multi-objective optimization is a very complex problem, since the problem of finding the minimum of two or more functions is only apparently trivial: the set of independent variables $x_{min}$ that minimizes, let us say, the function $f_1$ is not supposed to minimize (and generally it does not) the other functions. So there should be a way to combine the information about the minimum among all the functions. The intuitive way of a linear combination is somewhat problematic:
$$f_{tot}(x) = \sum_{i=1}^{n} \alpha_i f_i(x), \quad \alpha_i \in \mathbb{R}$$
because the functions $f_i$ may not be commensurable with each other. For example, if there is one function $f_j$ such that $f_j \gg f_i$, $\forall i \ne j$, then this function dominates the total objective, giving false results for the optimization problem. This problem is illustrated in 4.1.2.
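The domination effect can be reproduced with two toy objectives of very different scale. The sketch below is purely illustrative: the two functions, the unit weights and the grid search are our own choices.

```python
def f1(x):
    """Small-scale objective, minimum at x = 1."""
    return (x - 1.0) ** 2


def f2(x):
    """Large-scale objective, minimum at x = -1."""
    return 1000.0 * (x + 1.0) ** 2


def argmin_on_grid(f, lo=-2.0, hi=2.0, steps=4000):
    """Brute-force minimizer on a uniform grid (adequate for this toy case)."""
    best_x, best_value = lo, f(lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        value = f(x)
        if value < best_value:
            best_x, best_value = x, value
    return best_x


x_f1 = argmin_on_grid(f1)
x_sum = argmin_on_grid(lambda x: f1(x) + f2(x))  # unit weights
```

The minimizer of the plain sum sits practically on the minimum of `f2`, so the information carried by `f1` is lost unless the objectives are rescaled or weighted.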
4.1.1 Mono-objective optimization
Mono-objective optimization is the standard optimization problem, and it is widely treated in the literature (see [13] for an introduction). With this preliminary statement, some results useful for finding a solution of problems 4.2 and 4.3 are reported here.
The existence of at least one minimum is granted by the Weierstrass theorem (provided that $X$ is a compact set, as it is in this context), but these minima can be local or global:
Definition 4.7 (Local Minimum). The point $x^* \in X$ is a local (or relative) minimum of the function $f$ iff
$$\exists\, \varepsilon > 0 : f(x) \ge f(x^*) \quad \forall x \in X,\ |x - x^*| < \varepsilon.$$
Definition 4.8 (Global Minimum). The point $x^* \in X$ is a global (or absolute) minimum of the function $f$ iff $f(x) \ge f(x^*)\ \forall x \in X$.
Definition 4.9 (Feasible direction). $d \in \mathbb{R}^n$ is a feasible direction at $x \in X$ if $\exists\, \bar{\alpha} > 0 : x + \alpha d \in X,\ \forall \alpha : 0 \le \alpha \le \bar{\alpha}$.
In an intuitive manner, the concept of feasible direction is useful to solve the problem of minimization: we search all the directions in which the function $f$ is decreasing.
Lemma 4.10 (First order necessary condition). If $x^* \in X$ is a minimum of $f \in C^1$ then $\forall d \in \mathbb{R}^n$, where $d$ is a feasible direction, $d^T \cdot \nabla f(x^*) \ge 0$, where $(\cdot)$ has the usual definition of scalar product in the space $\mathbb{R}^n$.
Corollary 4.10.1. If $x^* \in X$ is an internal point of $X$, then $d^T \cdot \nabla f(x^*) = 0$.
Lemma 4.11 (Second order necessary condition). If $x^* \in X$ is a minimum of $f \in C^2$ then $\forall d \in \mathbb{R}^n$, where $d$ is a feasible direction,
i) $d^T \cdot \nabla f(x^*) \ge 0$;
ii) if $d^T \cdot \nabla f(x^*) = 0$ then $d^T \cdot \nabla^2 f(x^*) \cdot d \ge 0$.
Corollary 4.11.1. If $x^* \in X$ is an internal point of $X$, then
i) $d^T \cdot \nabla f(x^*) = 0$;
ii) $d^T \cdot \nabla^2 f(x^*) \cdot d \ge 0$.
The conditions of corollary 4.11.1 are necessary conditions for the existence of a (local) minimum; they become sufficient when the inequality in ii) holds strictly for every $d \ne 0$. In order to have some information about the existence of a global minimum, the theory of convex functions must be very briefly reported.
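When derivatives are not available in closed form, the conditions of corollary 4.11.1 can be checked numerically at a candidate interior point. The sketch below uses central finite differences on a two-variable toy function; the function, the step sizes and the helper names are our own assumptions.

```python
def gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        forward, backward = list(x), list(x)
        forward[k] += h
        backward[k] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            def shifted(da, db):
                y = list(x)
                y[a] += da
                y[b] += db
                return f(y)
            hess[a][b] = (shifted(h, h) - shifted(h, -h)
                          - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return hess


def toy(x):
    """Toy objective with an interior minimum at (1, -0.5)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


candidate = [1.0, -0.5]
g = gradient(toy, candidate)
H = hessian(toy, candidate)
# For a 2x2 symmetric matrix: positive definite iff H[0][0] > 0 and det > 0.
positive_definite = H[0][0] > 0.0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0.0
```

At the candidate point the gradient vanishes and the Hessian is positive definite, so the first and second order conditions are both satisfied.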
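Definition 4.12 (Convex function). The function $f: X \to Y$, where $X$ is a convex set (a set $X \subseteq \mathbb{R}^n$ is convex if $\forall x, y \in X$ the segment $[x, y]$ is totally contained in $X$), is convex if $\forall x_1, x_2 \in X$, $\forall \alpha : 0 \le \alpha \le 1$
$$f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2) \tag{4.1}$$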
If in the equation (4.1) the sign < applies, then the function is said to be
strictly convex.
Another way to write the equation (4.1) is:
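Lemma 4.13. The function $f \in C^1 : X \to Y$ is convex over a convex set $X$ if
$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall y, x \in X$$
or, if $f$ is twice differentiable,
Lemma 4.14. The function $f \in C^2 : X \to Y$ is convex over a convex set $X$ if
$$\nabla^2 f(x) \ge 0, \quad \forall x \in X$$
(i.e. the Hessian is positive semidefinite).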
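Inequality (4.1) can be probed numerically on a finite set of samples. The sketch below is only an illustration: the helper name `is_convex_on_samples`, the sample grid and the two test functions are assumptions. A violation proves that a function is not convex, while the absence of violations on the samples is not a proof of convexity.

```python
def is_convex_on_samples(func, points, steps=10, tol=1e-12):
    """Test inequality (4.1) on every pair of sample points and a grid of lambda values."""
    for x1 in points:
        for x2 in points:
            for k in range(steps + 1):
                lam = k / steps
                lhs = func(lam * x1 + (1.0 - lam) * x2)
                rhs = lam * func(x1) + (1.0 - lam) * func(x2)
                if lhs > rhs + tol:
                    return False  # a single violation proves non-convexity
    return True  # no violation found on the samples (not a proof)


samples = [-2.0 + 0.5 * i for i in range(9)]  # grid on [-2, 2]
square_is_convex = is_convex_on_samples(lambda x: x * x, samples)
cube_is_convex = is_convex_on_samples(lambda x: x ** 3, samples)
print(square_is_convex, cube_is_convex)
```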
Convex functions are a very useful mathematical tool for optimization
problems, mainly because of the next two results:
Theorem 4.15. If $f: X \to Y$ is convex over a convex set $X$, the set $A$ of the minima of the function is convex, and every local minimum is also a global minimum.
Theorem 4.16. If $f \in C^1 : X \to Y$ is convex over a convex set $X$, and if $\exists\, x^* \in X : \forall x \in X\ \nabla f(x^*)^T (x - x^*) \ge 0$, then $x^*$ is a global minimum of $f$ over $X$.
The theorem 4.16 also implies that the conditions of the lemma 4.10 and
corollary 4.10.1 (first order conditions) are both necessary and sufficient
conditions for the existence of a global minimum.
4.1.1.1 Unconstrained problem
All the previous results are, at least in theory, sufficient to solve
problem 4.2. The theory of convex functions ensures the existence of
a global minimum, while lemma 4.10, corollary 4.10.1, and theorem 4.16
suggest a method to find this minimum. We will see in 5.1 how these
methods apply to real circuits, in which, for example, the derivatives of
the functions are not available.
4.1.1.2 Constrained problem
The solution of problem 4.3 is slightly more complicated. The presence
of constraints reduces the feasible set of independent variables that
are solutions of the problem. So the solutions (i.e. the values of the
independent variables that minimize the objective function) must be searched
in the set $x \in C \subseteq X$ that satisfies all the constraints.
The most important method to solve the minimization problem while
satisfying some constraints (and, incidentally, the method most useful for
our real problem) is the method of the Lagrange multipliers (and its
derivative, the method of the penalty function).
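Lagrange multiplier and Penalty functions The first method defines a
Lagrangian function:
$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) \tag{4.2}$$
If we define $x^*$ as the solution of
$$x^* = \arg\min_{x \in X} f(x) \quad \text{subject to} \quad g_i(x) \le 0,\ i = 1, 2, \ldots, m$$
then we can write the necessary Kuhn-Tucker conditions for the existence
of the minimum:
$$\nabla_x L(x^*, \lambda^*) = 0 \tag{4.3}$$
$$\nabla_\lambda L(x^*, \lambda^*) \le 0 \tag{4.4}$$
$$(\lambda^*)^T g(x^*) = 0 \tag{4.5}$$
$$\lambda^* \ge 0 \tag{4.6}$$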
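To make conditions (4.3)-(4.6) concrete, the following sketch checks them on a hypothetical one-dimensional problem: minimize $f(x) = (x - 2)^2$ subject to $g(x) = x - 1 \le 0$. The problem, the tolerance and the function names are assumptions chosen only for illustration. The constrained minimum is $x^* = 1$ with $\lambda^* = 2$, whereas the unconstrained minimum $x = 2$ violates the constraint.

```python
def f_grad(x):
    # Derivative of the hypothetical objective f(x) = (x - 2)^2.
    return 2.0 * (x - 2.0)


def g(x):
    # Hypothetical constraint g(x) = x - 1 <= 0.
    return x - 1.0


def g_grad(x):
    # Derivative of the constraint.
    return 1.0


def satisfies_kkt(x, lam, tol=1e-9):
    """Check the four Kuhn-Tucker conditions for a single constraint."""
    stationarity = abs(f_grad(x) + lam * g_grad(x)) <= tol   # (4.3)
    feasibility = g(x) <= tol                                # (4.4): dL/dlambda = g(x) <= 0
    complementarity = abs(lam * g(x)) <= tol                 # (4.5)
    nonnegativity = lam >= -tol                              # (4.6)
    return stationarity and feasibility and complementarity and nonnegativity


constrained_ok = satisfies_kkt(1.0, 2.0)     # constrained minimum
unconstrained_ok = satisfies_kkt(2.0, 0.0)   # infeasible point
print(constrained_ok, unconstrained_ok)
```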
In order to find sufficient conditions, we define the saddle-point
conditions:
Theorem 4.17. A point $(x^*, \lambda^*)$ with $\lambda^* \ge 0$ is a saddle-point of the Lagrangian $L(x, \lambda)$ iff
i) $x^*$ minimizes $L(x, \lambda^*)$ over the whole $X$;
ii) $g_i(x^*) \le 0$, $i = 1, 2, \ldots, m$;
iii) $\lambda_i^* g_i(x^*) = 0$, $i = 1, 2, \ldots, m$.
It can be proved that if the functions $f$, $g$ are convex, even if they
are not differentiable, then the saddle-point conditions are necessary and
sufficient conditions. Although these conditions must hold at the minimum,
they are not very useful in determining the optimum point: the direct
solution of these equations is rarely practicable.
A more practical way is to convert the constrained problem into an
unconstrained one, by defining the new objective function:
$$P(x, K) = f(x) + \sum_{i=1}^{m} K_i [g_i(x)]^2 \tag{4.7}$$
The sum added to the objective function is called the penalty function,
since it penalizes the objective function by adding a positive quantity
(recall that we want to minimize the cost function). The constants
$K = [K_1, K_2, \ldots, K_m]^T$ are (positive) weighting factors that define
how strongly the $i$-th constraint must be satisfied, and can also make the
constraints commensurable.
Wherever $x$ is inside the feasible region, we can ignore the constraints,
so a new objective function can be defined as:
$$P(x, K) = f(x) + \sum_{i=1}^{m} K_i [g_i(x)]^2 u_i(g_i) \tag{4.8}$$
where $u_i(g_i)$ is the usual step function:
$$u_i(g_i) = \begin{cases} 0 & \text{if } g_i(x) \le 0 \\ 1 & \text{if } g_i(x) > 0 \end{cases}$$
The introduction of the step function makes it possible to relate the
penalty function defined in (4.8) with the Lagrangian function of (4.2):
$$P(x, K) = L(x, \lambda)$$
if we let $\lambda_i = K_i g_i(x) u_i(g_i)$, so that all previous results valid for the
Lagrangian function are valid for the penalty function.
Note that the solution $x$ found by optimizing the penalty function $P(x, K)$
converges to $(x^*, \lambda^*)$, defined by the Kuhn-Tucker conditions, only in the
limit $K \to \infty$.
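The dependence of the penalized solution on $K$ can be illustrated with a small numerical experiment. The sketch below assumes a hypothetical problem (minimize $(x - 2)^2$ subject to $x - 1 \le 0$, whose constrained minimum is $x^* = 1$) and a simple ternary search as the unconstrained minimizer; neither is taken from the original text. For this problem the minimizer of $P(x, K)$ is $(2 + K)/(1 + K)$, which approaches 1 from the infeasible side as $K$ grows.

```python
def f(x):
    # Hypothetical objective.
    return (x - 2.0) ** 2


def g(x):
    # Hypothetical constraint g(x) = x - 1 <= 0.
    return x - 1.0


def penalty(x, k):
    """Penalized objective P(x, K) of (4.8) for a single constraint."""
    step = 1.0 if g(x) > 0.0 else 0.0  # step function u(g)
    return f(x) + k * g(x) ** 2 * step


def minimize_unimodal(func, lo, hi, iterations=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


weights = (1.0, 10.0, 100.0, 1000.0)
minimizers = {
    k: minimize_unimodal(lambda x, k=k: penalty(x, k), 0.0, 3.0) for k in weights
}
for k in weights:
    print(k, minimizers[k])
```

In practice very large values of $K$ make the penalized function badly conditioned, so the weight is usually increased gradually rather than set to a huge value at once.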
4.1.2 Multi-objective optimization
Multi-objective optimization is not a standard problem in engineering,
but is quite common in economics ([14]). While in the mono-dimensional
problem the concept of optimum as a minimum is quite clear and well
defined (the idea of greater or lesser is intuitive with the real numbers),
with multi-objective (also multi-criteria) problems the concept of minimum
is less intuitive. So we must define some order relation among the points
in a multi-dimensional space.
Notation 4.18. Given $x, y \in \mathbb{R}^n$, define
$x = y$ iff $x_k = y_k\ \forall k = 1, 2, \ldots, n$
$x \leqq y$ iff $x_k \le y_k\ \forall k = 1, 2, \ldots, n$
$x \le y$ iff $x \leqq y$ and $x \ne y$ (so $\exists k : x_k < y_k$)
$x < y$ iff $x_k < y_k\ \forall k = 1, 2, \ldots, n$
Notation 4.19. In the following section, the function $f$ is defined as $f: X \to Y$, $X \subseteq \mathbb{R}^p$, $Y \subseteq \mathbb{R}^n$. $X$ is called the decision space, while $Y$ is called the criteria space.
Given two outcomes $y^1$, $y^2$ of the cost functions, $y^1 = f(x^1)$ and $y^2 = f(x^2)$,
we must define which is better: we indicate that $y^1$ is better than
$y^2$ with $y^1 \succ y^2$, that $y^1$ is worse than $y^2$ with $y^1 \prec y^2$, and, finally, that $y^1$ is indifferent with respect to $y^2$ with $y^1 \sim y^2$.
In optimization theory the definition of Pareto point or Pareto preference
has great importance:
Definition 4.20 (Pareto preference). Given $y^1, y^2 \in Y$, the Pareto preference is defined by
$$y^1 \succ y^2 \quad \text{iff} \quad y^1 \le y^2.$$
A Pareto preference is intuitively guided by the relation "lesser is better".
Definition 4.21 (Non-Dominated and Dominated set). If $y^1 \succ y^2$ is a binary preference defined on $Y$, the non-dominated and the dominated sets with respect to $\{\succ\}$ are defined as:
$$N(\{\succ\}, Y) = \{y^0 \in Y \mid \nexists\, y \in Y : y \succ y^0\}$$
$$D(\{\succ\}, Y) = \{y^0 \in Y \mid \exists\, y \in Y : y \succ y^0\}$$
If $y^0 \in N(\{\succ\}, Y)$, $y^0$ is an N-point. Similarly, if $y^0 \in D(\{\succ\}, Y)$, $y^0$ is a D-point.
Definition 4.22 (Pareto optimum). $y^* \in Y$ is a Pareto optimum iff it is an N-point with respect to Pareto preference.
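For a finite set of outcomes, the non-dominated and dominated sets under the Pareto preference can be computed by direct comparison. The sketch below is a minimal illustration: the function names and the sample outcomes are assumptions, and the quadratic-time filter is adequate only for small sets.

```python
def dominates(y1, y2):
    """Pareto preference: y1 is better than y2 iff y1 <= y2 componentwise and y1 != y2."""
    return all(a <= b for a, b in zip(y1, y2)) and any(a < b for a, b in zip(y1, y2))


def non_dominated(points):
    """N-points: outcomes for which no other outcome is better."""
    return [y0 for y0 in points if not any(dominates(y, y0) for y in points)]


def dominated(points):
    """D-points: outcomes for which at least one better outcome exists."""
    return [y0 for y0 in points if any(dominates(y, y0) for y in points)]


# Hypothetical outcomes in a two-dimensional criteria space (both costs to be minimized).
outcomes = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
n_points = non_dominated(outcomes)
d_points = dominated(outcomes)
print(n_points)
print(d_points)
```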
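We now give two theorems that are fundamental for the solution of
the multi-objective optimization problem; first we introduce the definition
of convex cone in $\mathbb{R}^n$:
Notation 4.23 (convex cone).
$$\Lambda^{>} = \{d \in \mathbb{R}^n \mid d > 0\}$$
$$\Lambda^{\ge} = \{d \in \mathbb{R}^n \mid d \ge 0\}$$
$$\Lambda^{\geqq} = \{d \in \mathbb{R}^n \mid d \geqq 0\}$$
Theorem 4.24. i) if $y^0 \in Y$ minimizes $\lambda \cdot y$ over $Y$ for some $\lambda \in \Lambda^{>}$, then $y^0$ is an N-point;
ii) if $y^0 \in Y$ uniquely minimizes $\lambda \cdot y$ over $Y$ for some $\lambda \in \Lambda^{\ge}$, then $y^0$ is an N-point.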
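Part i) of Theorem 4.24 can be illustrated on a finite outcome set: minimizing the weighted sum $\lambda \cdot y$ with strictly positive weights always returns a non-dominated outcome. The following sketch assumes hypothetical outcomes, weights and function names.

```python
def dominates(y1, y2):
    """Pareto preference: y1 <= y2 componentwise with at least one strict inequality."""
    return all(a <= b for a, b in zip(y1, y2)) and any(a < b for a, b in zip(y1, y2))


def weighted_sum_minimizer(points, weights):
    """Return the outcome that minimizes the scalar product weights . y."""
    return min(points, key=lambda y: sum(w * yi for w, yi in zip(weights, y)))


# Hypothetical outcomes (both criteria to be minimized) and strictly positive weights.
outcomes = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
weight_vectors = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]

results = {}
for lam in weight_vectors:
    best = weighted_sum_minimizer(outcomes, lam)
    is_n_point = not any(dominates(y, best) for y in outcomes)
    results[lam] = (best, is_n_point)
    print(lam, best, is_n_point)
```

Different weight vectors select different N-points, which is how the weighting expresses the relative importance of the criteria.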
4.1.2.2 Constrained
Again, the solution is to reduce the complexity of the problem from a
multi-objective to a mono-objective one. It is possible to combine the two
previous methods, that is, to minimize a linear weighted function plus a
sum of penalty functions; the only critical point is to ensure the same order
of magnitude for each term of the sum, so that no single term of the sum
dictates the result. The third way to solve an unconstrained problem
(or a constrained one, but with some care) is to use the method of the
compromise solution:
Compromise solution Given the problem 4.3, it is possible to define $y^*$ as
the ideal outcome of the cost function $f(x)$ without any constraints, so that
$y^* = \inf_{x \in X} f(x)$; the compromise solution is defined as the minimum of the regret:
$$r(y) = \| y - y^* \|;$$
typically, the $L_p$-norm (the distance between the actual solution and the
ideal point) is used:
$$r(y) = r(y; p) = \left[ \sum_{i=1}^{n} |y_i - y_i^*|^p \right]^{1/p}.$$
Again, a weight can be associated with each term of the sum:
$$r(y; p, w) = \left[ \sum_{i=1}^{n} w_i^p |y_i - y_i^*|^p \right]^{1/p}.$$
Definition 4.26 (Compromise solution). The compromise solution with respect to the $L_p$-norm is $y^p \in Y$ that minimizes $r(y; p, w)$ over $Y$.
The compromise solution enjoys several properties, the most important of
which is:
Property 4.27 (Pareto optimality). The compromise solution $y^p \in Y$ is an N-point, for $1 \le p < \infty$, with respect to Pareto preference (definition 4.20). If $y^\infty$ is unique, then it is also an N-point.
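The compromise solution can be sketched for a finite outcome set as follows. The ideal point is taken here as the componentwise minimum over the available outcomes, which is an assumption about how $y^*$ is obtained; the outcomes, the default weights and the function names are likewise illustrative.

```python
def ideal_point(points):
    """Componentwise minimum of the outcomes (assumed ideal outcome y*)."""
    n = len(points[0])
    return tuple(min(y[i] for y in points) for i in range(n))


def regret(y, y_star, p=2.0, weights=None):
    """Weighted L_p regret r(y; p, w) between an outcome and the ideal point."""
    if weights is None:
        weights = [1.0] * len(y)
    total = sum((w ** p) * abs(yi - si) ** p for w, yi, si in zip(weights, y, y_star))
    return total ** (1.0 / p)


def compromise_solution(points, p=2.0, weights=None):
    """Outcome that minimizes the regret with respect to the ideal point."""
    y_star = ideal_point(points)
    return min(points, key=lambda y: regret(y, y_star, p, weights))


# Hypothetical outcomes in a two-dimensional criteria space.
outcomes = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
y_star = ideal_point(outcomes)
y_p2 = compromise_solution(outcomes, p=2.0)
print(y_star, y_p2, regret(y_p2, y_star))
```

The ideal point itself is usually not attainable; the compromise solution is the attainable outcome closest to it in the chosen norm.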
This implies that $|b - a| = |x - c|$, and that at each iteration the interval is scaled by the same ratio $\tau$.
Then we repeat the process with the new triplet. So the interval $(a, c)$ is
divided in two parts, a smaller and a larger, and the ratio between the whole
interval and the larger part is the same as that between the larger and the
smaller, or in other words:
$$\frac{1}{\tau} = \frac{\tau}{1 - \tau},$$
giving for the positive solution
$$\tau = \frac{\sqrt{5} - 1}{2}.$$
This fraction is known as the golden-mean or golden-section, whose aes-
thetic properties come from ancient Pythagoreans.
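A golden-section search built on this ratio can be sketched as below. This is a common textbook formulation that keeps two interior points and reuses one function evaluation per iteration; it is not necessarily the exact variant described in the text, and the function name, the tolerance and the test function are assumptions.

```python
import math

TAU = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio, about 0.618


def golden_section_search(func, a, c, tol=1e-6):
    """Minimize a unimodal function on [a, c] by golden-section search."""
    x1 = c - TAU * (c - a)  # left interior point
    x2 = a + TAU * (c - a)  # right interior point
    f1, f2 = func(x1), func(x2)
    while (c - a) > tol:
        if f1 < f2:
            # The minimum lies in [a, x2]; the old x1 becomes the new x2.
            c, x2, f2 = x2, x1, f1
            x1 = c - TAU * (c - a)
            f1 = func(x1)
        else:
            # The minimum lies in [x1, c]; the old x2 becomes the new x1.
            a, x1, f1 = x1, x2, f2
            x2 = a + TAU * (c - a)
            f2 = func(x2)
    return 0.5 * (a + c)


# Hypothetical unimodal test function with its minimum at x = 2.
x_min = golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
print(x_min)
```

Because $\tau^2 = 1 - \tau$, one of the two interior points of the reduced interval coincides with a point already evaluated, so each iteration costs a single new function evaluation.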
Convergence considerations All three previous methods have linear convergence, since at each iteration the ratio between the new smaller interval and the interval containing $x^*$ is:
$$0 \le \frac{I_{k+1}}{I_k} \le 1.$$
The asymptotic convergence rate is defined as
$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k}.$$
For the dichotomic search, since $2 I_{k+1} = I_k + \varepsilon$, taking $\varepsilon = 0$ we have
$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \frac{1}{2}.$$
For the Fibonacci search, first we must write the generic number of the
Fibonacci sequence in a closed form:
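$$f_k = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{k+1} - \left( \frac{1 - \sqrt{5}}{2} \right)^{k+1} \right].$$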
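Then it can be proved that, taking $\varepsilon = 0$:
$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \lim_{k \to \infty} \frac{f_k}{f_{k+1}} = \frac{\sqrt{5} - 1}{2}$$
For the golden section search, as previously said, $\frac{I_{k+1}}{I_k} = \tau$, so
$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \tau = \frac{\sqrt{5} - 1}{2}.$$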
Thus the convergence rates of the Fibonacci and the golden-section searches are
identical.
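The limit can be checked numerically. The sketch below compares the closed form of the Fibonacci numbers (with the indexing $f_0 = f_1 = 1$ implied by the exponent $k + 1$) against the recurrence, and evaluates the ratio of consecutive terms $f_k / f_{k+1}$; the function names are assumptions.

```python
import math


def fib_closed(k):
    """Closed form of the k-th Fibonacci number, with f_0 = f_1 = 1."""
    s5 = math.sqrt(5.0)
    return (((1.0 + s5) / 2.0) ** (k + 1) - ((1.0 - s5) / 2.0) ** (k + 1)) / s5


def fib_recurrence(k):
    """k-th Fibonacci number from the recurrence f_k = f_{k-1} + f_{k-2}."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a


golden = (math.sqrt(5.0) - 1.0) / 2.0
ratio = fib_recurrence(30) / fib_recurrence(31)  # f_k / f_{k+1} for k = 30
closed_matches = all(round(fib_closed(k)) == fib_recurrence(k) for k in range(30))
print(ratio, golden, closed_matches)
```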
4.2.1.2 Parabolic interpolation
Given a triplet $(a, b, c)$ that brackets a minimum, we approximate the
objective function in the interval $(a, c)$ with the parabola fitting the triplet.
Then we find the minimum of this parabola with the formula (since we
want the abscissa, the method is indeed an inverse parabolic interpolation):
$$x = b - \frac{1}{2}\, \frac{(b - a)^2 [f(b) - f(c)] - (b - c)^2 [f(b) - f(a)]}{(b - a) [f(b) - f(c)] - (b - c) [f(b) - f(a)]}$$
This method is useful only when the function is quite smooth in the
interval, but it has the advantage that the convergence is almost quadratic,
and the minimum is found in a single step when the function to be optimized
is a quadratic form.
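A single parabolic interpolation step can be sketched as follows; the function name, the error handling for a degenerate (collinear) triplet and the test function are assumptions. For a quadratic function the step lands on the minimum exactly, up to rounding.

```python
def parabolic_step(func, a, b, c):
    """Abscissa of the vertex of the parabola through (a, f(a)), (b, f(b)), (c, f(c))."""
    fa, fb, fc = func(a), func(b), func(c)
    numerator = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    denominator = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if denominator == 0.0:
        # The three points are collinear: the parabola degenerates into a line.
        raise ValueError("degenerate triplet: no parabola vertex")
    return b - 0.5 * numerator / denominator


# Hypothetical quadratic test function with its minimum at x = 2.
x_new = parabolic_step(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 1.0, 4.0)
print(x_new)
```

The formula only locates the vertex of the fitted parabola; it does not check that the vertex is a minimum or that it lies inside $(a, c)$, so a practical routine has to verify both before accepting the step.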
Brent's rule Brent's rule is a mix of the last two techniques: it
uses the golden section when the function is not regular and switches to
parabolic interpolation when the function is sufficiently regular. In
particular, it always tries a parabolic step first. When the parabolic step
is useless, the method uses the golden section search.
4.2.2 Multi-dimensional search
Thi