PLL-Based Active Optical Clock Distribution
by
Alexandra M. Kern
A.B., Engineering Sciences
B.E., Electrical Engineering
Dartmouth College, 2002
Submitted to the Department of Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2004
© 2004 Massachusetts Institute of Technology
All Rights Reserved
Author: Department of Electrical Engineering and Computer Science, July 23, 2004
Certified by: Anantha P. Chandrakasan, Professor of Electrical Engineering, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students
PLL-Based Active Optical Clock Distribution
by
Alexandra M. Kern
Submitted to the Department of Electrical Engineering and Computer Science
on July 23, 2004, in partial fulfillment of the requirements for the degree of
Master of Science
Abstract
Reducing the timing uncertainty associated with clock edges has become an exceedingly difficult problem as clock frequencies in high-performance processors increase past several gigahertz. Absolute quantities of skew and jitter that were insignificant at lower frequencies now consume an increasingly large percentage of each clock cycle and directly reduce the time available for logic propagation. Processor designers currently employ several types of electrical deskew mechanisms to combat this problem in order to delay the inevitable need for more radical clocking solutions.
Optical clock distribution has the potential to deliver extremely high precision global clocks across large chips. However, traditional transimpedance amplifier approaches to optical-electrical conversion introduce so much timing uncertainty that the accuracy gained through optical global distribution is lost at the global-to-local clock domain interface.
This thesis analyzes the feasibility of a phase-locked loop (PLL) based approach to the optical-electrical clock signal conversion. The proposed small-signal current-steering optical-electrical phase detector extracts timing information from the optical reference without explicit optical-electrical conversion. This phase detector is integrated with a loop filter, LC VCO, and frequency divider to form a complete optical-electrical PLL system capable of generating 1.6 GHz local electrical clocks from a 200 MHz global optical reference. The insights gained through the design and implementation of this system are used as the basis for a broader analysis of the advantages and challenges of PLL-based optical clock distribution systems.
Thesis Supervisor: Anantha P. Chandrakasan
Title: Professor of Electrical Engineering
Acknowledgments
I would like to thank my advisor, Dr. Anantha Chandrakasan, for the patient guid-
ance and insight he provided as I learned from each of the many rewarding challenges
I encountered during the first two years of my graduate career at MIT. Researching
and writing this thesis under his expert supervision has taught me many invaluable
lessons about the pursuit of academic research.
Many other extraordinary individuals have also shaped my academic development
over the past six years. Mr. David Kneedler, Mr. Ian Fink, and Mr. John Sledziewski
supervised my first two undergraduate internships and introduced me to the possi-
bilities of the semiconductor industry. Dr. Charles Sullivan and Dr. Edmond Cooley
were my mentors during my early undergraduate years and guided me through my
first experiences in academic research. Dr. Ian Young and Mr. Thomas Thomas
gave me the opportunity to apply my analog design skills in an industrial setting and
provided the perfect balance of expert guidance and independence. My colleagues
and friends in my research group always made time to discuss ideas and provide
valuable insights. I am grateful to these mentors for providing the opportunities and
encouragement that helped me build the foundations of my future career.
My friends have shared and enhanced my experiences at MIT. Coffee breaks when
we could least afford to waste the time, extended midnight phone conversations about
everything and nothing, heated lunchtime lab discussions on topics ranging from
politics to genetics, visits and calls from out-of-town friends who dare to believe that
the world does not revolve around MIT, and excursions both into the city and into
the wilderness have balanced the academic challenges of the past two years. I would
especially like to thank Vin Scarlata, Johnna Powell, Julia Cline, Elizabeth Basha,
Alicia Messmer, Joanna Lisker, Emily Halpern, Anne Thompson, and Devika Gopal.
The unconditional love and support of my family has allowed me to pursue in-
creasingly challenging goals with the reassuring knowledge that they will always be
there to catch me. My grandparents, Ja and Martha Densmore and Florence Kern,
are all unique role models and their stories have always inspired me. My brother,
Christopher, is unusually adept at filtering out the various pressures and expecta-
tions of society and pursuing objectives of his own choosing and I have learned from
his positive example. My parents, Edward and Priscilla, supported me and tolerated
my trademark indecisiveness through brief interests in various other fields before I fi-
nally chose engineering. They have always encouraged me to pursue my own dreams,
find a career that makes me truly happy, and create my own definition of success. I
am eternally thankful for their love and their unwavering belief that I can accomplish
anything I choose.
Contents
1 Introduction
1.1 Motivation for Optical Clocking
1.2 Current Electrical Clock Distribution Practices
1.3 Prior Work on Optical Clock Distribution
1.4 Objective of This Work
2 VCO and Divider Circuits
2.1 Divider
2.2 VCO
3 Optoelectronics
3.1 Photodiode Background
3.2 Standard CMOS Silicon Photodiodes
3.2.1 Possible Diode Structures in Standard CMOS
3.2.2 Photodiode Junction Capacitance
3.2.3 Transit Time
3.3 Photodiodes in SOI and Custom Processes
3.3.1 SOI Photodiode Receivers
3.3.2 CMOS-Compatible Custom Photodiode Processes
3.4 Waveguides
3.5 Conclusions
4 Analysis of Phase Detectors
4.1 Current-Steering Phase Detector
4.1.1 Basic Current-Steering Topology and Operation
4.1.2 Sources of Phase Offset
4.2 Extensions of Current Steering Topology
4.2.1 Current Mirrors
4.2.2 Photodiode in Feedback
4.3 Topologies with Alternate Phase Detector Cores
4.3.1 Bang-Bang Phase Detector
4.4 Conclusions
5 PLL Loop Dynamics and Complete Circuit Simulations
5.1 Optical PLL Analysis
5.1.1 Acquisition Range
5.1.2 Small-Signal Stability Analysis
5.2 Final Simulated Results
6 On-Chip Skew Measurement
6.1 TDC Concept
6.2 Critical Path
6.3 Control and State Machine
6.4 Implementation
6.5 Additional Qualitative Verification
6.6 Conclusions
7 Conclusions
7.1 Summary
7.2 Simulation Results
7.3 Future Work
7.3.1 Optoelectronics
7.3.2 Circuits
7.3.3 Complete System
7.4 Conclusion
List of Figures
1-1 H-tree clock distribution.
1-2 Receiverless optical clocking.
1-3 Proposed optical PLL system.
1-4 Original optical PLL clocking proposal - Clymer/Goodman.
2-1 Divider architecture.
2-2 Circuit schematic of embedded XOR register block of Figure 2-1.
2-3 Divider output.
2-4 VCO core and buffer circuits.
2-5 ASITIC Π model.
2-6 VCO gain.
2-7 VCO output for control voltage of 0.4 V.
3-1 Possible CMOS diode structures.
3-2 Junction capacitance versus reverse bias.
3-3 Depletion width versus reverse bias.
3-4 Junction capacitance versus intrinsic width.
3-5 Illustration of transit time.
3-6 Transit time versus intrinsic width.
4-1 Basic current-steering phase detector topology and operation.
4-2 Phase difference versus average current transfer function of current-steering phase detector.
4-3 Simplification of phase-detector structure.
4-4 Ideal output, actual output at TT/27 °C and matched idealized output (solid lines), and actual output over FF/SS/100 °C (dashed lines).
4-5 Skew for locked PLL across SS, FF and TT process corners.
4-6 Skew for locked PLL at 27 °C and 100 °C.
4-7 Skew over SS/FF corners with ideal amplifier and real CMOS switches (solid lines) and optical reference (dashed line).
4-8 Skew over SS/FF corners with real amplifier and ideal switches (solid lines) and optical reference (dashed line).
4-9 Skew over SS/FF corners with ideal amplifier and triple-sized CMOS switches (solid lines) and optical reference (dashed line).
4-10 Phase difference for locked PLLs at TT/27 °C with 10 µA/20 µA versus 10 µA/10 µA differential current mismatch.
4-11 Skew for locked PLLs at TT/27 °C with 10 µA versus 12 µA common-mode current mismatch.
4-12 Current mirror approach.
4-13 Feedback amplifier approach.
4-14 Bang-bang phase detector approach.
5-1 Well-damped, overdamped, and underdamped loop dynamics.
5-2 Typical characteristics of a phase detector (top) and a phase-frequency detector (bottom).
5-3 Loop filter topology.
5-4 Photodiode capacitance of 500 fF limits lock range.
5-5 Cycleslipping: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and a 100 fF/20 kΩ loop filter.
5-6 Complete layout of the PLL.
5-7 Well-damped locking: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and a 800 fF/44 kΩ loop filter.
5-8 PLL locking from both extremes of input voltage range.
5-9 VCO output clock (dotted), optical reference (dashed, 10 µA amplitude scaled for comparison), and divider output (solid) shown at the end of the locking transient of Figure 5-7.
6-1 Time-to-digital converter.
6-2 Split-output TSCP latches.
List of Tables
2.1 VCO inductor Π model values.
4.1 Summary of skew sources.
6.1 State table for the microcoded state machine.
Chapter 1
Introduction
1.1 Motivation for Optical Clocking
As clock frequencies in high-performance processor applications increase past several
gigahertz, meeting the increasingly rigorous skew and jitter requirements with tradi-
tional electrical clock distribution systems will become prohibitively difficult. Uncer-
tainty in the clock edge due to skew and jitter directly reduces the time available for
logic propagation and therefore limits the maximum logic depth and increases the re-
quired hold time. An absolute quantity of skew that could be tolerated at slower clock
frequencies will occupy a much more significant percentage of the total clock period
at higher frequencies, so skew and jitter limits are typically specified in percentages
instead of absolute terms. Typical systems require that the combined effect of skew
and jitter not exceed 10 percent of the clock period, though recent stretching of that
budget to 20 percent of the clock cycle is evidence of the fact that the challenges of
precise electrical clock distribution are intensifying [1]. Replacing the global levels of
the clock distribution with optical waveguides will likely be required in the future,
but high-speed, high-precision optoelectronic conversion will be required to maximize
the advantages of optical distribution. This thesis will analyze the feasibility of using
an optical PLL receiver circuit to generate local electrical clocks from a global optical
reference.
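The percentage budget described above can be turned into absolute time with a short calculation. The following is a minimal sketch; the function name `timing_budget_ps` and the example frequencies are illustrative choices, not values taken from a specific design in this thesis.

```python
def timing_budget_ps(clock_hz, budget_fraction):
    """Return the combined skew-plus-jitter budget in picoseconds.

    clock_hz:        clock frequency in hertz
    budget_fraction: allowed fraction of the clock period (0.10 for 10 percent)
    """
    period_ps = 1e12 / clock_hz
    return budget_fraction * period_ps


if __name__ == "__main__":
    # Illustrative frequencies: one slow clock and one multi-gigahertz clock.
    for freq_hz in (100e6, 4e9):
        for fraction in (0.10, 0.20):
            budget = timing_budget_ps(freq_hz, fraction)
            print(f"{freq_hz / 1e9:.1f} GHz, {fraction:.0%} budget: {budget:.1f} ps")
```

Under this arithmetic, the same 10 percent budget that allows a full nanosecond of uncertainty at 100 MHz allows only tens of picoseconds at several gigahertz, which is why an absolute skew that was once negligible becomes a limiting factor.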
1.2 Current Electrical Clock Distribution Practices
Before beginning an analysis of optical alternatives, it is instructive to examine the
electrical clock distribution methods currently employed in today's state-of-the-art
microprocessors. This is beneficial not only because it provides the background nec-
essary to understand the relative advantages of an optical system, but also because
many of the methods utilized in an optical system are derived from electrical system
precursors.
Active techniques for distributing precise clocks across chips began to appear in
significant quantities in the literature around 1990. Prior to that date, clock frequen-
cies below 100 MHz allowed designers to distribute clocks with sufficient precision
using passive networks. In 1992, Intel introduced the now-prevalent concept of using
phase-locked loops to generate on-chip clocks from lower frequency off-chip references
in order to overcome package bandwidth limitations and improve precision [2].
Once the clock is on chip, many sources in an electrical clock distribution network
contribute skew and jitter. Variations in clock buffer speed due to device and supply
variation, differences in capacitive coupling to adjacent lines, unmatched load capac-
itance, and variations in the resistance and the capacitance of the lines themselves
all introduce timing variation [3]. Distributing all clocks from a central point on the
chip through paths of matched length is one obvious measure that is almost always
used to reduce skew. This can be accomplished either with a symmetric H-tree distri-
bution, illustrated in Figure 1-1, or with an asymmetric matched-length routed path
scheme. However, path length matching alone is no longer a sufficient solution. While
matching lengths may be possible, matching capacitive load and coupling across a
complicated chip is practically impossible. Therefore, most modern clock distribu-
tion schemes use some type of matched-length distribution in conjunction with other
deskew methods.
Figure 1-1 shows a top-level H-tree distribution and sixteen local grids, with the
clock buffers that would certainly be present omitted for simplicity. The small black
boxes at the global-to-local interface might represent any number of deskew
mechanisms. The first Itanium processor used an active deskew scheme, but designers
of subsequent generations cited manufacturing concerns as a reason for reverting to
passive fuse-based deskew [4] [5]. Active deskew provides the capability of adjusting
for temperature and supply induced skew, but at the cost of possible stability
concerns. Despite their implementation differences, both of these processors use H-tree
distribution at the global level with an array of deskew circuits interfacing the global
network to the local grids.
Figure 1-1: H-tree clock distribution.
1.3 Prior Work on Optical Clock Distribution
Using an H-tree optical waveguide to distribute a global optical timing reference to
several optoelectronic receivers across a chip initially appears to be a perfect solution
to the problem of skew and jitter. This is true in the sense that the optical signal
arriving at those receivers has imperceptible skew and the jitter is limited only by
the extremely precise laser source. Transimpedance amplifiers (TIAs) are commonly
employed to convert photocurrent inputs to voltage outputs, but converting these
small currents to full-scale logic voltages requires a high-gain transimpedance stage
followed by several stages of voltage amplification. Process and supply variation can
introduce significant skew and jitter in these circuits, often negating the benefits of
optical top-level distribution.
A comprehensive review of prior work, technical challenges, and possible benefits
of optical clocking is presented in [6]. The extensive reference list is an indispensable
introductory resource encompassing all aspects of the optical interconnect challenge.
This work correctly points out that optical interconnects must become CMOS com-
patible, high-density, precise, and economically attractive in order to succeed. Intel
recently considered many of these criteria in an analysis of TIA-based approaches for
clocking and interconnect applications [1]. Despite their rather aggressive assump-
tions about the future performance of integrated optical components, they concluded
that optical clocking will not be a practical replacement for electrical clocking. They
argued that the area requirements and radius of curvature of optical waveguides will
limit optical distribution to the highest levels of the global clock domain, which do not
account for a significant portion of the timing mismatch, and that therefore optical
clocking will not provide significant performance enhancement. They compared the
performance to both scaled and unscaled copper interconnects and concluded that,
while optical clocking did outperform scaled copper, using unscaled copper to achieve
the same performance was more cost effective and less disruptive to the manufacturing
process.
These conclusions, however, are restricted to a system analysis assuming that the
global clock frequency is 4 GHz and that TIAs are used for signal conversion. As
frequencies continue to increase, electrical clock distribution will eventually reach a
fundamental limit. It may be true that TIAs are not the ideal solution, but optical
clocking may still be a viable solution as researchers are currently exploring many
alternative methods of achieving optical-electrical signal conversion.
Figure 1-2: Receiverless optical clocking.
Researchers at Stanford have proposed a "receiverless" method of optical clock-
ing, shown in Figure 1-2 [7]. In this scheme, a high-energy, short-duration pulse is
generated by a mode-locked laser. This signal power is split and one branch is delayed
by T. These signals are used to drive two photodiodes which respectively charge and
discharge the CMOS gates to be clocked. This approach has low skew and jitter if
the register gates are driven directly. However, since the optical power required to do
this would be prohibitively large, intermediate buffers would most likely be required
for a practical clock network and would introduce uncertainties.
Using a PLL to generate local clocks from a global optical reference allows both
active deskew of the gate-level clocks and local generation of low-jitter clocks.
Figure 1-3 shows the architecture of the proposed optical PLL system. A complete
optical clock distribution using this system would replace the top-level distribution
in Figure 1-1 with an optical waveguide structure and use one instance of the optical
PLL to generate the clocks for each local network. The 1.6 GHz low-jitter local clocks
generated by the LC VCO are buffered as much as the clock load requires and
delivered to the registers. Because the clocks are generated locally, there are no global
distribution clock buffer chains to introduce skew and jitter. The PLL eliminates any
skew generated in the forward path and the clock buffers, so skew between instances
is introduced only by variations in phase detector offset and divider delay. Jitter is
determined by the VCO phase noise and the jitter generated by the short forward-path
buffer chain used to amplify the clock power from the VCO to the gates. Using
the optical signal as a precise reference, instead of directly sensing and amplifying
the optical input power in order to generate a full-swing signal as in the other two
approaches, therefore provides a significant advantage. In addition, using a divider
in the feedback path allows generation of high-frequency local clocks from a lower
frequency external optical reference.
Figure 1-3: Proposed optical PLL system.
As with any complex circuit system, there are many topology choices that affect
the final implementation of a PLL. Clearly the topologies for main blocks such as the
divider and VCO will be chosen for their performance in the particular application.
However, the three most fundamental decisions are the choice of a Type I or Type
II loop, the selection of phase or phase-frequency detection, and the determination
of the loop order. A Type I PLL uses a phase detector that generates an output
voltage proportional to the phase difference of the reference signal and the feedback
signal. This output voltage is simply low-pass filtered to generate the control voltage
and this type of PLL therefore has a finite possible voltage range and may exhibit
static phase error during lock, as there is no integration in the phase detector. A
Type II PLL uses a phase detector that adds a second integrator to the forward path.
This is typically implemented by using a phase detector in conjunction with a charge
pump. In a charge pump PLL, the phase error signal generated by the phase detector
is used to issue "UP" and "DOWN" pulses to a charge pump, which responds by
incrementally increasing or decreasing the voltage on the loop filter. Because of the
second integrator there can be no steady-state error between the two signals, but
because the phase detector and the VCO both contribute integrators to the forward
path, the loop filter must be carefully designed to stabilize the loop. Simple phase
detection and phase-frequency detection result in very different loop dynamics. A
simple phase detector detects only phase difference, not frequency difference, and its
gain changes sign if the phase deviates too far from lock. Therefore, if the VCO
frequency is too far from the reference then the phase cycles through the positive
and negative gain regions faster than the detector can control the VCO and the PLL
does not achieve lock. A phase-frequency detector (PFD), however, uses knowledge
of all the clock edges to detect phase and frequency, so the detector constantly drives
the loop toward lock with much improved dynamics and no limit to range. Type II
PLLs with PFDs are used in most modern systems due to their improved range and
tracking. A concise and accessible tutorial on the basics of PLL design is available in
[8]. Finally, the loop filter must be chosen to work with the selected phase detector
or PFD. Charge pump PLLs require more complex filters to stabilize the loop due to
the additional integrator introduced by the charge pump.
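The difference in static phase error between the two loop types can be illustrated with a linearized phase-domain model. The sketch below is not the thesis's simulation: it assumes an ideal linear phase detector, ignores loop filter poles and cycle slipping, and uses hypothetical gain values (`k_loop`, `k_int`) chosen only so that the loop settles quickly.

```python
import math


def simulate_loop(delta_omega, k_loop, k_int=0.0, t_end=50e-6, dt=1e-9):
    """Integrate a linear phase-domain PLL model and return the final phase error.

    delta_omega: initial frequency error seen at the phase detector (rad/s)
    k_loop:      proportional loop gain (1/s); hypothetical value
    k_int:       integral gain (1/s^2); zero gives a Type I loop,
                 a positive value adds the second integrator of a Type II loop
    """
    theta_e = 0.0   # phase error between reference and feedback (rad)
    integ = 0.0     # running integral of the phase error (rad*s)
    for _ in range(int(t_end / dt)):
        correction = k_loop * theta_e + k_int * integ
        # Euler step: residual frequency error accumulates as phase error.
        theta_e += (delta_omega - correction) * dt
        # The integral is updated with the freshly computed phase error.
        integ += theta_e * dt
    return theta_e


if __name__ == "__main__":
    d_omega = 2 * math.pi * 100e3        # assumed 100 kHz initial frequency error
    k = 2 * math.pi * 1e6                # assumed proportional loop gain
    type1 = simulate_loop(d_omega, k)
    type2 = simulate_loop(d_omega, k, k_int=(k / 2) ** 2)  # critically damped choice
    print(f"Type I final phase error:  {type1:.6f} rad")
    print(f"Type II final phase error: {type2:.3e} rad")
```

In this model the Type I loop settles to a phase error of `delta_omega / k_loop`, because a nonzero phase error is needed to hold the control voltage away from its rest value, while the added integrator of the Type II loop drives the steady-state error to zero.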
Researchers proposed the idea of optical clock distribution as early as the 1980s.
One of the first papers suggested using the optical clock as a reference for
local phase-locked loops [9]. The authors compared this approach to a transimpedance
amplifier and concluded that it improved performance and reduced power consumption.
The proposed phase detector is shown in Figure 1-4.
Figure 1-4: Original optical PLL clocking proposal - Clymer/Goodman.
The details of operation can be found in the paper, but the basic functionality
is simple to understand. The phase detector has three possible output voltages.
When the VCO output is low, the diode is forward biased and the phase detector
output voltage is determined by the diode drop. When the VCO output is high, the
diode is reverse biased and the voltage is the result of the simple resistive voltage
divider if the optical signal is low or a resistive divider with an additional current
source if the optical signal is high. In this way, the filtered average output voltage
of the phase detector indicates what percentage of the total period was spent in
each state. This is a Type I PLL and achieves only 12.8 MHz of locking range with
an output center frequency near 100 MHz. This range is simply not sufficient for
a modern or future clocking system, so a Type II loop should be used instead to
improve range. Furthermore, the performance of this circuit at higher speeds will be
RC limited since the photodiode current is driven directly into a resistor to generate
the error signal voltage. The full potential of PLL-based optical clock distribution for
future applications may be realized by investigating small-signal charge pump phase
detectors and the resulting Type II loops.
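The averaging behavior of this three-level detector can be written compactly. The expression below is a sketch under the assumption that the loop filter passes only the mean of the detector output; the symbols are introduced here for illustration and are not taken from the cited paper. Let d_1, d_2, and d_3 be the fractions of the period spent with the VCO output low, with the VCO output high and the optical signal low, and with the VCO output high and the optical signal high, and let V_D, V_R, and V_{R,opt} be the corresponding output voltages (diode drop, resistive divider, and resistive divider with the added photocurrent).

```latex
\bar{V}_{PD} = d_1 V_D + d_2 V_R + d_3 V_{R,\mathrm{opt}},
\qquad d_1 + d_2 + d_3 = 1
```

A shift in the relative phase of the VCO and the optical reference changes the split between d_2 and d_3, which moves the filtered average and therefore the control voltage.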
1.4 Objective of This Work
The objective of this work is to explore the feasibility of a small-signal, current-
steering Type II phase detector as the central component of a clock distribution
network in a modern standard CMOS process. While design of VCOs and frequency
dividers is well understood, this type of phase detector is relatively unexplored and
there are many remaining challenges. This work will present an analysis of the chal-
lenges and possibilities for the design of such a phase detector, deriving examples from
the lessons learned during design of one particular topology and then discussing the
advantages and disadvantages of some other possible topologies. Simulation results of
a full custom layout implementation of the optical-electrical PLL clock system with
full parasitic capacitance extraction are presented.
Chapter 2
VCO and Divider Circuits
The implementation of a PLL with frequency multiplication requires a phase detector,
loop filter, VCO, and divider. The optical-electrical phase detector and loop filter are
the central parts of this work and will be analyzed in detail, but it is also necessary
to briefly summarize the choices of VCO and divider topologies.
2.1 Divider
In a PLL with frequency multiplication, the PLL output is controlled indirectly by
locking the feedback divider output to the reference signal. Any variation in divider
delay between two instances of the PLL will result in phase error of the generated
clocks even when the divider outputs are perfectly matched. The low-bandwidth loop
filter attenuates jitter introduced by the divider in the feedback path, but skew due
to process and temperature variations across the dividers will be directly translated
into skew between the generated 1.6 GHz clock outputs.
Figure 2-1: Divider architecture.
Both synchronous and asynchronous dividers are commonly used in PLL feedback,
often in a hybrid combination employing an asynchronous prescaler stage followed by
a synchronous divider with a large divide value. Cascaded divide-by-two asynchronous
dividers have a speed advantage as the logic depth is much shallower than a larger
value synchronous divider, but the data is latched many times instead of once, which
increases the potential for skew introduction. For example, consider a divider with
N=8. If a synchronous implementation with one clock-Q delay experienced X seconds
of skew due to a particular process or temperature variation, an asynchronous
divider with three stages would experience 3X seconds. Because minimizing skew is
imperative for this application, the divider architecture should be fully synchronous if
possible, while staying within reasonable logic style boundaries. Furthermore, a fully
differential circuit style should be used to maximize resistance to skew. High-speed
RF logic styles, such as resistively loaded SCL with small-signal outputs, are not
suitable for use in this design since full-swing outputs are required.
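The X versus 3X comparison above follows from counting how many clock-to-Q delays sit in series between the divider input and output. The sketch below assumes a simple additive model in which every stage contributes the same skew; the function name and the restriction to power-of-two divide values are choices made for this illustration.

```python
def divider_skew(n_divide, skew_per_latch, synchronous):
    """Estimate divider skew under a simple additive per-stage model.

    n_divide:       divide value, assumed to be a power of two
    skew_per_latch: skew contributed by one clock-to-Q delay (seconds)
    synchronous:    True if all registers share one clock (one delay in series),
                    False for cascaded divide-by-two stages
    """
    if n_divide < 2 or n_divide & (n_divide - 1):
        raise ValueError("n_divide must be a power of two and at least 2")
    # A ripple divider needs log2(N) cascaded divide-by-two stages.
    stages = 1 if synchronous else n_divide.bit_length() - 1
    return stages * skew_per_latch


if __name__ == "__main__":
    x = 1.0  # one unit of skew, standing in for X seconds
    print("synchronous, N=8: ", divider_skew(8, x, synchronous=True))
    print("asynchronous, N=8:", divider_skew(8, x, synchronous=False))
```

A hybrid divider with a divide-by-two prescaler followed by a synchronous stage would contribute two delays in series under the same model, which sits between the two extremes.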
Since the maximum output frequency of the VCO in the typical corner is 1.8 GHz,
the divider is required to function properly at 2.0 GHz in the slow corner in order to
allow for a reasonable safety margin. Several circuit styles were considered, but none
allowed a fully synchronous divide-by-eight at these speeds without using resistively
loaded circuit styles. Figure 2-1 shows the chosen divider architecture, which consists
of a divide-by-two prescaler and a synchronous divide-by-four. The divide-by-two
circuits are implemented with registers in the standard feedback configuration such
that the output changes state at each positive clock edge and generates an output
at half the input frequency. The XOR register in the synchronous divider includes
XOR logic embedded in the first latch in order to increase speed performance and is
shown in Figure 2-2. Both the simple registers and the embedded XOR register are
implemented in source-coupled logic (SCL) with cross-coupled PMOS loads.
The extracted and simulated divider output waveforms, shown in Figure 2-3,
demonstrate that the divider does function properly at 2 GHz as designed.
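The divide-by-eight behavior of this architecture can be checked with a small behavioral model. This is a sketch, not the thesis's circuit: it assumes the synchronous divide-by-four is a two-bit counter whose second register computes the XOR of the two register outputs, which is one common way to realize such a stage.

```python
def divide_by_eight(num_input_edges):
    """Return the divider output sampled after each rising input clock edge.

    Models a divide-by-two prescaler followed by a synchronous divide-by-four.
    The XOR-based next-state logic is an assumption made for this sketch.
    """
    pre = 0        # prescaler register, toggles on every rising input edge
    q0, q1 = 0, 0  # registers of the synchronous divide-by-four
    out = []
    for _ in range(num_input_edges):
        prev_pre = pre
        pre ^= 1
        if prev_pre == 0 and pre == 1:
            # Rising edge of the prescaler output clocks both registers at once.
            q0, q1 = q0 ^ 1, q1 ^ q0
        out.append(q1)
    return out


if __name__ == "__main__":
    samples = divide_by_eight(32)
    rising = sum(1 for a, b in zip([0] + samples, samples) if a == 0 and b == 1)
    print("output samples:", samples)
    print("input edges per output cycle:", len(samples) // rising)
    print("output frequency for a 1.6 GHz input (MHz):", 1.6e9 / 8 / 1e6)
```

The output repeats every eight input edges with a 50 percent duty cycle, consistent with generating a 200 MHz feedback signal from the 1.6 GHz VCO clock.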
Figure 2-2: Circuit schematic of embedded XOR register block of Figure 2-1.
Figure 2-3: Divider output. [Plot title: Divider Input and Output; horizontal axis: Time (ns).]
2.2 VCO
The VCO must provide excellent jitter performance and tolerance of supply noise in
order to generate precise output clocks. Furthermore, because the PLL uses only a
phase detector instead of a phase frequency detector, the VCO must have a small
enough tuning range that the loop cannot initialize itself too far from the correct
frequency to acquire lock or accidentally lock to harmonics of the reference. The
combination of these two criteria suggests that an LC VCO is the best option to obtain
the desired performance. However, given that area is a major concern in production
level microprocessors, it is important to note that there are circuit techniques available
to obtain acceptable performance without large passive devices if necessary [10].
Figure 2-4 shows the VCO used in the PLL. The VCO core is a standard topology
with both PMOS and NMOS cross-coupled gain devices and a varactor-tunable LC
tank [11]. The application requires a relatively narrow tuning range, but making
the range arbitrarily small is dangerous because small process variations might then
cause the VCO range to shift away from including the target center frequency. For
this reason, the VCO was designed with a range large enough to accommodate a
10 percent frequency shift in either direction and still include the 1.6 GHz target
frequency.
VCO simulations showed that obtaining this range and center frequency requires
a 6.1 nH inductor. The inductor was designed and verified with an external simulator
and incorporated into the circuit simulations through an equivalent circuit
model. This tool, developed at Berkeley and abbreviated ASITIC for "Analysis and
Simulation of Spiral Inductors and Transformers for ICs", is optimized for simulating
integrated inductor structures [12]. The tool provides an equivalent "π model" of the
spiral structure, which has the form shown in Figure 2-5. The VCO used a standard
square spiral inductor with six turns in M6; the π model values generated by ASITIC
for this structure are listed in Table 2.2.
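As a rough cross-check of the inductor value, the ideal lossless-tank relation f0 = 1/(2π√(LC)) can be inverted to estimate the total tank capacitance needed at the target frequency. The sketch below is illustrative only: it ignores the series resistance and shunt parasitics of the π model, the ±10 percent points are taken directly from the margin discussed above, and the function name is hypothetical.

```python
import math

def tank_capacitance(f_hz, l_henry):
    """Total capacitance that resonates with l_henry at f_hz (ideal LC tank)."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * l_henry)

L_TANK = 6.1e-9    # inductor value quoted in the text (H)
F_TARGET = 1.6e9   # target center frequency quoted in the text (Hz)

# Evaluate at the target and at +/-10 percent around it.
for scale in (0.9, 1.0, 1.1):
    f = F_TARGET * scale
    c = tank_capacitance(f, L_TANK)
    print(f"{f / 1e9:.2f} GHz -> {c * 1e12:.2f} pF")
```

Under these idealized assumptions the target frequency corresponds to roughly 1.6 pF of total tank capacitance, which would include the varactor, device, and wiring contributions.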
A graph of the resulting VCO gain is shown in Figure 2-6. The total range is
approximately 1.4 GHz to 1.8 GHz, with the highest gain in the control voltage range
Figure 2-4: VCO core and buffer circuits.
[Schematic labels: L, C1, C2, R2, R3]
Figure 2-5: ASITIC π model.
L        C1      C2       C3       RS       R2      R3
6.12 nH  25 fF   125 fF   125 fF   11.5 Ω   5.0 Ω   5.0 Ω
Table 2.1: VCO inductor π model values.
[Plot x-axis: VCO Control Voltage (V)]
Figure 2-6: VCO gain.
of 0 V to 0.9 V. The 1.6 GHz center frequency is generated by a control voltage of
about 0.4 V, near the center of the range.
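The margin argument can be checked with simple arithmetic. The sketch below assumes that process variation scales the whole tuning range by a common factor, which is a simplification, and uses the approximate 1.4 GHz to 1.8 GHz range read from the gain curve. The function name is hypothetical.

```python
def range_covers_target(f_min, f_max, target, shift):
    """True if the range still contains target after scaling by (1 + shift)."""
    return f_min * (1.0 + shift) <= target <= f_max * (1.0 + shift)

# Approximate values quoted in the text (Hz).
F_MIN, F_MAX, TARGET = 1.4e9, 1.8e9, 1.6e9

for shift in (-0.10, 0.0, 0.10):
    ok = range_covers_target(F_MIN, F_MAX, TARGET, shift)
    print(f"shift {shift:+.0%}: range includes 1.6 GHz = {ok}")
```

With these numbers the margin at a downward shift is thin: the upper end of the range falls to about 1.62 GHz at a 10 percent shift, only slightly above the target.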
As the divider is fully differential to minimize jitter generation, the VCO should
have symmetric differential full-swing outputs. The VCO core produces a differential
output with a common-mode around 0.4 V. Therefore, the first two stages of the
symmetric buffers shift the DC level of the signal to near the half-rail voltage of
0.9 V. This signal drives an inverter with resistive feedback to prevent saturation
in order to generate a full-swing signal, which may then be used to drive standard
inverters and logic. The VCO core waveform, first stage buffer waveform, and final
output are shown in Figure 2-7. This LC VCO provides a low-jitter local reference
which, when locked to the optical reference signal by a PLL, can be used to generate
a high-precision local clock.
[Figure 2-6 plot title: VCO Voltage-to-Frequency Gain]
[Figure 2-7 plot title: VCO Core and Buffer Output Waveforms; x-axis: Time (ns)]
Figure 2-7: VCO output for control voltage of 0.4 V.
Chapter 3
Optoelectronics
Monolithic integration of optics and electronics is one of the primary challenges in
any optical clocking scheme. The complete system requires integrated waveguides,
integrated photodiodes, and standard CMOS logic. Clearly, the ideal solution would
meet all the specifications and be completely fabricated in a standard CMOS process.
Failing that, the remaining functionality should at least be obtained through a
CMOS-compatible post-process. It is feasible to consider the post-process approach
for a real application where optical clocking is required, but the prototype is limited
to strictly CMOS design. Therefore, this chapter will consider both cases and
attempt to provide an accurate review of the performance attainable given both sets
of constraints.
3.1 Photodiode Background
This section briefly reviews several simple concepts and metrics that
must be understood before embarking on an analysis of various photodiode structures.
Fundamentally, any diode becomes a photodiode and produces current when
illuminated. Incoming photons generate electron-hole pairs in the semiconductor and
some of these electrical carriers drift or diffuse across the diode junction before re-
combining, thereby generating electrical current. The speed and efficiency at which
this process occurs is a function of the material properties and the geometric features
of the photodiode.
A depletion region, with depth varying as a function of doping concentrations and
reverse bias voltage, is formed at any PN junction. The carriers generated in the
depleted region appear at the terminals fastest because they drift to the appropriate
terminal. Those generated within the P or N regions may recombine or diffuse to the
junction and then drift the remaining distance. Drift is much faster than diffusion,
so in order to obtain a photodiode with a fast response, it is preferable to generate
the majority of the carriers in the depleted region. One way to achieve this result
is to put a very lightly doped "intrinsic" region between the P and N regions. In a
well-designed case, this low doping allows the applied reverse bias to fully deplete the
entire intrinsic region and all the carriers generated in that region will drift to the
appropriate terminals. These structures and doping levels are not, however, available
in a standard CMOS process and the diodes obtained through this type of process do
not obtain the best performance achievable in silicon.
In a fully depleted PIN diode where the P and N regions are masked so that no
carriers are generated there, the speed of the diode is no longer limited by diffusion
transit time so the effect of drift transit time becomes significant. In this case, the
speed of the diode becomes a function of the intrinsic region width. These custom
process diodes have historically employed a vertical PIN structure; the layers are
stacked one on top of the other with the intrinsic region sandwiched between the P and
N. This structure presents a tradeoff between efficiency and speed. The concentration
of remaining photons decreases exponentially with distance from the surface of the
semiconductor as they are absorbed and converted into electron-hole pairs and each
semiconductor has a different absorption depth. Materials in which the photons
are absorbed very near the surface can obtain higher performance because better
efficiency is obtained for a given intrinsic width, leading to lower transit time for
a given efficiency requirement. For materials such as silicon, which are relatively
inefficient at absorbing photons, a deeper intrinsic region is required to absorb the
majority of the photons and obtain reasonable efficiency. Unfortunately, a deeper
intrinsic region will also increase transit time for the carriers generated farthest from
their terminals and therefore slow diode performance. This tradeoff is present for
all materials, but those with shorter absorption lengths are able to achieve better
performance.
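The exponential absorption profile can be made concrete with the Beer-Lambert form 1 - exp(-d/L_abs) for the fraction of entering photons absorbed within a depth d. The sketch below is an illustration under assumed numbers: an absorption length of 15 to 20 µm, the range quoted later in this chapter for silicon at 850 nm, with surface reflection ignored. The depths are example values and the function name is hypothetical.

```python
import math

def fraction_absorbed(depth_um, absorption_length_um):
    """Fraction of entering photons absorbed within depth_um (Beer-Lambert)."""
    return 1.0 - math.exp(-depth_um / absorption_length_um)

for l_abs in (15.0, 20.0):          # assumed absorption lengths (um)
    for depth in (1.0, 2.0, 8.0):   # example collection depths (um)
        frac = fraction_absorbed(depth, l_abs)
        print(f"L_abs = {l_abs:.0f} um, depth = {depth:.0f} um: "
              f"{frac:.1%} absorbed")
```

Under these assumptions only a small fraction of the light is absorbed in the first one or two microns, which is why a shallow collection region in silicon trades efficiency for speed.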
The previous discussion considers vertical PIN diodes fabricated in customized processes.
In a standard CMOS process, lateral partially-depleted PIN diodes are fabricated
by using existing PN junctions such as the P+/NWELL. If we consider a lateral
PIN CMOS diode structure and assume that a certain fixed amount of intrinsic area
is required and that the minimum dimension of the P and N regions is fixed, then
the total junction area is inversely proportional to the intrinsic region width. In a
standard CMOS lateral PIN, the doping levels are not optimized for photodiodes and
the intrinsic region will not be fully depleted. Since carriers must therefore diffuse
to the terminals, the width must be very limited to obtain reasonable performance.
As the width is decreased, however, the total junction area will increase and cause
an increase in junction capacitance. The diode performance will therefore be
capacitance-limited for small intrinsic region width and transit-time-limited for large
intrinsic region width. An analysis of the performance obtainable in a standard 0.18
µm process will be presented in this chapter.
3.2 Standard CMOS Silicon Photodiodes
3.2.1 Possible Diode Structures in Standard CMOS
Because this design uses a standard mixed-signal 0.18 µm CMOS process without
special photodiode process features, the photodiodes must be created from junctions
between the existing layers: P substrate, NWELL, DNWELL, PWELL, N+, and P+. The
DNWELL and PWELL are available only in the RF process, not the standard digital
0.18 µm process, and are included for completeness but should be avoided in the
design if possible in order to demonstrate the achievable performance in standard
CMOS. The P substrate must always be grounded, so substrate diodes may not be
used in the stacked diode phase detector and are not considered here.
Though non-substrate diodes may be connected to arbitrary potentials, the parasitic
diodes of each structure contribute differently to the output for different potentials.
Figure 3-1 shows the physical cross section for both NWELL/P+ and
DNWELL/PWELL/P+ diodes, and the schematic representation of the intentional
and parasitic diodes for connections from VDD to VX and VX to GND. To obtain a
larger diode, multiple N+/P+ finger pairs would be added in the same well to increase
the total intrinsic area without making the intrinsic width too large. Because
the typical depth of an NWELL is on the order of 1-2 µm, much less than the absorption
depth of silicon, a large quantity of carriers will be generated in the substrate
in addition to those generated in the well, and the parasitic diodes may easily produce
more current than the intentional diodes. Furthermore, these parasitic diodes
will have slow tails in their responses caused by long diffusion lengths from the deep
substrate. For cases A and B, the NWELL/P+ junction forms the intentional diode
while the NWELL/P-SUB diode is the parasitic diode. In A, the parasitic diode is
connected from VDD to GND and the current is not seen at the output node. In B,
however, the parasitic diode is connected from the output to ground and the currents
add in parallel. Therefore, in the diode stack configuration, the bottom diode current
would be much larger than the top diode current for equal illumination. Cases C and
D, using the RF process DNWELL and PWELL, exhibit similar problems. In C, the
intended N+/PWELL pull-up diode may actually be smaller than the DNWELL/P-
SUB parasitic pull-down diode, leading to a net pull-down effect. In D, the parasitic
diodes are shorted together to GND and eliminated.
Using the mismatched diodes of A and B in the proposed phase detector would
result in significant phase offset from quadrature proportional to the difference in
current. If the top and bottom diodes were consistently mismatched across the chip,
then the DC mismatch itself, temporarily neglecting transit time and capacitance
concerns, might present only a minor problem. The use of diodes C and D, however,
would likely result in complete failure of the PLL if both diodes presented a net
pull-down current and the loop had no way to gain voltage. Given this analysis and
the more universal availability of standard CMOS fabrication, diodes A and B were
chosen for the design and further analysis will be based on these two structures.
3.2.2 Photodiode Junction Capacitance
Capacitance, transit time, and DC current output are the three major design criteria
for integrated photodiodes. In the case of a TIA, the system will be directly limited by
the RC bandwidth of the photodiode capacitance and the feedback resistance. Since
the gain required to achieve a given output swing is inversely proportional to the input
current, the metric of $I_{PD}/C_{PD}$ is typically used to assess photodiode performance. The
proposed phase detector is not constrained by the traditional RC bandwidth limit,
but the same ratio metric is still valid for reasons relating to loop stability which will
be discussed in Chapter 5. Therefore, characterization of the diode capacitance is
critical.
This photodiode is simply the illuminated version of a P+/NWELL diode. There-
fore, capacitance simulations are available in a standard design flow. In addition to
simply determining the capacitance of the proposed diode structure, the capacitance
as a function of reverse bias can be used to determine the depletion layer width as a
function of reverse bias, since doping concentrations are rarely available to designers.
Figure 3-2 shows the capacitance of a 35 µm square diode as a function of reverse
bias voltage. A large square diode is used for this test, as opposed to a fingered diode,
in order to guarantee that the sidewall capacitance is an insignificant portion of the
overall capacitance and therefore make the depletion width numbers more accurate.
As expected, the photodiode exhibits a capacitance that decreases with reverse bias
voltage.
The depletion width can be approximately determined by using the simple plate
capacitor formula: $W = \epsilon_{Si} A / C$. Using the constants $\epsilon_0 = 8.85 \times 10^{-12}$ F/m and $\epsilon_{Si} = 11.7\,\epsilon_0$, the
capacitance data is easily used to obtain the depletion width data shown in Figure 3-3.
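The conversion from capacitance to depletion width can be sketched in a few lines. The capacitance values below are placeholders chosen only to exercise the formula; they are not the simulated data of Figure 3-2. The function name is hypothetical.

```python
EPS_0 = 8.85e-12         # permittivity of free space (F/m)
EPS_SI = 11.7 * EPS_0    # permittivity of silicon (F/m)

def depletion_width(cap_farads, area_m2):
    """Depletion width in meters from the parallel-plate formula W = eps*A/C."""
    return EPS_SI * area_m2 / cap_farads

SIDE = 35e-6             # 35 um square test diode from the text (m)
AREA = SIDE * SIDE

# Placeholder capacitances in fF, NOT data taken from the thesis.
for c_ff in (1000.0, 1200.0, 1500.0):
    w = depletion_width(c_ff * 1e-15, AREA)
    print(f"C = {c_ff:.0f} fF -> W = {w * 1e6:.3f} um")
```

Because the plate formula neglects sidewall capacitance, it is most trustworthy for a large square diode, which matches the choice of test structure described above.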
This analysis of the depletion width versus reverse bias shows that, even if the
intrinsic region width is reduced to the 0.23 µm minimum allowed by the design
rules, the intrinsic region will not be fully depleted at the 0.9 V reverse bias expected
at steady-state in the stacked diode phase detector topology. Therefore, the transit
[Figure 3-1 panels A-D: cross sections (NWELL, P-Sub) and schematics with Vdd and Vx terminals]
Figure 3-1: Possible CMOS diode structures.
[Plot x-axis: Reverse Bias (V)]
Figure 3-2: Junction capacitance versus reverse bias.
[Plot x-axis: Reverse Bias (V)]
Figure 3-3: Depletion width versus reverse bias.
time cannot be neglected as in a fully depleted PIN, and the performance of the diode
will be determined by the combined effects of the capacitance and the transit time
through the undepleted regions.
The junction capacitance of the photodiode structure is proportional to the total
P+/NWELL junction area. Assuming that the P+ and N+ implants will be fixed
at the minimum width and that a fixed intrinsic area is required to produce the
required current, then the total junction area will be inversely proportional to the
intrinsic region width. As the intrinsic width is increased, more intrinsic area is
enclosed between each fixed size P+/N+ finger pair. Based on measurements of the
P+/NWELL diodes previously fabricated in the same process, it was determined that
1250 µm² of intrinsic area will be needed to obtain the required 10 µA of photocurrent
with reasonable power [13]. A capacitance/area value derived from the 0.9 V bias
point of Figure 3-2 was then used to determine the total capacitance of a fingered diode
structure with this fixed intrinsic area as the intrinsic width was varied. The results of
this calculation, shown in Figure 3-4, indicate that the intrinsic region must be nearly
1 µm wide to obtain a total photodiode capacitance as low as the 200 fF target of the
phase detector structure. In fact, even from an area efficiency perspective, an intrinsic
width much lower than 1 µm seems unreasonable, given that the contact N+/P+
areas on either side will total about 0.5 µm. However, though capacitance analysis
alone would suggest that the photodiode is optimized by arbitrarily increasing the
intrinsic region width, the transit time through the undepleted region improves as the
width decreases, so some intermediate intrinsic width must be chosen as
a reasonable compromise between the two performance criteria.
3.2.3 Transit Time
The electron-hole pairs generated by incoming photons may experience two modes of
transport to their respective junction destinations. Carriers in a depletion region will
be accelerated by the relatively high electric field and transported by the rapid drift
due to that field. Carriers in an undepleted region will diffuse slowly due to carrier
gradients and will either recombine or reach a junction. Because drift is much faster
[Plot: Junction Capacitance (0 V Bias) v. Intrinsic Width for Fixed Total Intrinsic Area; x-axis: Intrinsic Region Width (µm)]
Figure 3-4: Junction capacitance versus intrinsic width.
than diffusion, it is desirable to obtain a fully depleted PIN diode and generate all
the carriers in the depleted region. When this is not possible, careful consideration of
the total distance the carriers must diffuse and the resulting transit time is required.
In this situation, the electrical current produced as a result of an incident optical
square wave will appear qualitatively similar to the waveform shown in Figure 3-5.
The carriers generated in the intrinsic region will drift to the terminals very rapidly
and produce nearly a step change in photodiode current. The carriers generated in
the undepleted region will gradually diffuse to the junction and introduce a slow tail.
Clearly, in order to obtain an output current that approximates a square wave, the
transit time should be very short compared to the signal period.
The transit time is a function of distance, temperature, and carrier mobility.
Because the electrical carriers in this diode are generated in the NWELL, the mobility
of holes in the NWELL should be used for these calculations. A typical value of hole
mobility in a 0.18 µm process is 110 cm²/V·s [14]. The diffusion coefficient, D, may
be calculated from the Einstein relation, $D = \mu \frac{kT}{q}$, where $\mu$ is the hole mobility,
[Plot: Illustration of Effect of Transit Time; x-axis: Normalized Time]
Figure 3-5: Illustration of transit time.
k is Boltzmann's constant, T is the temperature in Kelvin, and q is the electron
charge. The transit time, $\tau$, is calculated according to $\tau = W^2 / (2D)$, and a plot of $\tau$ as
a function of W is shown in Figure 3-6. The 200 MHz optical reference clock has
a period of 5 ns, so it is clear that the 1.8 ns transit time for a diode with 1 µm
intrinsic width is unacceptable. However, Figure 3-4 shows that the capacitance of a
diode with sufficiently small transit time will be too large. This analysis shows that
there is not a photodiode structure available in this 0.18 µm CMOS process capable
of simultaneously meeting the defined power, current, capacitance, and transit time
requirements.
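The transit-time estimate can be reproduced numerically. The sketch below assumes the common one-dimensional diffusion estimate τ = W²/(2D) and a temperature of 300 K, neither of which is stated explicitly here; with the quoted hole mobility these assumptions give about 1.8 ns at a 1 µm width, in line with the figure cited above. Function names are hypothetical.

```python
K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # electron charge (C)

def diffusion_coefficient(mobility_cm2_per_vs, temp_k=300.0):
    """Einstein relation D = mu * kT / q, returned in cm^2/s."""
    return mobility_cm2_per_vs * K_B * temp_k / Q_E

def transit_time(width_um, d_cm2_per_s):
    """Assumed diffusion estimate tau = W^2 / (2 D), returned in seconds."""
    width_cm = width_um * 1e-4
    return width_cm ** 2 / (2.0 * d_cm2_per_s)

D_HOLE = diffusion_coefficient(110.0)   # hole mobility quoted in the text
PERIOD = 1.0 / 200e6                    # 200 MHz reference clock period (s)

for w in (0.25, 0.5, 1.0):
    tau = transit_time(w, D_HOLE)
    print(f"W = {w:.2f} um: tau = {tau * 1e9:.2f} ns "
          f"({tau / PERIOD:.0%} of the period)")
```

Because the estimate scales with the square of the width, halving the intrinsic width cuts the transit time by a factor of four, which is what makes the capacitance tradeoff so sharp.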
In the context of a research prototype, more photocurrent can be generated with
less photodiode area by using a well-focused high-power source. Reducing the diode
area by 50-75 percent would allow an acceptable tradeoff between capacitance and
transit time between fingers. If it were not for the deep substrate effects, this com-
promise would provide an acceptable diode for this application.
However, carrier generation in the deep substrate causes two problems that are
[Plot: Transit Time v. Intrinsic Width; x-axis: Intrinsic Width (µm)]
Figure 3-6: Transit time versus intrinsic width.
beyond the control of the designer: diode mismatch and transit time from the deep
substrate. Overall pull-up/pull-down diode mismatch is caused by the parasitic
NWELL/PSUB diode, which appears in parallel with the pull-down diode but is
shorted from VDD to GND for the pull-up diode. (Refer back to Figure 3-1 for illustrations
and schematics.) Furthermore, the transit time for most of the carriers in the
parasitic diode is independent of the finger spacing. A carrier generated deep in the
substrate may need to travel up to 10 µm to the NWELL/PSUB junction, whether
the spacing of the N+/P+ fingers is 0.1 µm or 10 µm. In fact, as the spacing of these
fingers becomes smaller than the depth of the NWELL, this effect will even begin to
become apparent in NWELL/P+ diodes due to carriers generated near the vertical
center of the NWELL.
Diodes A and D from Figure 3-1 form a set of pull-up/pull-down diodes in which
all parasitic diodes are shorted between the supplies and do not contribute their long
tail currents to the output. Using these two in combination could ameliorate the
problem of transit time from the deep substrate, but the photocurrents of the two
completely different diode structures would not be matched. This would be the best
available solution if a test chip were to be fabricated in this RF process, but diode D
is not available in standard CMOS. To first order, the mismatch of the two diodes will
simply produce a systematic offset from quadrature of the generated clocks, which is
not necessarily undesirable so long as it is consistent between instances. Chapter 5
presents a more complete analysis of the effects of diode characteristics on overall
PLL dynamics with the originally proposed phase detector.
3.3 Photodiodes in SOI and Custom Processes
Although this design will consider only photodiodes fabricated in a standard CMOS
process, understanding the future potential of optical clock distribution requires a
brief analysis of more optimized, higher performance silicon photodiodes.
3.3.1 SOI Photodiode Receivers
Significant effort has recently been dedicated to finding ways to integrate high per-
formance photodiodes into a standard CMOS chip. The majority of the approaches
use a process that is somehow modified or amended to provide enhanced photodiode
functionality. Although the test chip must be designed with the diodes available in a
standard CMOS process, a production optical clocking system could introduce a few
additional process steps in order to obtain the improved performance of these types
of photodiodes.
Because of the parasitic diodes and substrate carrier generation, discussed in Sec-
tion 3.2, none of these diodes are fabricated using a standard process. However, an
SOI process offers some advantages in optoelectronics and researchers have pursued
the idea of SOI photodiodes [15]. The authors of this reference also examined high-
resistivity non-SOI as a possible candidate and found that the photodiodes in this
process required 30 V bias to achieve 1.0 Gb/s. Even with this bias, they still had a
low frequency tail response due to diffusion from deep substrate carrier generation.
Unless the photodiode is somehow isolated from the deep substrate, the electron-hole
pairs generated by the last photons absorbed deep in the bulk will gradually diffuse
to a junction and appear at the terminals as a long tail. This problem prompted the
authors to examine the SOI photodiodes in the referenced paper. They found that
by using a 3.0 µm silicon layer on a buried oxide they could achieve the required performance,
both in terms of efficiency and bandwidth, without resorting to extremely
large bias voltages. The receiver, fabricated in a 1.0 µm SOI process, achieved 1.5
Gb/s and 622 Mb/s maximum speed operation at 5 V and 3 V single supply voltage,
respectively. Later work by the same group demonstrated improved results by using
the same techniques in an unmodified 0.13 µm SOI process [16]. This work achieved
8 Gb/s receiver operation with the photodiode biased at 24 V.
3.3.2 CMOS-Compatible Custom Photodiode Processes
Freedom to add custom process steps allows optimization for higher performance
photodiode topologies. IBM has focused a major research effort on developing lateral
trench detectors in silicon. By etching deep trenches in the silicon and filling them
with N-type and P-type polysilicon, they are able to decouple the transit distance
and absorption depth in order to obtain both high speed and high responsivity [17]
[18]. The trenches extend many microns into the substrate but are placed relatively
close together, so carriers generated in the intrinsic region many microns below the
silicon surface are still rapidly collected by the nearby terminals. The photodiodes
created through this process exhibited 6-dB bandwidth of 1.5 GHz at 3.0 V single
supply voltage and quantum efficiency of 68 percent at 845 nm.
Even for 8 µm deep trenches, some carriers are still generated in the substrate
beyond the trenches due to the 15-20 µm absorption length of silicon at 850 nm
and generate long tails in the photodiode response [19]. Therefore, this extension of
the previous work explored the idea of using deep trench detectors in an epitaxial
layer of opposite type from the substrate, thereby isolating the carriers generated
below that junction and improving response time. The work reports that the use of a
junction substrate lateral trench detector can improve the bandwidth to 6 GHz from
the 100 MHz obtained with a bulk lateral trench detector in the same process.
There are many possible structures besides the lateral trench PIN, though it ap-
pears to be one of the most promising ones reported in recent literature. Researchers
are also in the early stages of investigating the possibility of using materials other
than silicon to create even higher performance diodes and then using self-assembly
to place these diodes into recesses left in the silicon wafer, though the results are not
yet published.
Some combination of these techniques will eventually produce high-performance,
CMOS-compatible photodiodes for use in future optical clocking systems. Therefore,
while the analysis of attainable performance of photodiodes in CMOS will reflect the
true attainable results, parts of the phase detector and PLL analysis will assume
higher performance to demonstrate the feasibility of the concepts in future technolo-
gies.
3.4 Waveguides
Integrated waveguides may be fabricated in a dedicated CMOS-compatible post-
process. These waveguides are created by fabricating a core/cladding structure that
operates on the same principle as a multimode optical fiber. The difference in refractive
index causes the light to remain contained in the core and proceed through
the waveguide. Some materials that have so far been considered are SiON or SiONy
for the core and SiO2 for the cladding. A 49:51 worst-case split power mismatch has
been achieved by using these materials and shaping the split points to minimize loss
at these junctions and improve matching [20]. This reference also describes a method
for integrating photodiodes in a way that will evanescently couple to the waveguides.
Though it is possible to fabricate diodes in a standard CMOS process, these diodes
will not likely be easily coupled to the waveguides. Therefore, in the case where the
wafer will already be post-processed to add the waveguides it is logical to include a
few extra steps to integrate higher performance diodes that will couple directly to the
waveguides. Since waveguides cannot be fabricated in a standard CMOS process, this
design relies on free space optics and the silicon photodiodes available in a standard
CMOS process.
3.5 Conclusions
Obtaining acceptable performance from CMOS diodes is extremely challenging for
most applications. Even if a high-power source is used and the intrinsic area reduced
in order to find an acceptable optimum between transit time and capacitance limi-
tation, the deep substrate effects are significant. In a standard CMOS process, it is
not possible to generate both pull-up and pull-down diodes unaffected by the slow
current tails of the parasitic diodes.
In the future, when optical clocking becomes the only practical way to deliver high
precision timing references, processes will be modified to include higher performance
diodes. Intel, in their analysis of the feasibility of optical clocking, assumed the
availability of photodiodes producing 100 µA with only 5 fF capacitance. So, although
actual prototype designs in standard CMOS may be photodiode limited, the analysis
and design of the optical PLL should instead consider the circuits and systems that
will become possible when photodiode performance improves. Therefore, the PLL
dynamics analysis will assume the availability of 200 fF photodiodes with transit
time much less than the period, an assumption that is not unreasonable given recent
progress in the field.
Chapter 4
Analysis of Phase Detectors
A PLL-based optical clock distribution system with an optical-electrical small-signal
phase detector has the potential to generate low-jitter, low-skew local clocks. Assum-
ing that the phase detector is implemented in a way that does not introduce excessive
ripple to the loop filter, the jitter of the overall system is primarily determined by the
VCO, and this jitter may be minimized by using an LC VCO or a low-jitter, self-biased ring
oscillator [10]. This decoupling of the jitter from the optoelectronic conversion stage
potentially provides a significant advantage over a TIA system that may introduce
large jitter at this interface. However, like the TIA system, the steady-state offset of
the output clocks from the optical input signal is non-zero. In a traditional receiver,
this offset would be contributed by the TIA and limit-amplifier delay. In this case, it
is determined by the small-signal characteristics of the phase detector. If the sources
of this difference from the ideal case are independent of process and supply then all
instances of the PLL will experience the same offset and no skew will result. It there-
fore becomes important to characterize the source of the offsets accurately for each
phase detector topology considered in order to determine the impact.
This chapter analyzes the proposed current-steering phase detector in detail and
discusses basic operation, sources of phase offset, and silicon optoelectronics consid-
erations. The analysis is extended to include suggestions of other topologies with
different advantages and disadvantages.
4.1 Current-Steering Phase Detector
4.1.1 Basic Current-Steering Topology and Operation
The basic topology for the proposed current-steering phase detector is shown in Fig-
ure 4-1. The circuit provides the functionality of both a phase detector and a charge
pump by using the electrical feedback clock from the divided VCO output (EC) to
steer the current generated in the photodiode by the optical input (OC) on and off of
the loop filter. The circuit is similar to the charge-pump structures used in previous
works [2], but in this case the current sources are replaced with photodiodes and
controlled with optical input signals. The photodiodes are illuminated with the same
fifty-percent duty cycle optical reference clock, OC. We assume for the moment that
the loop filter is simply a capacitor. Although the actual filter will be more complex
in order to stabilize the PLL, this assumption simplifies visualization of the phase
detector operation and the intuition gained from this exercise is directly applicable to
the more complex filters. When the electrical clock generated by the feedback divider
(EC) is high, the current from the upper photodiode flows into the loop filter and
increases the output voltage, while the unity gain feedback buffer absorbs the current
from the lower photodiode. When the electrical clock goes low, the switch settings
are reversed and current flows out of the loop filter and decreases the output voltage.
Since charge is the integral of current, if we define P to be the fraction of the optical
clock (OC) high period for which the electrical clock (EC) is also high, then the net
charge delivered to the loop filter over one cycle of period T is given by $Q = I_{CP} \left(\frac{T}{2}\right) (P - (1 - P))$. It follows
that the net change in loop filter voltage over one cycle is zero when the optical
and electrical signals are locked in quadrature and P = 0.5. The feedback amplifier is
required to prevent the parasitic capacitance of the photodiodes from simply storing
the charge that should be steered away from the loop filter and delivering it through
charge sharing when the switches transition.
In PLL analysis, the phase detector is characterized by the transfer function from
phase difference to average current. This model is a linearization of the actual phase
detector characteristics and is valid only when the loop has pulled the oscillator into
50
EC
OC
ICP
VCP
TIME
Figure 4-1: Basic current-steering phase detector topology and operation.
the small-signal locking range, but this type of linearization is necessary for loop
dynamics analysis. The phase-current transfer function for this topology is shown in
Figure 4-2. The relationship between net charge per cycle and relative timing of EC
with respect to OC has just been established and the average current is simply the net
charge divided by the period $T$, so it follows that $I_{AVG} = I_{PD}\,(1/2)\,(P - (1 - P))$.
We define the two signals to have
zero phase error at quadrature. Therefore, the average current is zero when there
is no phase difference and reaches its maximum of $I_{PD}/2$ when the phase difference is
$\pi/2$. This analysis again assumes that the optical clock has fifty percent duty cycle.
Reducing or increasing the duty cycle $D$ of the optical clock will result in a maximum
phase detector gain of $D\,I_{PD}$.
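The charge-balance argument above can be sketched numerically. The following Python fragment is a minimal illustration and not the model used in this work: it assumes an ideal square-wave optical clock whose high time is the duty cycle times the period, a constant photocurrent while OC is high, and ideal switches. The function names, the 10 uA photocurrent, and the 5 ns period are illustrative assumptions.

```python
def net_charge_per_cycle(i_pd, period, p, duty=0.5):
    """Net charge delivered to the loop filter in one cycle.

    i_pd   : photodiode current while the optical clock (OC) is high [A]
    period : clock period T [s]
    p      : fraction of the OC high time during which EC is also high (0..1)
    duty   : OC duty cycle D (assumed 0.5 unless stated otherwise)
    """
    t_high = duty * period
    # Current is steered into the filter for p*t_high and out of it for (1-p)*t_high.
    return i_pd * t_high * (p - (1.0 - p))


def average_current(i_pd, p, duty=0.5):
    """Average phase detector output current: net charge divided by the period."""
    return i_pd * duty * (p - (1.0 - p))


if __name__ == "__main__":
    i_pd = 10e-6    # assumed photocurrent, for illustration
    period = 5e-9   # assumed period of the feedback clock, for illustration
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        q = net_charge_per_cycle(i_pd, period, p)
        i_avg = average_current(i_pd, p)
        print(f"P={p:4.2f}  Q={q:+.3e} C  I_avg={i_avg:+.3e} A")
```

Under these assumptions the net charge vanishes at P = 0.5 (quadrature), and at P = 1 the average current equals the duty cycle times the photocurrent.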
4.1.2 Sources of Phase Offset
The conceptual phase detector analysis implicitly assumes the availability of ideal
switches, photodiodes, and amplifiers by representing the circuit of Figure 4-3.1 with the
model of Figure 4-3.2. This simplification is appropriate and necessary for loop dynamics
Figure 4-2: Phase difference versus average current transfer function of current-steering phase detector. (Axes: phase difference, average current.)
modeling, as it provides a good first-order model of the phase detector, which can be
described mathematically and used in LTI system analysis to characterize loop sta-
bility and damping. A closed-form mathematical description including nonidealities
would be prohibitively complicated, as many of the nonidealities are nonlinear with
respect to output voltage as well as phase difference. Furthermore, such a model is
unnecessary as the mathematical stability analysis is correct to first-order with the
simpler model and all higher-order effects are verified in circuit level simulations.
Nevertheless, it is important to understand the qualitative effect that each significant
nonideality will have on overall circuit performance. Simulations in the following
sections show that amplifier gain error and switch resistance collectively account for
the vast majority of second-order effects present in the phase detector structure. A
brief examination of Figure 4-3.1 reveals the origin of each contribution. The feedback
amplifier is intended to prevent unwanted charge sharing by holding each photodiode
parasitic capacitance at the output voltage when the electrical clock signal alternately
isolates each photodiode from the loop filter. The switches ideally provide
zero-resistance paths between the circuit components. The simplified circuit of Figure 4-3.2
does not model deviations from these idealized assumptions. The following
sections analyze how circuit performance changes when the amplifier gain is not
Figure 4-3: Simplification of phase-detector structure.
exactly unity and the switches have non-zero on-resistance.
Amplifier Gain Error
The unity gain feedback amplifier is included in the circuit so that both photodiode
parasitic capacitances will always be held at the output voltage and undesirable charge
sharing does not occur when EC changes state. If there is gain error in the feedback
amplifier, however, the photodiode capacitance will be held at a slightly different
voltage when the switch configuration isolates it from the output filter and this voltage
differential will result in charge sharing when the switches change state to short the
photodiode capacitance to the loop filter. The impact of any voltage differential
introduced by the amplifier is scaled by a factor related to the photodiode parasitic
capacitance and the loop filter capacitance. When the switch closes and the two
capacitances are shorted together, the voltage will change according to the basic
principles of charge sharing shown in Equation 4.1, which simplifies to Equation 4.2. These equations
assume that the damping resistance in series with the loop filter capacitor COUT is
zero because this does not alter the steady-state result of the charge sharing.
\[ V_{OUT} + \Delta V_{OUT} = \frac{C_P\,(V_{OUT} + \Delta V_{AMP}) + C_{OUT}\,V_{OUT}}{C_P + C_{OUT}} \tag{4.1} \]
\[ \Delta V_{OUT} = \frac{C_P\,\Delta V_{AMP}}{C_P + C_{OUT}} \tag{4.2} \]
Equation 4.2 shows that as the parasitic capacitance approaches zero, the output
voltage is not affected by gain error because even large voltage differences on relatively
small capacitors will contribute very little charge. Conversely, as the parasitic
capacitance approaches infinity, any gain error will appear directly at the output
node. In a realistic implementation, the parasitic capacitance might be twenty-five
percent of the output capacitance and the influence of the gain offset would be scaled
accordingly. The choice of loop filter component sizing with respect to photodiode
capacitance for this design, discussed in Chapter 5, is consistent with this general
rule.
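The charge-sharing relationship is easy to evaluate for candidate capacitor ratios. The sketch below is a hedged Python illustration: the helper names, the 10 mV amplifier error, and the normalized capacitor values are assumptions chosen only to show the scaling, not values taken from this design.

```python
def charge_share_step(c_p, c_out, dv_amp):
    """Output voltage step when the parasitic capacitance c_p, held at
    V_OUT + dv_amp, is shorted to the loop filter capacitance c_out."""
    return c_p * dv_amp / (c_p + c_out)


def charge_share_from_conservation(c_p, c_out, v_out, dv_amp):
    """Same step obtained from charge conservation on the two capacitors."""
    v_final = (c_p * (v_out + dv_amp) + c_out * v_out) / (c_p + c_out)
    return v_final - v_out


if __name__ == "__main__":
    dv_amp = 10e-3  # assumed amplifier error of 10 mV, for illustration only
    for ratio in (0.01, 0.25, 1.0, 100.0):  # c_p / c_out with c_out normalized to 1
        step = charge_share_step(ratio, 1.0, dv_amp)
        print(f"C_P/C_OUT = {ratio:6.2f} -> output step = {step * 1e3:.3f} mV")
```

With a parasitic capacitance equal to twenty-five percent of the output capacitance, one fifth of the amplifier error appears at the output per charge-sharing event in this idealized model.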
The steady-state gain error introduced by the amplifier will be identical for both
the upper and lower diode parasitic capacitances, but the amplifier may also have
different up and down slew rates. Each parasitic capacitance is shorted to the output
once per cycle. One parasitic capacitance is shorted to the output when EC changes
state while OC is high and the loop filter voltage is ramping. The other capacitance
is shorted to the output when EC returns to the starting state while OC is low and
the loop filter voltage is stable. Therefore, when the respective capacitors share their
charge with the output node, one will be set to the steady-state offset voltage and
the other will be set to either the up or down ramping error voltage.
The result of this charge sharing is that some fixed quantity of charge will be in-
jected onto the loop filter each cycle. If this quantity is positive, the loop filter voltage
will gradually increase if the inputs are in quadrature and the loop will therefore lock
with the electrical clock transition positioned slightly away from quadrature to allow
the loop to discharge for longer than it charges in order to obtain a steady-state loop
filter voltage. Therefore, any charge sharing due to amplifier nonidealities translates
directly into phase offset from quadrature and variations in the amplifier nonidealities
across temperature and process corners translate into skew.
In this implementation of the PLL, the feedback amplifier is implemented with
a simple open-loop unity-gain buffer. Replacing this circuit with a very high gain
amplifier configured in unity-gain feedback could nearly eliminate the steady-state
errors. However, the slew rate problem would not be eliminated, new stability con-
cerns would be introduced, and any variation of the feedback resistors across process
or temperature would still introduce skew between instances. This option is therefore
not obviously superior to the existing open-loop, unity-gain buffer.
Switch On-Resistance Error
Non-zero on-resistance of the CMOS switches also contributes to phase offset from
quadrature. When the optical signal is off and the feedback amplifier is holding one
of the photodiode parasitic capacitances at the steady-state output voltage, there
is no current through the switch connecting the two and the voltages are therefore
equal regardless of switch resistance. When the optical signal is on and the voltages
are ramping, however, the switches must carry the full photodiode current. With an
on-resistance of 1 kΩ, this will generate a voltage difference of 10 mV in addition to
the amplifier ramping error. 1 kΩ is approximately the resistance of a transmission
gate with a 1 µm NMOS and 3 µm PMOS, both with minimum channel length, in the
current process and at the biases expected for steady-state operation of this circuit.
Increasing the size of these switches will reduce the resistance, but this approach is
limited by the drive capability of the feedback divider. Introducing buffering stages
at the divider output also potentially introduces skew, so the advantage of increasing
switch size to the point where buffering is required is unclear.
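The voltage error across a switch follows from Ohm's law, and the effect of widening the switch can be estimated the same way. This is a rough Python sketch under the first-order assumption that on-resistance is inversely proportional to switch width; the function names are hypothetical, and the model ignores the bias dependence of the transmission gate resistance and the added gate capacitance.

```python
def switch_drop(i_photo, r_on):
    """Voltage across a switch carrying the full photodiode current [V]."""
    return i_photo * r_on


def scaled_on_resistance(r_on, width_multiple):
    """First-order on-resistance after multiplying the switch width."""
    return r_on / width_multiple


if __name__ == "__main__":
    i_photo = 10e-6   # photodiode current [A]
    r_on = 1e3        # on-resistance of the original transmission gate [ohm]
    for width_multiple in (1, 2, 3):
        r = scaled_on_resistance(r_on, width_multiple)
        drop_mv = switch_drop(i_photo, r) * 1e3
        print(f"{width_multiple}x width: R_on = {r:7.1f} ohm, drop = {drop_mv:.2f} mV")
```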
A brief consideration of the direction of current flows through the switches reveals
that, when OC is high and the voltages are ramping, the upper parasitic capacitance
voltage will be above the amplifier output voltage and the lower parasitic capacitance
will be below the amplifier output voltage. We will temporarily assume that the
amplifier itself has perfect unity gain in order to simplify this discussion because, in
any case, the errors contributed by the two sources may simply be summed to obtain
the total error. In this case, if the output voltage ramps up and then down in steady
state, the lower parasitic capacitance will be shorted to the output when its voltage
is below the output voltage and the upper parasitic capacitance will be shorted to
the output when the two voltages are equal. This will cause a net downward ramp
in output voltage if the electrical and optical signals are in perfect quadrature. The
reverse case, when the voltage ramps down and then up, clearly causes a net upward
ramp for two signals in quadrature. As with the amplifier gain error,
the effect of switch on-resistance scales with the capacitance ratio as described
by Equation 4.2.
Constructive and Destructive Summing of Errors
The sign of the offset due to switch on-resistance is dependent on whether EC and
OC are positioned such that the output voltage ramps in an up-down or down-up
order. The offset due to the amplifier gain error, however, has the same sign for both
cases. Therefore, the two will sum constructively for one charge-discharge order and
destructively for the other. The amplifier used in this phase detector implementation
exhibits a small positive gain error at the locked steady-state voltage. Therefore, if
the voltage ramps up and then down and the error introduced by the on-resistance
is negative, the two errors add destructively and could cancel each other if properly
ratioed. On the other hand, if the voltage ramps down and then up and the error
introduced by the on-resistance is positive, the two will add constructively to create
a faster upward ramp for quadrature signals.
In order to maintain generality, both cases of ramping order have been analyzed
and their differences compared. In fact, however, for each PLL topology only one
order is stable while the other is metastable. In this case, the VCO gain is negative
since increasing the control voltage lowers the output frequency. If the loop is locked
in quadrature with the voltage ramping up and then down and some factor slightly
slows the electrical feedback clock, the electrical signal in the next cycle will arrive
later in the on-period of the optical signal and allow it to ramp up longer than down,
thereby further increasing the control voltage and slowing the output clock. This is
positive feedback and therefore unstable. In contrast, if the PLL began in a locked
state ramping down and then up, the same slowing of the electrical clock would result
in negative feedback to speed it up again. The loop therefore locks to the down-up
charging order and the offsets add instead of canceling, but this is not necessarily
problematic since skew due to offset-variation, and not steady-state offset itself, is
the primary concern.
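The sign argument above can be illustrated with a toy model. The following Python sketch is a heavily simplified, normalized per-cycle model. It is an assumption of this example and not the simulation used in this work: the timing error units, the gains, and the proportional term standing in for the loop filter damping resistor are arbitrary illustrative choices. Here order_sign = +1 represents the up-down ramp order, in which a late EC adds net charge, and order_sign = -1 represents the down-up order.

```python
def simulate_timing_error(order_sign, cycles=200, tau0=1.0,
                          k_int=0.01, k_prop=0.2, vco_gain=-1.0):
    """Return the EC timing error (arbitrary units) after a number of cycles.

    order_sign : +1 if a late EC adds net charge (up-down ramp order),
                 -1 if a late EC removes net charge (down-up ramp order)
    vco_gain   : negative, because raising the control voltage lowers
                 the output frequency
    """
    tau = tau0    # EC lateness relative to the quadrature lock point
    v_int = 0.0   # control voltage deviation stored on the filter capacitor
    for _ in range(cycles):
        pd_out = order_sign * tau          # normalized net charge this cycle
        v_ctrl = v_int + k_prop * pd_out   # capacitor voltage plus damping term
        v_int += k_int * pd_out            # integrate the charge on the capacitor
        # With negative VCO gain, a higher control voltage slows the clock,
        # which makes EC arrive later in the next cycle.
        tau += -vco_gain * v_ctrl
    return tau


if __name__ == "__main__":
    print("up-down order :", simulate_timing_error(+1))
    print("down-up order :", simulate_timing_error(-1))
```

In this toy model an initial timing error grows without bound for the up-down order and decays for the down-up order, which matches the qualitative argument that only the down-up order is a stable lock point when the VCO gain is negative.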
Matching to Idealized Model
In order to verify that these two sources account for the significant majority of the
phase offset from quadrature, simulations of the extracted phase detector containing
MOSFET switches and a real amplifier were compared to simulations of an idealized
phase detector containing perfect switches with a specified on-resistance and a voltage-
controlled voltage source with a non-unity gain. Figure 4-4 shows the results from
these simulations.
The solid line with no slow ramp shows the output of an "ideal" phase detector
with a perfect unity gain feedback amplifier and zero-resistance switches. The two
other solid lines show the TT/27 °C outputs of the real phase detector and an idealized
phase detector with an amplifier gain of 1.014 and on-resistance of 1.16 kΩ. The close
matching of these results clearly indicates that these two factors together account for
nearly all phase error. The dashed lines show the output of the real phase detector
over a variety of temperature and process corner conditions.
The rate of the ramp for quadrature inputs is proportional to the phase difference
required to maintain zero net change in the output during lock. The ramp rate vari-
ation evident in these results indicates that the presence of process and temperature
variations will introduce skew between instances of the PLL on distant parts of the
chip. Simulations of SS, FF, and TT corners resulted in the skew shown in Figure 4-5.
The 200 MHz divider feedback clock variation, instead of the output clock variation,
is shown because this is the signal that directly locks to the optical signal and because
the skew may be seen much more clearly on this time scale.
Figure 4-4: Ideal output, actual output at TT/27 °C and matched idealized output (solid lines), and actual output over FF/SS/100 °C (dashed lines). (Horizontal axis: time in seconds.)
Figure 4-5: Skew for locked PLL across SS, FF and TT process corners. (Plot title: Skew Across Process Corners (TT, SS, FF); horizontal axis: time in ns.)
Figure 4-6: Skew for locked PLL at 27 °C and 100 °C. (Plot title: Skew Across Temperature (27 °C and 100 °C); horizontal axis: time in ns.)
These simulations used the full extracted PLL with parasitic capacitance at 27 °C,
a loop filter of 800 pF and 44 kΩ, and modeled the photodiodes as 10 µA pulses
with 200 fF parasitic capacitance. The skew between the SS/FF corners due to the
combination of amplifier gain error and switch resistance is 450 ps, with the TT corner
roughly centered between the two. Although a single chip would not likely contain
both process extremes, this range is large enough that the skew across opposite ends
of the chip might easily exceed 10 percent of the 625 ps period.
Figure 4-6 shows the skew generated when temperature is varied from 27 °C to
100 °C. This condition is more likely to occur within a chip and, as shown in the
figure, produces 114 ps of skew between the two divider feedback clocks.
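To put these numbers in context, the simulated skews can be expressed as a fraction of the 625 ps clock period. The short Python sketch below only restates the figures quoted above; the dictionary layout and function name are illustrative, and the 10 percent threshold is the figure mentioned in the text rather than a formal specification.

```python
PERIOD_PS = 625.0  # output clock period quoted in the text

SIMULATED_SKEW_PS = {
    "SS/FF process corners": 450.0,
    "27 C versus 100 C": 114.0,
}


def skew_fraction(skew_ps, period_ps=PERIOD_PS):
    """Skew expressed as a fraction of the clock period."""
    return skew_ps / period_ps


if __name__ == "__main__":
    budget_ps = 0.10 * PERIOD_PS
    print(f"10 percent of the period = {budget_ps:.1f} ps")
    for name, skew in SIMULATED_SKEW_PS.items():
        print(f"{name}: {skew:.0f} ps = {100 * skew_fraction(skew):.1f} % of period")
```

Ten percent of the period is 62.5 ps, so both the process-corner and the temperature results exceed that level by a wide margin.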
The results just presented include the effects of process and temperature varia-
tion on the switch resistance and the amplifier gain error. It is also important to
understand what percentage of this total skew is introduced by each of these two
sources. Therefore, another set of simulations was completed to isolate the effect of
switch resistance and amplifier gain error variation across the SS and FF corners. In
order to obtain these results, the phase detector was removed from the layout and
the new layout was extracted with parasitic capacitance and pins for attaching an
external schematic-view phase detector. Two versions of this external phase detector
were created. One phase detector was implemented with CMOS switches but the
feedback amplifier was replaced with a voltage-controlled voltage source with perfect
unity gain. Another phase detector was implemented with the real amplifier but the
CMOS switches were replaced with ideal switches with 1 kΩ resistance. Simulation
convergence issues prevented the use of switches with zero resistance, but the use of
switches with 1 kΩ resistance is completely acceptable for this test because the resistance
is not process dependent and will therefore introduce only steady-state phase
offset and not skew.
Figure 4-7 shows the results of the corner simulations with an ideal amplifier and
real CMOS switches. Figure 4-8 shows the results of the same corner simulations with
the real amplifier and ideal 1 kΩ switches. This result clearly shows that the switch
on-resistance is responsible for nearly all of the skew generated across process corners.
The data presented in these figures shows that the skew is 447 ps across SS/FF with
an ideal amplifier and real switches but only 38 ps with a real amplifier and ideal
switches. A more process-independent amplifier could be designed to reduce the
38 ps skew introduced by that circuit, but further analysis in this work will focus on
the effect of reducing skew by decreasing switch on-resistance. The sum of these two
skew components is not identical to the results obtained for the complete extracted
phase detector, but the difference is relatively small and can be explained by the fact
that the phase detector parasitic capacitance was not extracted for the later set of
simulations.
Figure 4-7: Skew over SS/FF corners with ideal amplifier and real CMOS switches (solid lines) and optical reference (dashed line). (Plot title: Skew Across Corners SS/FF: Real Switches-Ideal Amplifier; horizontal axis: time in ns.)
Figure 4-8: Skew over SS/FF corners with real amplifier and ideal switches (solid lines) and optical reference (dashed line). (Plot title: Skew Across Corners SS/FF: Ideal Switches-Real Amplifier; horizontal axis: time in ns.)
Figure 4-9: Skew over SS/FF corners with ideal amplifier and triple-sized CMOS switches (solid lines) and optical reference (dashed line). (Plot title: Skew Across Corners SS/FF: Triple-Size Real Switches-Ideal Amplifier; horizontal axis: time in ns.)
As discussed earlier in the chapter, increasing the width of the switches proportionally
decreases the switch on-resistance. This method cannot be extended indefinitely
as the CMOS switch parasitic capacitances will eventually become large enough that
the feedback divider outputs require significant buffering and the parasitic capacitance
also begins to interfere with the loop filter and PLL dynamics. However, moderate
increases in the switch size are possible. In order to determine the skew reduction
attainable by this method, the corner simulations were repeated with an ideal feedback
amplifier and transmission gates three times larger than those used in the original
phase detector. The results of this simulation are shown in Figure 4-9 and the skew
between the SS/FF corners is 275 ps. This is clearly an improvement from the earlier
result, but the improvement is not proportional to the increase in switch width. This
could be partially due to increases in parasitic capacitances and the resulting increase
in the feedback clock rise time, an effect which could be reduced by buffering the di-
vider feedback clock, but the buffers themselves would increase skew so this potential
solution should be approached with caution. In either case, it is unlikely that the
switch size can practically be increased enough to reduce skew to reasonable levels in
this process.
Table 4.1 shows a summary of the skew for each of the above simulation pairs.
These results do not compare favorably with the 26 ps skew obtained in [5] by passive
fuse-based electrical deskew methods at 1.5 GHz, but the disparity may not be
quite as large as it appears at first glance. The reported results use a faster process
and consider only the actual variations across the chip, not the worst-case corners.
However, even these factors cannot realistically account for the entire performance
difference. In addition, these simulations modeled the photodiodes as idealized current
sources with capacitors in parallel. This model neglects transit time and photodiode
matching, which must also be analyzed.
Simulation Description                                          Simulated Skew
TT/SS/FF corners - Real switches and real amplifier             450 ps
27 °C - 100 °C - Real switches and real amplifier               114 ps
SS/FF corners - Real switches and ideal amplifier               447 ps
SS/FF corners - Ideal switches and real amplifier               38 ps
SS/FF corners - Triple-size real switches and ideal amplifier   275 ps
Table 4.1: Summary of skew sources.
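The simulated values summarized in Table 4.1 can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the skew numbers reported above; the variable and function names are illustrative, and the "ideal" scaling line assumes that skew would fall in direct proportion to switch width, which the simulations show it does not.

```python
SKEW_PS = {
    "real switches, real amplifier": 450.0,
    "real switches, ideal amplifier": 447.0,
    "ideal switches, real amplifier": 38.0,
    "triple-size switches, ideal amplifier": 275.0,
}


def component_sum(skews):
    """Sum of the separately simulated switch and amplifier contributions."""
    return (skews["real switches, ideal amplifier"]
            + skews["ideal switches, real amplifier"])


def triple_size_ratio(skews):
    """Skew with triple-size switches relative to the original switches."""
    return (skews["triple-size switches, ideal amplifier"]
            / skews["real switches, ideal amplifier"])


if __name__ == "__main__":
    total = SKEW_PS["real switches, real amplifier"]
    print(f"sum of components = {component_sum(SKEW_PS):.0f} ps "
          f"versus {total:.0f} ps for the full phase detector")
    print(f"triple-size switches leave {100 * triple_size_ratio(SKEW_PS):.1f} % "
          f"of the switch-induced skew")
    print(f"proportional scaling would leave "
          f"{SKEW_PS['real switches, ideal amplifier'] / 3:.0f} ps")
```

The separately simulated components sum to 485 ps against 450 ps for the complete phase detector, and tripling the switch width removes a little under forty percent of the switch-induced skew rather than two thirds.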
Silicon Optics Considerations
A few of the relevant challenges of silicon integrated optics are discussed in Chapter 3.
Many of these nonidealities will also shift the steady-state lock position away from
quadrature. If, for example, the bottom diode provides twice as much current as the
top diode, then the electrical clock must divide the optical signal such that current is
added to the loop filter for 67 percent of the cycle and subtracted for 33 percent of the
cycle. Variations in transit time will also significantly affect the equal charge division
point. In order to maintain a steady-state voltage as the transit time increases, the
electrical signal must divide the optical signal farther to the right on the time axis, a
fact which may be easily observed in Figure 3-5. These effects are difficult to analyze
and, for all intents and purposes, impossible to simulate within a traditional circuit
simulation environment. Furthermore, integrated optical waveguide technology is still
an emerging field and obtaining equal power splitting is still a challenge. Splitting
ratios of 49/51 percent or better have been reported, but even such a small difference
introduces some error and even ideal photodiodes would produce different currents
due to power mismatch [20].
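The 67/33 division quoted above follows from requiring zero net charge per cycle. The Python sketch below is an idealized illustration that assumes constant photocurrents, ideal switches, and no transit-time effects, with the top diode adding charge to the loop filter and the bottom diode removing it; the function name is hypothetical.

```python
def add_fraction(i_top, i_bottom):
    """Fraction of the illuminated interval during which current must be
    added to the loop filter so that the net charge per cycle is zero.

    Charge balance: i_top * t_add = i_bottom * t_subtract.
    """
    return i_bottom / (i_top + i_bottom)


if __name__ == "__main__":
    cases = {
        "matched diodes (10 uA / 10 uA)": (10e-6, 10e-6),
        "bottom diode twice the top (10 uA / 20 uA)": (10e-6, 20e-6),
        "49/51 optical power split": (49.0, 51.0),
    }
    for name, (i_top, i_bottom) in cases.items():
        frac = add_fraction(i_top, i_bottom)
        print(f"{name}: add for {100 * frac:.1f} %, "
              f"subtract for {100 * (1 - frac):.1f} %")
```

In this model even a 49/51 power split moves the balance point one percentage point away from the matched 50/50 case.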
Figure 4-10: Phase difference for locked PLLs at TT/27 °C with 10 µA/20 µA versus 10 µA/10 µA differential current mismatch. (Plot title: Skew With Diode Mismatch: 10 µA Top Diode - 20 µA Bottom Diode; horizontal axis: time in ns.)
We will consider two types of photodiode mismatch. First, there is mismatch
due to the parasitic diodes, which we will call "differential current mismatch". In
this case, the top diode in a particular instance of a phase detector has a different
current than the bottom diode in the same phase detector. If two instances of the
phase detector have different differential current mismatch then clearly there will
be skew between the two generated clocks. This is a predictable, first-order effect.
Second, skew will be introduced if both photocurrents in one instance of a PLL are
proportionally scaled with respect to both photocurrents in another PLL, a condition
that we will call "common-mode current mismatch". For example, in a given pair
of PLLs, both diodes in one phase detector might produce 10 µA and both diodes
in the other phase detector might produce 12 µA. In this case, skew is generated as a result
of second-order effects, such as the feedback amplifier ramping nonidealities and the
variation of the voltage drop across the switch resistances with varying current.
Accurate determination of the differential current mismatch requires a more exact
process description than that which is available. However, given that the NWELL
is only about 1-2 µm deep while the absorption length of silicon is on the order of
10 µm, it is reasonable to estimate that the parasitic photodiode would produce the
same current as the intentional NWELL diode. This would result in a system in
which the top diode produced 10 µA and the bottom diode produced 20 µA. If this
ratio were constant across all instances of the PLL, the result would not be skew, only
common-mode phase offset from quadrature. Unfortunately, characterizing the prob-
able variation corners for differential mismatch is even more complex than the initial
estimate of differential mismatch magnitude, requiring knowledge of such parameters
as NWELL depth variation and NWELL and substrate doping concentrations. Since
realistic estimates of variation are not possible, the system was simulated over the
variation range of 10 µA/20 µA versus 10 µA/10 µA in order to determine the phase
offset between the ideal case and the estimated differential current mismatch case
and gain a somewhat quantitative understanding of how smaller variations around
the operating point might contribute to skew. Figure 4-10 shows the results of the
simulations and indicates that there will be 500 ps phase difference between the two
cases.
This result reports a common-mode phase difference and would be impractically
pessimistic as a worst-case skew result. However, it is difficult to determine what
might be a possible worst-case amount of skew generated by this effect. The diode
mismatch should be relatively constant across the chip, meaning that one chip would
certainly not contain both the 10 µA/10 µA and 10 µA/20 µA cases compared here.
However, as discussed in Chapter 3, the ratio might vary due to factors such as
NWELL depth mismatch, doping mismatch and other process variations. Simulations
that consider these factors are not possible within an IC design flow and predicting
the actual skew they introduce is therefore not practically possible.
Common-mode current mismatch could be produced by optical power mismatch
and variations in photodiode sensitivity due to process and temperature. Again, it
is difficult to estimate the potential variation introduced by these sources so an arbi-
trary mismatch percentage was simulated. Figure 4-11 shows the skew generated by
Figure 4-11: Skew for locked PLLs at TT/27 °C with 10 µA versus 12 µA common-mode current mismatch. (Plot title: Both Diodes with 10 µA v Both Diodes with 12 µA; horizontal axis: time in ns.)
common-mode mismatch between two PLLs in which one has two 10 µA photodiodes
and the other has two 12 µA photodiodes. It is reasonable to estimate that two sets of
diodes on opposite corners of a chip might have such a worst-case current difference.
Both simulations were conducted at 27 °C in the TT corner and the resulting skew
was 34 ps. As expected, the skew introduced by this second-order effect is relatively
insignificant compared with the skew potentially introduced by the first-order effect
of differential current mismatch.
Because characterization of the variations of optical components was not possible,
this analysis of phase offset and skew is based on highly speculative estimates of po-
tential levels of optical current mismatch. For this reason, these results are presented
only for their value in approximate determination of the amount of skew potentially
introduced by a given quantity of current mismatch. They are not based on actual
knowledge of the effect of corners and temperature on photodiodes and should not
be interpreted as the predicted skew for this particular PLL implementation.
4.2 Extensions of Current Steering Topology
The sources of skew in the phase detector can be divided into the two main cate-
gories of photodiode mismatch and variation in the switches and feedback amplifier.
Choosing a topology that requires only one photodiode could mitigate the first of
these problems, but such topologies may introduce other sources of skew.
4.2.1 Current Mirrors
Current mirrors are an obvious first idea for building a similar phase detector with
only one photodiode. The topology shown in Figure 4-12 appears, at first glance, to
provide a solution in which the single photocurrent is copied exactly, thereby
eliminating the problem of diode mismatch. However, the
mirroring is not symmetrical since the UP current is mirrored once and the DOWN
current twice, which introduces a delay between the two currents. Furthermore, both
currents vary with output voltage. The voltage dependence could be somewhat im-
proved by using cascode current sources, at the expense of decreased headroom for
the remainder of the phase detector. However, when the current begins to pulse on
and off during normal operation, other problems become evident. The original phase
detector used a certain amount of the parasitic capacitance as a necessary part of the
loop filter and the photocurrent was divided between the loop filter and the parasitic
capacitance according to the capacitance ratio. In the mirrored case, however, the
current through the first leg of the mirror is simply a function of the voltage on the
photodiode capacitance, which effectively creates an integrator. The mirrored cur-
rent will therefore ramp up and down in a triangle wave as the photocurrent proceeds
through the square wave pattern.
This additional integrator alone might not necessarily be a problem. The phase-
to-current conversion ratio is reduced, but adjustments to the loop filter will correct
for this change. The larger concern, however, is a possible increase in skew. The
sources of skew inherent in the phase detector itself are not eliminated by this topol-
ogy modification since this circuit still contains the same basic phase detector core.
Figure 4-12: Current mirror approach.
In addition, whereas previous analysis of the skew introduced by process variation
modeled the photodiodes as perfect current sources, the current mirrors add another
level of circuitry affected by process and temperature.
Therefore, while the current mirror does eliminate the problem of differential pho-
todiode mismatch, it introduces new sources of skew to a topology that already had
too many. Implementing this additional circuitry to correct for photodiode mismatch
when inherent phase detector mismatch already exceeds the specifications is mis-
guided. It is more productive to take a step back from this topology and explore the
design space for a topology with fewer inherent sources of timing mismatch.
4.2.2 Photodiode in Feedback
It is worthwhile to examine the topology shown in Figure 4-13 [21]. This circuit
partially eliminates the ramping effect that occurs in the current mirror topology.
When the photodiode is illuminated, the increased photocurrent decreases the voltage
on the connected NMOS gate and therefore increases the output voltage. This voltage
is capacitively divided and fed back to the NMOS device biasing the photodiode,
which increases the current to match the photodiode current and hold the photodiode
Figure 4-13: Feedback amplifier approach.
voltage steady. For a step increase in photocurrent, the diode parasitic capacitance is
therefore able to reach a steady-state voltage much faster. This circuit, as shown here,
was originally intended as a direct current-voltage amplifier. Instead, the current from
the bias stage could be mirrored into the phase detector.
This topology reduces the ramping problem, but does not solve other problems.
The inherent phase detector skew remains and, as in the simple mirror case, the new
circuits add more potential sources of skew. In addition, the feedback configuration of
this amplifier requires analysis of damping and stability. The complete analysis shows
that it is not possible to obtain much gain from the structure without accepting a
certain amount of gain peaking [21]. Effectively, this topology minimizes the ramping
problem at the expense of introducing a stability problem while keeping the other skew
sources of the current mirror approach and core phase detector relatively constant.
This topology is preferable to the current mirror scheme if increasing speed is a
primary concern, but it does not provide any skew advantages.
4.3 Topologies with Alternate Phase Detector Cores
All of the topologies described in the previous section use the same basic phase detec-
tor core with various modifications. Therefore, they all retain at least the minimum
skew generated in that core due to amplifier gain error and on-resistance. Using a
high-gain op-amp in feedback would lower the gain error, at the expense of some
additional complexity and stability concerns, but the on-resistance problem is much
more fundamental. In addition, adding mirror devices may improve the photodiode
mismatch problems but, in the process, introduces more possible skew sources. Ex-
amining topologies that depart from this phase detector core structure may prove
more promising.
4.3.1 Bang-Bang Phase Detector
The topologies considered until now are all linear phase detectors. That is, the error
signal is linearly proportional to the phase difference. A bang-bang phase detector, in
contrast, simply generates a constant magnitude signal indicating that the feedback is
either early or late with respect to the reference. Bang-bang phase detectors of various
topologies are commonly used in clock and data recovery (CDR) applications.
A possible circuit implementation of an optical-electrical bang-bang phase detector
is shown in Figure 4-14. This circuit bears some resemblance to an electrical current
integrator circuit presented in [22], but there are also significant differences. Instead of
a photodiode, the original circuit employed an electrical current source with relatively
limited parasitic capacitance, so the issues of charge sharing and capacitor reset were
not as centrally important. Replacing the current source in the original circuit with
a photodiode provides the capability of steering the optical current from a single
photodiode onto two separate capacitors and comparing their voltages to determine
the relative phase of the optical and electrical clocks.
During the reset phase, both capacitors are shorted to the supply voltage and the
voltage across the terminals is reset to zero. Then, during the sample phase, each
capacitor is discharged by an amount proportional to the time that the feedback signal
Figure 4-14: Bang-bang phase detector approach.
is high. Ideally, the capacitor voltages would be discharged by identical amounts over
a single cycle when the optical and electrical signals were in perfect quadrature. A
series of amplifiers and latches, described in the reference and carefully designed to
minimize systematic offsets, is used to generate a full-swing signal indicating early
or late feedback signal arrival based on the determination of which capacitor was
discharged for longer.
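As a rough behavioral illustration of the early/late decision described above, the following Python sketch models a single evaluation cycle. It assumes an ideal 50% duty-cycle optical pulse, ideal current steering, and no reset or charge-sharing effects. The function name and all default values (200 MHz clock, 10 µA photocurrent, 100 fF storage capacitors) are hypothetical choices for illustration, not values taken from the circuit of Figure 4-14.

```python
import math


def bang_bang_decision(edge_offset_rad, period_s=5e-9, i_photo_a=10e-6,
                       c_store_f=100e-15):
    """Model one evaluation cycle of an idealized bang-bang phase detector.

    edge_offset_rad is the position of the electrical clock edge relative
    to the center of the optical pulse, in radians of the clock period
    (0 is perfect quadrature, positive means the electrical edge is late).
    Returns (dv_a, dv_b, decision): the voltage drops on the two storage
    capacitors and the resulting "early" or "late" decision.
    """
    t_high = period_s / 2.0  # optical pulse width (50% duty cycle assumed)
    shift = edge_offset_rad / (2.0 * math.pi) * period_s
    # Keep the electrical edge inside the optical pulse for this simple model.
    shift = max(-t_high / 2.0, min(t_high / 2.0, shift))
    t_a = t_high / 2.0 + shift  # discharge time before the electrical edge
    t_b = t_high / 2.0 - shift  # discharge time after the electrical edge
    dv_a = i_photo_a * t_a / c_store_f  # dV = I * t / C
    dv_b = i_photo_a * t_b / c_store_f
    # Only the sign of the difference is reported; an exact tie is resolved
    # arbitrarily as "early" in this sketch.
    decision = "late" if dv_a > dv_b else "early"
    return dv_a, dv_b, decision


if __name__ == "__main__":
    for offset in (-0.2, -0.01, 0.01, 0.2):
        print(offset, bang_bang_decision(offset))
```

Note that the decision is the same for a small offset as for a large one of the same sign, which is the constant-magnitude behavior that distinguishes this detector from the linear topologies.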
The requirement that various nodes be reset each cycle introduces complexities
that are not present in the originally proposed phase detector. In addition, the
mismatches associated with switch on-resistance are not significantly changed. In
light of these facts, and given that this circuit gives up the linear phase detection
characteristic of the other topology, it does not provide a significant advantage.
Charge Sharing and Parasitic Node Voltage Reset
In order for the first-order model of quadrature phase lock to be applicable, the
parasitic capacitance of the photodiode must be reset to VDD when the electrical
clocks switch in the center of the optical pulse. If there were no mid-cycle voltage
reset, the circuit had perfect quadrature optical/electrical inputs, and all internal
nodes were initially precharged to VDD, then one node would begin to discharge
when the optical signal went high. After half of the optical high period, the electrical
clocks would switch. At this time, the parasitic capacitance would have some voltage
below VDD and this charge would immediately share with the newly switched node and
effectively initialize it to some voltage lower than VDD. Since the signals are in perfect
quadrature, this second node will now be discharged for the same amount of time as
the first node and will therefore have a final voltage lower than the first node by
an amount proportional to the ratio of parasitic and storage capacitance. Given the
lower voltage initialization of the second node, the first node must remain higher for
longer than the second to generate equal voltage on the two outputs.
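The size of this charge-sharing offset can be estimated with a first-order model. The sketch below assumes a constant photocurrent, a voltage-independent parasitic capacitance, and that each storage node discharges together with the photodiode parasitic while it is connected. The function name and the numeric values (1.8 V supply, 10 µA, 1.25 ns, 100 fF storage, 200 fF parasitic) are hypothetical and serve only to illustrate the mechanism.

```python
def charge_sharing_offset(vdd, i_photo, t_half, c_store, c_par):
    """Estimate final node voltages when the parasitic node is not reset.

    Returns (v1_final, v2_final, offset), where offset = v1_final - v2_final
    for perfect quadrature inputs (both nodes discharged for t_half).
    """
    # Node 1: storage capacitor and photodiode parasitic discharge together.
    dv = i_photo * t_half / (c_store + c_par)
    v1_final = vdd - dv
    # At the switching instant the parasitic capacitance (still at v1_final)
    # is connected to node 2 (precharged to vdd) and the charge is shared.
    v2_init = (c_store * vdd + c_par * v1_final) / (c_store + c_par)
    # Node 2 is then discharged for the same amount of time.
    v2_final = v2_init - dv
    return v1_final, v2_final, v1_final - v2_final


if __name__ == "__main__":
    v1, v2, offset = charge_sharing_offset(1.8, 10e-6, 1.25e-9,
                                           100e-15, 200e-15)
    print(v1, v2, offset)
```

In this model the offset equals the single-node voltage drop scaled by c_par / (c_store + c_par), so it vanishes only when the parasitic capacitance is negligible compared with the storage capacitance.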
There are also asymmetries introduced because the capacitance of the photodiode
is variable with voltage. If the voltage were not reset, the voltage variable capacitance
of the photodiode would discharge over different average voltage ranges for the two
sides and therefore present different average capacitance. This would clearly lead to
a differential in the required active time of each side.
It is therefore necessary to reset the parasitic capacitance to VDD at switch time.
Because there is only one clock edge available, a pulse-based approach must be used.
A relatively large PMOS could be connected to the photodiode node and driven by a short ON pulse
occurring just at the edge of the electrical feedback clock. This pulse could be created
by feeding the electrical clock into an inverter chain and then taking the XOR of the
output and the original signal, a technique commonly employed in various types of
high-performance pulse registers.
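The pulse-generation idea can be sketched in discrete time. The following Python example treats the inverter chain as a pure, non-inverting delay of a fixed number of samples (that is, an even number of ideal inverters), which is an assumption made for illustration; the function name and the sample values are hypothetical.

```python
def edge_pulses(clk, delay_samples):
    """XOR a clock with a delayed copy of itself to mark its edges.

    clk is a list of 0/1 samples. The delayed copy is padded at the start
    with the first sample so that no spurious pulse appears at index 0.
    """
    delayed = [clk[0]] * delay_samples + clk[:len(clk) - delay_samples]
    return [a ^ b for a, b in zip(clk, delayed)]


if __name__ == "__main__":
    clk = [0] * 4 + [1] * 8 + [0] * 8
    print(edge_pulses(clk, 2))
```

With this modeling choice the XOR output pulses on both clock edges, and each pulse is as wide as the chain delay. Selecting only one edge (for example by gating the pulse with the clock itself) would be an additional design step that is not shown here.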
These reset techniques do not completely eliminate the phase offsets. The reset
pulse cannot occur until after the first node is isolated from the photodiode by the
electrical clock; earlier reset would contaminate the data on that storage node. There-
fore, the time required for the reset pulse will be taken from the half period in which
the second node is discharged by the photodiode. The second node will therefore
be discharged slightly less than the first for perfect quadrature inputs, driving the
PLL to lock with the electrical clock switching before the prescribed time in order to
compensate and equalize the voltages.
Initialization Reset Timing
After each evaluation cycle, the differential voltage on the two capacitors must be
sampled and all internal voltages must be reset to VDD. Ideally, this evaluation and
reset would occur just after the optical reference goes low. However, the optical
clock input is small-signal, not full-swing logic level, and cannot therefore be used for
timing events. Once the loop is in lock, an additional fully electrical PLL or DLL
could be used to generate an electrical clock in phase with the optical clock. Since,
however, this clock would be referenced to the electrical feedback clock, it is just as
effective to use the edge of the electrical clock that occurs during the optical low
half-cycle in steady-state. However, during the locking transient of the PLL, the phase
relationship of the optical and electrical clocks is undefined. Therefore, during the
settling time, the reset signal will likely be asserted many times in the middle of the
optical high half-cycle. There is no way to avoid this situation, given that the relative
position of the optical clock is unknown during this period, but it will likely have a
significant and analytically complex effect on loop dynamics. A complete analysis of
these effects would be required prior to the implementation of a PLL with this type
of phase detector.
Other Potential Sources of Phase Offset
This topology also has switched small-signal currents, which means that the on-
resistance problems encountered in the previous topologies will also be a concern
here. The problem of the feedback amplifier gain error is eliminated since nodes are
simply reset to VDD, but the first stage differential amplifier offset is also a concern.
Susceptibility to Skew
While it initially appears that the choice of a bang-bang phase detector would allow
enough circuit simplification to minimize potential sources of skew, there are still
several issues to consider. The differential amplifier may have process, mismatch,
and temperature dependent input offset leading to a different definition of when the
capacitor voltages are equal. The delay in the inverter chains in the pulse-based reset
circuit will be temperature and process dependent and the skew will be a function
of this pulse length. The on-resistance of the switches will vary with process and
temperature. In short, the majority of the main skew sources in the original phase
detector have translated to this circuit.
4.4 Conclusions
Many topologies have been explored, but none provide the precision and insuscepti-
bility to process and temperature that is required of a receiver for high-speed clocking
applications. On-resistance, amplifier gain error, inverter delay, and numerous other
parameters vary so significantly with these process and temperature variations that
it is extremely difficult to find a topology that is not affected by the skew poten-
tially introduced by these sources. At first investigation, many of these topologies
appear quite promising, but the same sources return over and over again to intro-
duce skew. TIA approaches are traditionally criticized for being overly susceptible
to skew and jitter. Perhaps the examination of these topologies simply shows that
it is not the TIA topology, but rather the small-signal to logic-level conversion that
is inherently prone to skew and jitter introduction. In this case, simply finding a
topology that appears simple and elegant at first glance may not solve the problem.
Instead, attention to designing TIA structures or small-signal phase detectors that
contain explicit calibration for mismatch and variation will be required to provide an
acceptable optical-electrical conversion solution.
Chapter 5
PLL Loop Dynamics and Complete
Circuit Simulations
The PLL dynamics are of primary importance in designing a functional and reliable
optical clocking system. The analysis of PLL operation requires careful considera-
tion of both large-signal characteristics such as locking range and small-signal loop
stability and damping. These general concerns are relevant to the design of all PLL
topologies, but the specific analysis of the effects is dependent on loop type and order
and therefore varies significantly for PLLs with different phase or phase-frequency
detectors and loop filters. The analysis in this chapter will focus on a PLL using
the original Type II charge steering optoelectronic phase detector and a second-order
loop filter.
5.1 Optical PLL Analysis
The two most critical design criteria for the dynamics of the optoelectronic PLL are
acquisition range and linearized loop stability and damping. Qualitative illustrations
of these two possible failure modes, along with an illustration of idealized proper
operation, are shown in Figure 5-1. Normal operation of a well-damped loop within
the acquisition range is shown in the top frame; in this case the PLL frequency
initializes within the acquisition range and the well-damped loop drives the frequency
to lock with minimal ringing. The second frame shows an illustration of a severely
underdamped PLL; the loop initializes within the acquisition range but, instead of
locking, simply oscillates around the desired control voltage. The third frame shows
an illustration of a waveform that the PLL might generate if the loop acquisition
range were much smaller than the VCO tuning range and the VCO initialized to a
frequency far from the reference; the PLL produces a sinusoidal waveform at the beat
frequency of the feedback waveform and the reference, which never approaches the
control voltage that would be required to match the frequencies. Obviously, careful
analysis of acquisition range and stability is required to guarantee functionality of
the PLL. A complete and concise tutorial on PLL design and stability is presented in
[8].
For a phase detector PLL, the acquisition range and stability are determined by
the loop filter transfer function, VCO gain, divider ratio, and phase detector gain.
In addition, range and stability are typically affected in opposite ways by changes
to these parameters such that improvements in one will require compromises in the
other. The following analysis sections will first analyze the factors that determine lock
range and stability and then present simulated results of the PLL demonstrating that
the loop is stable and capable of locking for loop filter voltages initialized throughout
the possible range.
5.1.1 Acquisition Range
The acquisition range of the PLL must encompass the entire VCO range or the
control voltage may initialize outside of the acquisition range and cause the loop to
fail to lock. Quantitative calculations of acquisition range are extremely impractical
because, while the loop is acquiring lock, the phase relationship of the two signals
cycles through a wide range and cannot be linearized around the lock point to create an
LTI system model for analysis. In certain specific cases, various simplifications allow
relatively accurate approximate analytical determination of the range [8].
Qualitative understanding of the acquisition process, however, is much more ac-
cessible. Both phase and phase-frequency detectors have nonlinear transfer functions
[Figure 5-1 plots, panels titled Well Damped, Over Damped, and Under Damped; horizontal axis: Normalized Time (s).]
Figure 5-1: Well-damped, overdamped, and underdamped loop dynamics.
when considered across the entire range of phase offset. Figure 5-2 compares the
transfer functions of the optoelectrical phase detector and a PFD. While both ex-
hibit nonlinear characteristics, the PFD provides negative feedback through the en-
tire phase range while the simple phase detector reverses sign and provides positive
feedback if the phase is too far from lock.
If the VCO control voltage initializes far outside the acquisition range and the
feedback and reference frequencies are dramatically different, the phase relationship
rapidly cycles through the portions of the phase detector range that alternately push
the loop towards lock and away from lock. This behavior generates a sinusoid on
the VCO control voltage with a frequency determined by the beat frequency of the
feedback waveform and the reference waveform. Signals near lock will generate slow
sinusoids while those far from lock generate fast sinusoids. Furthermore, the loop
filter is driven by a fixed-current charge-pump type structure so the magnitude of
the sinusoid is proportional to the period of the error signal. Explained qualitatively, if
the two frequencies are very close together, the phase relationship changes slowly and
Figure 5-2: Typical characteristics of a phase detector (top) and a phase-frequency detector (bottom).
the phase detector remains in the positive gain region for a long time and increases
the loop filter voltage significantly before transitioning to the negative region, and
vice versa. If, however, the frequencies are far apart, the phase relationship changes
rapidly and the phase detector will produce a few UP pulses followed by a few DOWN
pulses and the resulting error signal will have small magnitude.
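The dependence of the error-signal magnitude on the beat period can be illustrated with a crude estimate. The sketch below assumes that the phase detector delivers an average current of half its peak value during each half of the beat period (the mean of a triangular characteristic), that this current is integrated on a single lumped capacitance, and that the loop does not react during the beat. These simplifications, the function name, and the numeric values are all assumptions for illustration.

```python
def beat_ripple_estimate(f_ref_hz, f_fb_hz, i_peak_a, c_total_f):
    """Return (beat period, approximate peak-to-peak control-voltage swing)."""
    f_beat = abs(f_ref_hz - f_fb_hz)
    if f_beat == 0:
        raise ValueError("frequencies are equal; there is no beat note")
    t_beat = 1.0 / f_beat
    i_avg = i_peak_a / 2.0  # mean of a triangular characteristic, one polarity
    # The charge delivered during half a beat period sets the voltage swing.
    dv_pp = i_avg * (t_beat / 2.0) / c_total_f
    return t_beat, dv_pp


if __name__ == "__main__":
    # Feedback close to the reference versus far from the reference.
    print(beat_ripple_estimate(200e6, 190e6, 10e-6, 1e-12))
    print(beat_ripple_estimate(200e6, 100e6, 10e-6, 1e-12))
```

Under these assumptions a tenfold longer beat period gives a tenfold larger swing, which matches the qualitative statement that signals near lock produce slow, large sinusoids while signals far from lock produce fast, small ones.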
Assuming that the phase detector topology, divider ratio, and VCO gain and
range are fixed, the acquisition range is determined by the loop filter and the ratio
of photocurrent to parasitic capacitance of the photodiode. If this ratio is too large,
additional parallel capacitance may be added to C2 to adjust the filter appropriately.
However, it is not possible to add components to appropriately retune the filter if
this ratio is too small. The impulse
response of this filter is shown in Equation 5.1. It is obvious, by inspection, that the
impulse must initially charge C2 and then redistribute to balance the voltage on C1
and C2 with a time constant determined by all three passive components. The loop
filter voltage consists of fast voltage ripple due to the charging transients and a slower
acquisition waveform based on the integral portion of the impulse response. Increasing
C2 decreases the cycle-to-cycle voltage ripple associated with the equal charging and
discharging behavior of the phase detector, but also reduces the magnitude of the
[Figure 5-2 plot axes: Phase-Current Gain versus Phase Difference.]
semi-sinusoidal locking transient thereby decreasing the acquisition range. Increasing
C1 also decreases the acquisition range but without reducing ripple, as the initial
charge must all be integrated on C2. Increasing R damps the loop and improves
stability at the expense of added voltage ripple. However, some amount of voltage
ripple and reduction in range must be tolerated to obtain acceptable damping of the
overall loop dynamics.
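The qualitative behavior of the impulse response can be checked numerically. The sketch below assumes the common filter arrangement in which C2 sits directly on the filter node in parallel with the series combination of R and C1; the schematic of Figure 5-3 is not reproduced in this text, so that topology is an assumption, as are the function name and the component values used in the example.

```python
import math


def loop_filter_impulse_response(t, r, c1, c2):
    """Voltage response (V per coulomb) to a unit charge impulse at time t >= 0.

    Assumed topology: C2 from the filter node to ground, in parallel with
    R in series with C1.
    """
    tau = r * c1 * c2 / (c1 + c2)  # redistribution time constant
    # The impulse first charges C2 alone (h(0) = 1/C2), then the charge
    # redistributes until both capacitors share it (h(inf) = 1/(C1 + C2)).
    return (1.0 / (c1 + c2)) * (1.0 + (c1 / c2) * math.exp(-t / tau))


if __name__ == "__main__":
    r, c1, c2 = 44e3, 800e-15, 200e-15
    for t in (0.0, 5e-9, 20e-9, 100e-9):
        print(t, loop_filter_impulse_response(t, r, c1, c2))
```

The initial value depends only on C2 and the final value on C1 + C2, which is consistent with the observation that increasing C2 reduces cycle-to-cycle ripple while increasing either capacitor reduces the integrated response.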
The poor sensitivity of silicon integrated photodiodes makes it difficult to achieve
good performance. Diodes fabricated and tested in the same 0.18 µm process
produced 26 mA when illuminated with mW of optical power [13]; the analysis in
Chapter 4 shows that obtaining 10 µA with similar illumination power and acceptable
transit time would require a diode with nearly 500 fF of parasitic junction
capacitance. Figure 5-4 shows that, even with C1 minimized, this parasitic contribution
to C2 is enough to limit the acquisition range to the point where the loop will not
lock. Fortunately, diode performance is expected to improve in the future, and
higher-power, bench-based laser sources may be used to obtain a higher ratio of
photocurrent to capacitance in the context of research where cost and
manufacturability are not primary concerns. It is reasonable
to estimate that, with such a source, the required 10 µA could be obtained from
a photodiode with 200 fF of junction capacitance. All subsequent simulations and
analysis will therefore assume a diode with these characteristics.
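$$h(t) = \left(\frac{1}{C_2}\right)\left(\frac{C_1}{C_1 + C_2}\right) e^{-\left(\frac{C_1 + C_2}{R C_1 C_2}\right) t} + \frac{1}{C_1 + C_2} \qquad (5.1)$$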
The simplified non-locking case of a near-sinusoidal waveform on the loop filter
will only occur if the reference and feedback frequencies are so far from lock that
the loop gain is insufficient to exert any noticeable locking force on the loop. For a
realistically designed loop, the more common scenario is cycle slipping. In this case,
the initial feedback frequency is still too far from the reference for the loop to acquire
lock without transitioning into the positive feedback portion of the phase detector
range. However, the two frequencies are close enough that the VCO control voltage
waveform has lower frequency and higher magnitude. This response brings the control
Figure 5-3: Loop filter topology.
[Figure 5-4 plot, titled Control Voltage for Out-of-Range Signal; horizontal axis: Time (us).]
Figure 5-4: Photodiode capacitance of 500 fF limits lock range.
[Figure 5-5 plot, titled Cycle Slipping and Locking; horizontal axis: Time (us).]
Figure 5-5: Cycle slipping: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and a 100 fF/20 kΩ loop filter.
voltage close enough to the edge of the capture range that the loop gain increases
and exerts a net effect in the correct direction on the loop filter before the sign of the
feedback reverses. Therefore, although the loop does not acquire lock immediately,
each cycle slip will bring the VCO control voltage incrementally closer to the voltage
required to match the reference and the loop will eventually stop cycle slipping and
lock to the reference.
Figure 5-5 shows an example of cycle-slipping in a version of the extracted PLL
simulated with a 200 fF parasitic capacitance and a 100 fF/20 kΩ loop filter.
Reducing the photodiode parasitic capacitance to 200 fF sufficiently increases the acquisition
range so that the loop acquires lock from its natural startup voltage. This figure
also intentionally illustrates the effects of poor loop filter design. Although the loop
is at least stable in this case, unlike the illustration in Figure 5-1, it is extremely
underdamped and the oscillations around the operating point decay slowly. Careful
analysis and design of the loop filter can be used to obtain significantly improved
settling dynamics.
5.1.2 Small-Signal Stability Analysis
A formal analysis of stability and damping provides both qualitative and quantitative
understanding of the PLL dynamics. The first step is the linearization of the phase
detector. Within the range of ±π, the detector is linear and has a transfer function
given by 5.2. The VCO voltage-phase transfer function and the divider phase transfer
function are easily obtained by inspection and are shown for completeness in 5.3 and
5.4, respectively.
$$H(s) = \frac{I}{\pi} \qquad (5.2)$$
$$H(s) = \frac{K_{VCO}}{s} \qquad (5.3)$$
$$H(s) = \frac{1}{N} \qquad (5.4)$$
The transfer function of the loop filter is also obtained by basic Laplace methods.
For simplicity of expression, we will define the variables $b$, $\tau_2$, and $K$ in 5.5, 5.6, and
5.7. This analysis is similar to the results presented in [2], though the presence of a
feedback divider introduces some differences.
$$b = 1 + \frac{C_1}{C_2} \qquad (5.5)$$
$$\tau_2 = R C_1 \qquad (5.6)$$
$$K = \left(\frac{I}{\pi C_1}\right) K_{VCO}\,\tau_2 \qquad (5.7)$$
It is then straightforward to derive the loop filter transfer function of 5.8 in terms
of these defined variables. This filter has one zero and two poles, so the application of
Black's Formula to derive the closed-loop transfer function results in the third-order
system of 5.9.
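$$H(s) = \left(\frac{b-1}{b}\right) \frac{s\tau_2 + 1}{s C_1 \left(\frac{s\tau_2}{b} + 1\right)} \qquad (5.8)$$
$$H(s) = \frac{K \left(\frac{b-1}{b}\right)\left(s + \frac{1}{\tau_2}\right)}{s^3 \frac{\tau_2}{b} + s^2 + s \left(\frac{K}{N}\right)\left(\frac{b-1}{b}\right) + \left(\frac{K}{N}\right)\left(\frac{b-1}{b\,\tau_2}\right)} \qquad (5.9)$$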
The Routh-Hurwitz stability criterion may be used to demonstrate the guaranteed
stability of the system. This criterion states that a third-order system with the
characteristic equation $s^3 + a s^2 + b s + c$ will be stable, though not necessarily well-damped,
on the condition that a, b, c > 0 and ab > c. (These polynomial coefficients are distinct
from the filter parameter b of 5.5.) The characteristic equation of the closed-loop
PLL, obtained by multiplying the numerator and denominator by $b/\tau_2$, is given in
5.10.
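$$s^3 + s^2 \left(\frac{b}{\tau_2}\right) + s \left(\frac{K}{N}\right)\left(\frac{b-1}{\tau_2}\right) + \left(\frac{K}{N}\right)\left(\frac{b-1}{\tau_2^2}\right) \qquad (5.10)$$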
If C1 and C2 are both nonzero and finite, then b > 1 and the necessary condition
that b - 1 > 0 is satisfied. N must always be greater than zero, as must $\tau_2$, since
resistors and capacitors must have positive values. Finally, K will be greater than
zero if the phase detector gain and VCO gain have equivalent sign. In this case, the
VCO gain is negative and the phase detector range contains metastable points with
both positive and negative gain. The loop will lock to the point within the negative
gain range and K will therefore be positive. Since the product of any number of
positive numbers is clearly positive, this set of criteria is sufficient to determine that
all the coefficients of the characteristic equation are positive. By inspection of 5.10,
the product of the $s^2$ and $s$ coefficients equals the filter parameter b times the
constant term, so the condition ab > c is automatically satisfied since b > 1. This proves
that the loop must always be stable.
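The stability argument can be checked numerically. The sketch below reads the characteristic equation as s^3 + (b/τ2)s^2 + (K/N)((b-1)/τ2)s + (K/N)((b-1)/τ2^2), which is an interpretation of 5.10, and applies the third-order Routh-Hurwitz conditions directly. The function names are hypothetical, and the value of K used in the example is an arbitrary placeholder rather than a value from this design; N = 8 follows from the 200 MHz reference and 1.6 GHz output.

```python
def characteristic_coefficients(k, n, b, tau2):
    """Coefficients (a2, a1, a0) of s^3 + a2*s^2 + a1*s + a0."""
    a2 = b / tau2
    a1 = (k / n) * (b - 1.0) / tau2
    a0 = (k / n) * (b - 1.0) / tau2 ** 2
    return a2, a1, a0


def is_stable_third_order(a2, a1, a0):
    """Routh-Hurwitz test for a monic third-order polynomial."""
    return a2 > 0 and a1 > 0 and a0 > 0 and a2 * a1 > a0


if __name__ == "__main__":
    # Placeholder loop gain K; b and tau2 correspond to C1 = 800 fF,
    # C2 = 200 fF, and R = 44 kOhm under the definitions of 5.5 and 5.6.
    a2, a1, a0 = characteristic_coefficients(k=1e9, n=8, b=5.0, tau2=35.2e-9)
    print(is_stable_third_order(a2, a1, a0))
    print(a2 * a1 / a0)  # ratio equals the filter parameter b
```

Because the ratio a2*a1/a0 reduces to the filter parameter b, the final Routh-Hurwitz condition holds whenever b > 1, independent of the particular value chosen for K.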
The loop must be well-damped as well as stable. Because this is a third-order
system, the closed-form analytical methods for obtaining critical damping in a
second-order system cannot be applied. Furthermore, the loop dynamics cannot be
considered and optimized without regard to the acquisition range, since increasing the
damping of the complex poles results in decreased range and simply designing the
filter for optimal damping might result in unacceptably narrow acquisition range.
Nevertheless, a loop filter design may be obtained by keeping both stability and ac-
quisition range in mind, using Matlab to perform root locus analysis of the system,
and verifying acquisition range and stability with circuit-level simulations. Using
these methods, and assuming a parasitic capacitance of 200 fF, the values of 800 fF
and 44 kΩ were chosen for the loop filter components.
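For reference, the filter parameters defined in 5.5 and 5.6 can be evaluated for the two loop filters discussed in this chapter. The sketch below assumes that C2 consists only of the 200 fF photodiode parasitic and that the quoted 800 fF and 100 fF values are C1; it also reports the zero and second-pole frequencies implied by the filter transfer function under those assumptions. The function name is hypothetical.

```python
import math


def filter_parameters(r, c1, c2):
    """Return (b, tau2, f_zero_hz, f_pole_hz) for the loop filter."""
    b = 1.0 + c1 / c2             # definition 5.5
    tau2 = r * c1                 # definition 5.6
    f_zero = 1.0 / (2.0 * math.pi * tau2)  # zero at s = -1/tau2
    f_pole = b / (2.0 * math.pi * tau2)    # non-origin pole at s = -b/tau2
    return b, tau2, f_zero, f_pole


if __name__ == "__main__":
    # Chosen filter: 800 fF and 44 kOhm with 200 fF parasitic capacitance.
    print(filter_parameters(44e3, 800e-15, 200e-15))
    # Underdamped example of Figure 5-5: 100 fF and 20 kOhm.
    print(filter_parameters(20e3, 100e-15, 200e-15))
```

Under these assumptions the chosen filter gives b = 5 and a τ2 of 35.2 ns, compared with b = 1.5 and 2 ns for the underdamped example, so the zero and the second pole are much more widely separated in the final design.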
5.2 Final Simulated Results
The final PLL layout was completed, including the chosen loop filter values, and is
shown in Figure 5-6. The photodiodes were modeled by ideal current sources and
200 fF capacitors. The entire PLL layout was extracted with parasitic capacitance
included and simulated. Figure 5-7 shows that this choice of loop filter does produce
a PLL with well-damped dynamics.
The loop filter also provides sufficient acquisition range. In Figure 5-7, the initial
conditions of the loop were determined by the DC bias point analysis of the simulator.
This is likely to be the starting point experienced by the real loop, but since any state
is possible at startup it is necessary to guarantee that the loop can lock from any
possible initial loop filter voltage. A source follower shifts the VCO control voltage
about 0.5 V below the loop filter voltage, which is already slightly limited in voltage
range by the feedback amplifier. Therefore, it is reasonable to assume that control
voltage on the VCO itself will initialize somewhere between 0.1 V and 1.0 V. Figure 5-
8 shows that the loop is able to acquire lock for initial voltages at both extremes of
this range.
If area were a primary consideration, the LC VCO could be replaced with a ring
oscillator structure. The structure should be composed of low-jitter, self-biased ele-
ments such as those proposed in [10] and the tuning range of the resulting VCO should
be carefully analyzed. Ring oscillator VCOs typically have wide tuning ranges, but it
Figure 5-6: Complete layout of the PLL.
[Figure 5-7 plot, titled Cycle Slipping and Locking - Well Damped; horizontal axis: Time (us).]
Figure 5-7: Well-damped locking: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and an 800 fF/44 kΩ loop filter.
is possible to limit the range by employing current-starved oscillator elements. If the
tuning range of the selected oscillator were too large, the oscillator could potentially
initialize outside of the acquisition range and the loop would never acquire lock. The
identical concern was addressed in this work for the PLL with an LC VCO, but the
magnitude of the potential problem is larger for a ring oscillator VCO with larger
tuning range. The small-signal stability analysis of a PLL with a different oscillator
would remain unchanged with the exception of the VCO gain constant.
Finally, Figure 5-9 shows the 200 MHz reference and feedback waveforms and the
1.6 GHz output clock waveform generated by the PLL. Due to the nonidealities of
the phase detector, discussed in Chapter 4, the 200 MHz waveforms are not locked in
perfect quadrature. However, the phase difference between the two has settled to a
constant value and they are therefore locked, albeit to a phase with some phase offset
from quadrature.
While there is still good reason to choose a PFD over a phase detector given
[Figure 5-8 plot, titled PLL Locking Transient; horizontal axis: Time (us).]
Figure 5-8: PLL locking from both extremes of input voltage range.
[Figure 5-9 plot, titled Optical Reference, VCO Output, and Divider Output; horizontal axis: Time (ns).]
Figure 5-9: VCO output clock (dotted), optical reference (dashed, 10 µA amplitude scaled for comparison), and divider output (solid) shown at the end of the locking transient of Figure 5-7.
the option, these results show that practical range and resolution for this applica-
tion will be achievable with the optical-electrical phase detector when the expected
improvements in photodiode performance are realized.
Chapter 6
On-Chip Skew Measurement
Accurate characterization of the skew between two instances of the optical PLL is
required in order to assess the performance of the optical clock distribution system.
At 1.6 GHz, the output clock frequency is too high to simply drive the clocks off chip
through pads and measure them externally. Even if this were possible, the required
output buffers could potentially introduce significant skew between the outputs and
cast doubt on the accuracy of the results. High-speed probing of the outputs would
be required to obtain the accuracy necessary for jitter characterization and could also
be used to measure skew very accurately. However, this type of test setup is quite
complex, sensitive, costly, and far more accurate than necessary to obtain reasonable
skew characterization. It is desirable to implement on-chip test circuitry to determine
skew without probing and with precision on the order of 50 ps. A standard time-to-
digital converter (TDC) is a practical way to obtain these measurements and provides
an appropriate balance between resolution, range, and complexity.
6.1 TDC Concept
A time-to-digital converter (TDC) uses a set of incrementally delayed clock edges
to sample a waveform many times at equally spaced intervals and generate a digital
output describing the waveform. Skew can be measured by sampling the two signals
with two TDCs using the same delayed clocks and observing the relative transition
positions. Figure 6-1 shows a block level schematic of the dual TDC designed to
characterize the skew of the PLL output clocks. A state machine, described in detail
later in the chapter, is used to control the converter operation. In the first stage of
operation, the sample waveform (SAM) goes high and the state-change propagates
down the inverter chain, clocking each one of the sampling registers (SA) with a
progressively delayed clock and sampling the two input signals, IN1 and IN2. When the
sample output signal (SAO) goes high indicating that the sampling stage is complete,
the control logic sets the multiplexer control signal (S/L) so that each shift register
(SH) takes its D input from the Q output of the adjacent sample register (SA) and
then generates one clock pulse on the shift register clock (CLK) to load the data.
Finally, the control logic sets the multiplexer control signal (S/L) so that each shift
register (SH) takes its input from the output of the previous shift register, clocks the
shift registers with CLK until all the data in the shift register chain has been shifted
off chip via signals SHO1 and SHO2, and then repeats the entire process from the
beginning.
Skew may be determined by comparing the sampled versions of each of the two
PLL output clocks, since the off-chip waveforms generated by the TDC outputs are
sampled and time-scaled representations of the on-chip 1.6 GHz clock signals. The
time-scaling factor is proportional to the ratio of the sampling rate and the shifting
rate. The sampling rate may be determined by configuring a replica of the TDC
critical path as a ring oscillator, measuring the period, and dividing by the number
of stages to determine the delay between sampling edges. The shifting rate is defined
by the frequency of the shift register clock (CLK).
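As a rough illustration of the comparison described above, the following Python sketch estimates skew from two captured bit streams and converts a time measured on the shifted-out waveform back to on-chip time. The function names, the use of the first rising edge as the reference transition, and the 20 ns shift period (one bit every two cycles of a 100 MHz control clock) are assumptions made for this example rather than details of the implemented design.

```python
def rising_edges(bits):
    """Indices at which a sampled stream goes from 0 to 1."""
    return [i for i in range(1, len(bits)) if bits[i - 1] == 0 and bits[i] == 1]


def skew_from_samples(bits1, bits2, stage_delay_ps=40.0):
    """Estimate skew (ps) as the offset between the first rising edges.

    A positive result means the edge in bits2 arrives later than in bits1.
    Assumes both streams were captured with the same delayed clocks and
    that the skew is well under half a clock period.
    """
    e1, e2 = rising_edges(bits1), rising_edges(bits2)
    if not e1 or not e2:
        raise ValueError("no rising edge found in one of the streams")
    return (e2[0] - e1[0]) * stage_delay_ps


def on_chip_time_ps(off_chip_time_ns, stage_delay_ps=40.0, shift_period_ns=20.0):
    """Convert a time measured on the shifted-out waveform to on-chip time."""
    positions = off_chip_time_ns / shift_period_ns  # number of sample positions
    return positions * stage_delay_ps


# Hypothetical captures: IN2 lags IN1 by one sample position.
in1 = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
in2 = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(skew_from_samples(in1, in2))  # 40.0
print(on_chip_time_ps(20.0))        # 40.0
```

Under these assumed numbers the time-scaling factor is 20 ns / 40 ps = 500, so a 20 ns offset on the oscilloscope corresponds to one 40 ps sample position on chip.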
6.2 Critical Path
The resolution of the TDC is determined by the inverter chain and registers which
comprise the critical path. In this 0.18 µm process, the FO1 ring oscillator delay is
about 40 ps in the case where the devices are large compared to any interconnect
parasitic capacitance. Making the inverters in the delayed clock generation chain
Figure 6-1: Time-to-digital converter.
large enough that the sampling register load is only a small percentage of the next
inverter load approximates the FO1 case and therefore provides 40 ps resolution.
This resolution figure assumes that the data will be sampled at every available
sampling phase, which requires that the sampling registers must be single phase and
alternately positive and negative edge triggered. The true single-phase clock (TSPC)
register is a common topology that meets these initial criteria [23]. However, if
the positive and negative edge triggered registers have substantially different setup
and hold times, then the accuracy is reduced and a single transition in the data
might even result in a "1111010000" or "0000101111" style "bubble" in the output
data. This type of glitch occurs when an earlier register has a shorter setup time
and captures a transition while the next register has a much longer setup time and
either completely misses the transition and stores the old value or enters a metastable
state. Standard TSPC registers exhibit this problem because opposite polarities of
registers have different setup times since data must propagate through either one or
two logic stages before the clock edge. The more symmetrical split-output TSPC
style, proposed in the same paper and shown in Figure 6-2, provides better matched
setup and hold times and is the best choice for the TDC sampling register topology.
The metastability problem is not entirely eliminated because this type of system
Figure 6-2: Split-output TSPC latches.
provides no way to guarantee that the setup and hold times will not be violated by
the delayed clocks, but this latch style reduces the problem and provides the best
available compromise between the various design criteria.
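The "bubble" patterns described above can be flagged in post-processing. The sketch below is one possible check, not part of the implemented TDC: it marks any sample that differs from both of its neighbors. This relies on the assumption that a legitimate high or low phase of the 1.6 GHz clock spans several 40 ps samples, so an isolated single sample is more likely a capture error than real data. The function name `find_bubbles` is hypothetical.

```python
def find_bubbles(bits):
    """Return indices of isolated samples that differ from both neighbors."""
    return [
        i
        for i in range(1, len(bits) - 1)
        if bits[i] != bits[i - 1] and bits[i] != bits[i + 1]
    ]


clean = [int(c) for c in "1111100000"]
bubbled = [int(c) for c in "1111010000"]
print(find_bubbles(clean))    # []
print(find_bubbles(bubbled))  # [4, 5]
```

A capture with flagged indices could be discarded or have its transition position treated as uncertain by one sample.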
The total time period sampled by the critical path is the product of the single-
stage delay and the total number of stages. The TDC should be designed to sample a
minimum of two periods of the clocks to show two consecutive positive and negative
edges in context. However, the TDC critical path layout is modular and compact
so there is no significant area or design time penalty for extending the time range
to show the generated output clocks over a full period of the input reference and
provide a means for characterizing the effect of the steady-state ripple on the control
voltage. For this implementation, a 128 stage TDC was designed and provides 5 ns
of sampling range in order to sample about 8 periods of the generated clock at the
nominal 625 ps clock period.
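The window figures quoted above follow directly from the stage count and stage delay. A short Python check, using only the numbers given in the text (the variable names are illustrative):

```python
STAGES = 128
STAGE_DELAY_PS = 40.0     # FO1-limited stage delay quoted in the text
CLOCK_PERIOD_PS = 625.0   # nominal period of the 1.6 GHz generated clock

window_ps = STAGES * STAGE_DELAY_PS
periods_in_window = window_ps / CLOCK_PERIOD_PS
samples_per_period = CLOCK_PERIOD_PS / STAGE_DELAY_PS

print(window_ps)           # 5120.0 ps, i.e. about 5 ns
print(periods_in_window)   # 8.192, i.e. about 8 clock periods
print(samples_per_period)  # 15.625 samples per clock period
```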
6.3 Control and State Machine
A microcoded state machine (MSM) is the best way to run the TDC and guarantee the
appropriate sequential manipulation of the TDC critical path control. The relatively
simple state machine table is shown in Table 6.1.
The state machine controls the shift register clock (CLK), the shift/load signal
for the multiplexers (S/L), and the sample pulse for the TDC critical path (SAM).
An additional counter, external to the state machine, is used to count the shift pulses
and is reset at the appropriate time by the S/L signal, which happens to be asserted
at the correct time to be reused for this purpose.
A complete cycle of the TDC begins when the MSM enters state "Sample" and
issues a pulse to the inverter delay line. At this time, the multiplexers are configured
to load, in preparation for the next state. The MSM advances to the "Load1" state
when the "SAMPLEDONE" signal, generated from the XOR of the delay line input
and output, indicates that the signal has propagated through the entire delay line.
In "Load1" and "Load2", the MSM generates one complete pulse on the output
"CLK" in order to load the data into the shift registers. Because S/L is asserted
and also serves as the external counter reset enable signal, this pulse also resets the
external counter to the zero state. The next state, "Switch1", is a dummy state
to switch S/L from 1 to 0. The TDC will spend the vast majority of time in states
"Shift1" and "Shift2", cycling between the two states until the "SHIFTDONE" signal
generated by the external counter indicates that all 128 bits have been shifted off the
chip. The MSM then progresses through the final two dummy states and restarts
the sample/shift cycle. Because this control is integrated, the test setup need only
observe the two shifted outputs on an oscilloscope to determine the skew of the two
PLL output clocks.
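The sequence described above can be sketched behaviorally in Python. This is not the synthesized Verilog, and it does not model the microcode branch encoding. The per-state outputs are copied from Table 6.1, and the transitions follow the prose description. The function names, the `sample_wait` parameter, and the assumption of exactly one shift pulse per stored bit are illustrative.

```python
# Per-state outputs (CLK, S/L, SAM), copied from Table 6.1.
OUTPUTS = {
    "Sample":  (0, 1, 1),
    "Load1":   (1, 1, 0),
    "Load2":   (0, 1, 0),
    "Switch1": (0, 0, 0),
    "Shift1":  (1, 0, 0),
    "Shift2":  (0, 0, 0),
    "Switch2": (0, 1, 0),
    "Dummy":   (0, 1, 0),
}


def run_one_cycle(n_bits=128, sample_wait=3):
    """Return the list of states visited in one sample/shift cycle.

    sample_wait is a hypothetical number of control-clock cycles spent in
    "Sample" before SAMPLEDONE is asserted; the real value depends on the
    delay line.
    """
    trace = ["Sample"] * sample_wait   # wait for the edge to traverse the delay line
    trace += ["Load1", "Load2"]        # one CLK pulse with S/L = 1 loads the data
    trace.append("Switch1")            # S/L goes from 1 to 0
    count = 0                          # external counter, reset during the load pulse
    while count < n_bits:              # SHIFTDONE is asserted once count reaches n_bits
        trace += ["Shift1", "Shift2"]  # one CLK pulse shifts one bit
        count += 1
    trace += ["Switch2", "Dummy"]      # final two states before restarting
    return trace


def clk_pulses(trace):
    """Count rising edges on CLK over a state trace."""
    clk = [OUTPUTS[s][0] for s in trace]
    return sum(1 for a, b in zip(clk, clk[1:]) if a == 0 and b == 1)


trace = run_one_cycle()
print(len(trace))         # 264 control-clock cycles for these assumed settings
print(clk_pulses(trace))  # 129: one load pulse plus 128 shift pulses
```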
6.4 Implementation
The TDC critical path, including the delay line, sampling registers, and shift registers
was custom designed through the layout stage in order to guarantee the required
performance in timing, symmetry, matching, and speed. The 8-by-9 ROM was too
small to be generated with the commercially available ROM tools and was therefore
laid out by hand. A multiplexer implementation would also have been possible, but
State  Name     CLK  S/L  SAM  CNS[2:0]  Input Select[1:0]  Polarity
000    Sample   0    1    1    Sample    SAMPLEDONE         0
001    Load1    1    1    0    X         Always0            1
010    Load2    0    1    0    X         Always0            1
011    Switch1  0    0    0    X         Always0            1
100    Shift1   1    0    0    X         Always0            1
101    Shift2   0    0    0    Shift1    SHIFTDONE          0
110    Switch2  0    1    0    Sample    Always0            0
111    Dummy    0    1    0    Sample    Always0            0
Table 6.1: State table for the microcoded state machine.
the ROM structure was chosen for simplicity and ease of integration. The control logic
is clocked at only 100 MHz and the performance and timing demands are therefore
relaxed, allowing Verilog design and synthesis of the control logic, MSM, and external
counter. A fully functional behavioral Verilog model was written and simulated,
compiled to RTL, and finally synthesized using Silicon Ensemble.
Even using Nanosim instead of SPICE, the simulation and verification of the
complete TDC is challenging because it requires both a long total simulation time
(many microseconds) as well as a small simulation step size (around 10-20 ps) in
order to capture the TDC critical path accurately. In order to solve this problem,
the TDC was simulated in two parts. First, a miniature TDC with a very short
critical path was simulated with high resolution to test the critical path. Second, the
complete TDC was simulated at lower resolution to test the overall logical operation
while losing some of the accuracy in the critical path simulation. The combination
of these two simulations fully verified the functionality of the TDC.
6.5 Additional Qualitative Verification
In order to obtain a qualitative measurement of the settling dynamics of the PLL,
it may be useful to include other measurement circuits in addition to the TDC. For
example, placing a unity gain buffer at the VCO control voltage, low-pass filtering
the output, and sending it off chip would allow the observation of the low-frequency
component of the locking transient and provide useful information about the damping
and acquisition time. Observing the low-pass filtered output of the XOR of the clocks
generated by two PLL instances would allow qualitative determination of whether
the two loops had arrived at steady-state relative to each other. Finally, though
some timing information would be lost, synchronously dividing the outputs by 4
or 8 and buffering them out to a pad would provide a way to view a real-time,
frequency-proportional representation of the circuit operation. While the TDC is
clearly still required, these additional measurement methods would provide valuable
extra information to inform the test process.
6.6 Conclusions
On-chip measurement provides a method for determining the skew between two in-
stances of the PLL to within 40 ps. While there are more accurate methods, such as
arbiter-array skew/jitter measurement [24] and optical skew measurement, the TDC
provides the appropriate combination of range, resolution, and complexity for this
application. The additional test methods provide a simple way to obtain a complete
qualitative and intuitive view of the circuit operation to supplement the quantitative
results provided by the TDC.
Chapter 7
Conclusions
This thesis presents a complete design and simulation of an optical PLL clock dis-
tribution system using a current-steering optical-electrical phase detector. The con-
tribution of the work, however, also includes insights into the present and future
advantages and challenges of optical clock distribution.
7.1 Summary
An optical-electrical PLL for clock distribution was designed through the layout stage,
extracted, and simulated. The optical current-steering phase detector proposed and
implemented provides direct phase comparison by using the PLL feedback clocks to
steer the photocurrent in order to deliver a current-mode error signal to the loop filter
and drive the PLL towards lock. This phase detector and PLL take the place of a
traditional transimpedance amplifier optical receiver and thereby eliminate a circuit
block that is known to introduce unacceptable levels of skew and jitter.
The phase detector detects the phase difference between the local electrical clock
and the global optical reference by using the state of the divided electrical feedback
clock to determine whether to add or subtract the optical input current from the loop
filter. The resulting change in voltage on the loop filter provides negative feedback to
the VCO and forces the signals to synchronize. This charge steering method provides
simple phase detection, not phase-frequency detection, so the PLL acquisition range
and stability are of critical importance. Complete analysis and simulation at the circuit
level show that the final PLL design has sufficient acquisition range to acquire lock
from any possible initialization voltage and that the loop has well-damped dynamics.
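To make the add-or-subtract behavior concrete, the Python sketch below averages the steered photocurrent over one reference period as a function of the feedback clock delay. It assumes, purely for illustration, an ideal 50% duty-cycle square-wave optical intensity and ideal switches. The function name and parameters are hypothetical, and none of the second-order effects discussed in Section 7.2 are modeled.

```python
def average_error_current(phase_frac, i_photo=1.0, n=1000):
    """Average current into the loop filter over one reference period.

    phase_frac is the delay of the feedback clock relative to the optical
    reference, as a fraction of the period. The photocurrent is added while
    the feedback clock is high and subtracted while it is low.
    """
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n                            # midpoint sample of one period
        light_on = t < 0.5                           # assumed 50% duty-cycle optical input
        clock_high = ((t - phase_frac) % 1.0) < 0.5  # state of the feedback clock
        if light_on:
            total += i_photo if clock_high else -i_photo
    return total / n


print(average_error_current(0.00))  # 0.5: clock aligned with the light, current only added
print(average_error_current(0.25))  # 0.0: quadrature, added and subtracted charge cancel
print(average_error_current(0.50))  # -0.5: clock inverted, current only subtracted
```

In this idealized model the average error current is zero only at quadrature and changes sign on either side of it, which is the property the loop uses to drive the VCO toward lock.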
Various topologies from the literature were studied in order to find the most
suitable circuits for use in each of the required PLL subcomponents. A standard LC
VCO topology [11] was chosen to generate the 1.6 GHz local clocks, both because
the clocks must have low jitter and because the PLL uses a simple phase detector
and must therefore use a VCO with relatively low tuning range. In order to minimize
skew and jitter, the feedback divider should be differential and should ideally be
implemented in a single synchronous stage. However, even high-speed SCL divider
stages optimized for division speed by embedding the required synchronous divide
logic into the first latch could not provide full-swing outputs at 2 GHz as required
unless resistive loading were used in place of cross-coupled PMOS loads. The risks
of this type of topology outweigh the advantages for this application and the divider
was therefore implemented as a divide-by-two prescaler followed by a synchronous
divide-by-four circuit.
A time-to-digital converter (TDC) was implemented to provide on-chip skew char-
acterization capability. The critical path was designed at the circuit level and imple-
mented with custom layout, while the control state machine was written in Verilog
and synthesized. This TDC provides 40 ps resolution over a 5 ns sample window.
7.2 Simulation Results
The majority of interesting simulation results and findings pertain to the optical-
electrical phase detector block and the optical-electrical PLL as a whole. Although
the current-steering phase detector appears after initial analysis to provide a high-
accuracy alternative to transimpedance amplifier receivers, further simulations pro-
vide insight into several second-order effects that introduce unacceptable levels of
skew between instances of the PLL. Feedback amplifier gain error and CMOS switch
resistance account for the majority of the phase offsets in the initially proposed phase
detector. The nominal common-mode offsets from quadrature lock introduced by
these factors are merely inconvenient, but the temperature and process sensitivity
of these offsets results in the skew exceeding reasonable specifications. While the
feedback amplifier problem is topology-specific and may be mitigated by using a
more complex and accurate amplifier, the switch resistance problem is fundamental.
Any phase detector topology based on the concept of steering photocurrents with an
electrical feedback signal will be limited by the reality that switching photocurrents
requires the use of CMOS switches and that the on-resistance of these switches is
extremely process and temperature dependent.
A simple and elegant small-signal phase detector topology capable of leveraging
the precision of a global optical clock to generate low-skew local electrical clocks may
still be discovered. However, most topologies will be adversely affected by the same
process and temperature induced variations that limit the performance of the tran-
simpedance amplifiers they attempt to replace. While it may still be possible to find a
topology that is somehow immune to these effects, designers should realize that such
a topology may not exist and prepare for that possibility by simultaneously devoting
some effort to implementing explicit process and temperature variation compensation
and cancellation within known optical-electrical conversion circuits.
7.3 Future Work
Despite current limitations of optoelectronic receiver circuitry, optical clocks retain
their inherent potential for high-accuracy global clock distribution. Advances in op-
toelectronics, circuits, and system architectures may converge to make optical clock
distribution on microprocessors feasible in the future.
7.3.1 Optoelectronics
The field of integrated optoelectronic systems is advancing rapidly and a variety
of new and improved integrated components will be available to optoelectronic IC
designers within the next decade. High-speed photodiodes have already been demon-
strated in custom silicon processes, research efforts to self-assemble high-speed, non-
silicon photodiodes onto silicon wafers are underway, and waveguide matching has
improved dramatically in the past several years. The photodiode advances are
particularly critical as they will improve the capacitance C of integrated photodiodes
to reasonable levels for both current-steering phase detectors and RC limited
transimpedance amplifier systems.
The progress of integrated optical modulators is particularly interesting due to the
potential application in a current-steering phase detector. An optical modulator is an
electrically controlled component capable of modulating the intensity of an incoming
optical signal. One method of achieving this modulation is to split the incoming optical
power and route it through two parallel optical phase shifters with variable phase
relationships. The signals will add constructively when the phase shifters have equal
delay and cancel when they provide a relative phase shift of π. The proposed phase
detector effectively completes this modulation in the electrical domain through resistive
CMOS switches and this process is the source of much of the skew introduced by the
phase detector. Implementing phase detection in the integrated optical domain could
provide a path around the fundamental problem of skew generation by the CMOS
switches due to process and temperature variations. This type of system has been
demonstrated in the discrete domain and recent advances in silicon optical modulator
technology may facilitate an integrated version of this solution in the future [25].
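The interference behavior described above can be written down for an idealized two-arm modulator. The Python sketch below assumes a lossless, equal power split and equal amplitudes in both arms, in which case the transmitted fraction of the input power is cos²(Δφ/2). The function name is hypothetical and the model ignores loss and finite extinction ratio.

```python
import math


def modulator_transmission(delta_phi):
    """Fraction of input power at the output for a relative phase shift (radians)."""
    return math.cos(delta_phi / 2.0) ** 2


print(modulator_transmission(0.0))          # 1.0: equal delay, constructive addition
print(modulator_transmission(math.pi) < 1e-12)  # True: relative shift of pi, cancellation
```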
7.3.2 Circuits
The VCO, frequency divider, and TDC blocks were implemented with the best known
circuit topologies given time and complexity constraints, but because the phase de-
tector and the overall system were the primary focus of this work, these supporting
blocks were not fully optimized. An LC VCO was used to minimize jitter and
reduce VCO tuning range, but the implementation of many inductors on a processor
is undesirable from an area perspective. While more complex to implement, low-
jitter self-biased delay element based oscillators would also provide suitable jitter
performance and their tuning range can be limited by employing current-starving
techniques. For the 1.6 GHz prototype chip implementation, the divider was imple-
mented as a divide-by-two prescaler followed by a synchronous divide-by-four block.
At frequencies approaching 10 GHz, where optical clocking will become even more at-
tractive and higher divider ratios may be required, the speed of synchronous dividers
will likely fall even farther behind the local clock frequency. Further investigation
of synchronous topologies to meet this challenge and of skew and jitter robust asyn-
chronous divider stages will result in improved performance of future optical PLL
implementations. Finally, the implemented TDC has a resolution of 40 ps. While
this is acceptable for basic characterization of skew in this system, future low-skew
systems will require higher accuracy measurement to fully characterize circuit
performance. Existing methods with high resolution, however, also have a short sampling
window [24]. Therefore, new circuits for high-resolution measurement of skew over a
reasonable measurement window should be developed.
7.3.3 Complete System
If the photodiode capacitance and switch resistance problems are resolved by ad-
vances in optoelectronics and a new skew-resistant phase detector is developed, an
optical PLL clock distribution system will have significant advantages over a tran-
simpedance amplifier clock distribution system. The local clocks in the PLL system
are generated from local low-jitter sources, whereas the transimpedance amplifier re-
lies on converting the optical signal to create the local clocks and introduces jitter as
well as skew in this process. Because either of these systems would at least initially be
situated relatively high in the H-tree distribution, the total skew-reduction capability
of a transimpedance amplifier system would be limited by the introduction of skew at
these lower levels. In the PLL system, however, the feedback clock is chosen directly
from the gate level so the system can also compensate for any skew generated in the
process of buffering the VCO clocks.
7.4 Conclusion
Although the results obtained in this work were limited by the performance of in-
tegrated optical components, optical clocks nevertheless have significant potential to
deliver high-speed, high-accuracy global timing signals. Over the past decade, opti-
cal signaling schemes have been employed in progressively smaller scale applications.
Techniques that originated for use only in long-haul optical networks are now applied
in optical backplanes for high-performance computers. The continued shrinking and
integration of optical components facilitates feasible solutions to challenges created
by ever increasing bandwidth requirements.
Clock speeds on microprocessors have not yet reached speeds that absolutely man-
date optical clock distribution, but as speeds continue to increase there will inevitably
be a point where electrical clocks can no longer meet the performance challenges and
a radical solution will be required. If optoelectronic components continue to become
more integrated and optical signaling is extended into even smaller systems, the inte-
grated optoelectronics technology may very well advance fast enough to make optical
clock distribution feasible before electrical clock distribution fails.
Bibliography
[1] M.J. Kobrinsky, B. Block, J. Zheng, B. Barnett, E. Mohammed, M. Reshotko,
F. Robertson, S. List, I. Young, and K. Cadien. On-chip optical interconnects.
Intel Technology Journal, 8(2):128-142, May 2004.
[2] I.A. Young, J.K. Greason, and K.L. Wong. A PLL clock generator with 5-110
MHz of lock range for microprocessors. IEEE Journal of Solid-State Circuits,
27(11):1599-1607, November 1992.
[3] J.M. Rabaey, A.P. Chandrakasan, and B. Nikolić. Digital Integrated Circuits: A
Design Perspective. Prentice Hall Electronics and VLSI Series. Pearson Educa-
tion, Upper Saddle River, NJ, second edition, 2003.
[4] S. Tam, S. Rusu, U.N. Desai, R. Kim, J. Zhang, and I. Young. Clock generation
and distribution for the first IA-64 microprocessor. IEEE Journal of Solid-State
Circuits, 35(11):1545-1552, November 2000.
[5] S. Tam, R.D. Limaye, and U.N. Desai. Clock generation and distribution for
the 130-nm Itanium 2 processor with 6-MB on-die L3 cache. IEEE Journal of
Solid-State Circuits, 39(4):636-642, April 2004.
[6] D.A.B. Miller. Rationale and challenges for optical interconnects to electrical
chips. Proceedings of the IEEE, 88(6):728-749, June 2000.
[7] A. Bhatnagar, C. Debaes, R. Chen, N.C. Hellman, G.A. Keeler, D. Agarwal,
H. Thienpont, and D.A.B. Miller. Receiverless clocking of a CMOS digital
circuit using short optical pulses. In The 15th Annual Meeting of the IEEE Lasers
and Electro-Optics Society, volume 1, pages 127-128. IEEE, November 2002.
[8] B. Razavi, editor. Monolithic Phase-Locked Loops and Clock Recovery Systems.
IEEE Press, New York, 1996.
[9] B.D. Clymer and J.W. Goodman. Timing uncertainty for receivers in optical
clock distribution for VLSI. Optical Engineering, 27(11):944-954, November
1988.
[10] J.G. Maneatis. Low-jitter process-independent DLL and PLL based on self-
biased techniques. IEEE Journal of Solid-State Circuits, 31(11):1723-1732,
November 1996.
[11] D.D. Wentzloff. Design and layout of LC VCO core. Unpublished, 2003.
[12] A. M. Niknejad. ASITIC: Analysis of spiral inductors and transformers for ICs.
http://rfic.eecs.berkeley.edu/~niknejad/asitic.html, 2004.
[13] Travis L. Simpkins. Active optical clock distribution. Master's thesis, Mas-
sachusetts Institute of Technology, Department of Electrical Engineering and
Computer Science, May 2002.
[14] The MOSIS Service. Wafer electrical test data and SPICE model parameters.
http://www.mosis.com/Technical/Testdata/tsmc-018-prm.html, 2004.
[15] J.D. Schaub, R. Li, S.M. Csutak, and J.C. Campbell. High-speed monolithic
silicon photoreceivers on high resistivity and SOI substrates. IEEE Journal of
Lightwave Technology, 19(2):272-278, February 2001.
[16] S.M. Csutak, J.D. Schaub, W.E. Wu, R. Shimer, and J.C. Campbell. High-speed
monolithically integrated silicon photoreceivers fabricated in 130-nm CMOS
technology. IEEE Journal of Lightwave Technology, 20(9):1724-1729, September
2002.
[17] M. Yang, K. Rim, D.L. Rogers, J.D. Schaub, J.J. Welser, D.M. Kuchta, D.C.
Boyd, F. Rodier, P.A. Rabidoux, J.T. Marsh, A.D. Ticknor, Q. Yang, A. Upham,
and S.C. Ramac. A high-speed, high-sensitivity silicon lateral trench photode-
tector. IEEE Electron Device Letters, 23(7):395-397, July 2002.
[18] M. Yang, K. Rim, D. Rogers, J. Schaub, J. Welser, D. Kuchta, and D. Boyd.
A CMOS-compatible high-speed silicon lateral trench photodetector. In Device
Research Conference, pages 153-154. IEEE, June 2001.
[19] Q. Ouyang and J.D. Schaub. High speed lateral trench detectors with a junction
substrate. In Device Research Conference, pages 73-74. IEEE, June 2003.
[20] D. Ahn, J. Michel, K. Wada, and L.C. Kimerling. Waveguides and
integrated photodetectors for on-chip optical clock signal distribution.
http://photonics.mit.edu/research/2003/opt_clock.html, 2003.
[21] R. Sarpeshkar. Adaptive photoreceptor: 6.376 lecture. MIT 6.376 Lecture Notes,
October 2003.
[22] S. Sidiropoulos and M. Horowitz. Current integrating receivers for high speed
system interconnects. In Proceedings of the IEEE Custom Integrated Circuits
Conference, pages 107-110. IEEE, May 1995.
[23] J. Yuan and C. Svensson. High-speed CMOS circuit technique. IEEE Journal
of Solid-State Circuits, 24(1):62-70, February 1989.
[24] V. Gutnik and A.P. Chandrakasan. On-chip picosecond time measurement. In
Symposium on VLSI Circuits Digest of Technical Papers, pages 52-53. IEEE,
June 2000.
[25] A. Liu, R. Jones, L. Liao, D. Samara-Rubio, D. Rubin, O. Cohen, R. Nicolaescu,
and M. Paniccia. A high-speed silicon optical modulator based on a metal-oxide-
semiconductor capacitor. Nature, 427(12):615-618, February 2004.