embedded tutorial: breaking the dynamic power barrier using
Post on 11-Feb-2022
2 Views
Preview:
TRANSCRIPT
Embedded Tutorial: Breaking the Dynamic Power Barrier using
Distributed-LC Resonant Clocking
Matthew R. Guthaus
Department of Computer Engineering
University of California Santa Cruz
Santa Cruz, CA 95064
mrg@ucsc.edu
Abstract—Power consumption is the predominant challenge in modern
high-performance systems and is becoming increasingly important due to
mobile applications. High power consumption leads to decreased battery
lifetime, cooling challenges which limit form factors, and large on-chip
temperatures which can decrease reliability. While stand-by and idle
power can be minimized with many techniques, active-mode dynamic
power consumption during peak operation continues to be a formidable
barrier. Reducing the clock power consumption can significantly reduce
overall active-mode dynamic power. This embedded tutorial explores the
recent commercial and academic results in the design and optimization
of distributed-LC resonant clocks and discusses the future challenges and
open research problems.
I. INTRODUCTION
Power consumption continues to be one of the major concerns
in VLSI design, especially due to the increasing demand of mobile
applications. Clock gating, power gating, dynamic voltage and fre-
quency scaling (DVFS), and multiple threshold voltages among other
techniques are used to reduce dynamic and leakage power. However,
this power is usually saved by exploiting inactivity or adjusting
performance to minimally meet demand. After all is said and done,
significant dynamic power is still required to perform computation.
This required amount of computation power is the dynamic power
barrier.
Many high-performance applications have active circuits that can-
not be further slowed or turned off without sacrificing performance.
This is especially problematic in highly parallel many-core and
GPGPU designs where high activities during peak throughput demand
the most power and the chips often have unsustainable levels of power
consumption. While previous methods can save power outside of this
peak demand, new methods to lower active-mode dynamic power
consumption are in absolute demand. Resonant clocking is gaining
acceptance as a technique to address this.
On-chip clock distribution networks (CDNs) often consume in
excess of 35% of total chip power and occasionally as much as
70%. Resonant clock distributions can reduce this power by recycling
energy on-chip and reducing the overall clock power beyond tradi-
tional circuit and physical design techniques [1]–[3]. This embedded
tutorial introduces recent techniques for distributed-LC resonant clock
distributions for both circuit designers and CAD engineers. It begins
with the basic distributed-LC resonant circuit techniques and how
these are applied to form resonant clock networks. It then includes an
overview of the recent resonant-clocked chips prototyped by IBM and
AMD. Finally, it discusses the state-of-the-art synthesis algorithms for
resonant clocks in ASIC methodologies. Last, it discusses remaining
challenges in the research of distributed-LC resonant clocks.
II. RESONANT CLOCK CLASSIFICATION
There are three distinctly different approaches to create “resonant”
clocks. These include standing wave (salphasic) [4], [5], traveling
wave (rotary) [6], and LC-tank resonant [7]–[18] clock distributions.
Prior embedded tutorials have compared and contrasted the previous
three types of clock distributions [19], but this tutorial focuses on
the later. Standing and traveling wave clocks assume that on-chip
inductance forms transmission lines whereas the LC-tank resonant
clocks assume lumped inductors. LC-tank resonant clocks can further
be distinguished as monolithic or distributed. Monolithic LC-tank
clock distributions use a single tank circuit whereas distributed LC-
tank clock address the distributed parasitic interconnect in large
modern chips.
III. MONOLITHIC AND DISTRIBUTED LC-TANK RESONANT
CLOCKS
The LC-tank resonant clocks are similar to conventional clocks
by providing a clock signal with constant phase and magnitude,
thus it easier to integrate in a traditional IC flow than standing
and traveling wave clocks. However, the LC-tank resonant clocks
require additional passive components to form electrical resonance
to reduce power consumption. This electrical resonance occurs when
the imaginary reactive parts of impedance cancel out at a particular
“resonant” frequency. Resonance has been used extensively in the
analog and radio frequency (RF) fields to build high-quality passive
filter networks.
Both monolithic and distributed LC-tank resonant clocks modify
the clock impedance by inserting inductor-capacitor (LC) “tank”
circuits to cancel the aforementioned reactive portion of the clock
impedance. In a parallel LC circuit, this ideally leaves an infinite
electrical impedance at the resonant frequency, while in a series
LC-tank this ideally leaves zero electrical impedance. Increasing
the clock network impedance allows clock driver transistor sizes to
be dramatically reduced which saves both power and silicon area.
In addition, the oscillatory behavior of an LC-tank circuit recovers
some of the energy as it is transferred back and forth between
the clock/decoupling capacitance and the inductor. LC-tank resonant
clocks have been shown to save between 30% and 80% of dynamic
power compared to buffered clocks [8], [16], [23].
Low-power, monolithic voltage controlled oscillators (VCOs)
have used similar techniques for many years to create on-chip clock
references [20], [21]. Early adoption of these techniques resulted in
monolithic LC tank clocks [7], [10], [11]. Recently, monolithic LC
tanks have been simulated using additional harmonics to improve the
sinusoidal slew rates [22]. The monolithic concepts were extended to
978-1-4799-0524-9/13/$31.00 ©2013 IEEE380
distributed LC-tank clocks in order to address performance scalability
and harness improved power efficiency. Industry has demonstrated
distributed-LC resonant clocks for uniform H-trees [8], [9] and clock
grids [18], [23].
On the design tools side, several methodologies have been
proposed to synthesize global H-tree resonant clocks [12], asym-
metric global resonant clock trees [13], hierarchical local resonant
clocks [17], and resonant clock grids using custom inductor siz-
ing [14], [16] and inductor-library-based sizing [15]. At least one
company has a commercial product to support LC-tank resonant
clocking [24] in commercial designs [18], [23].
IV. CHALLENGES AND OPPORTUNITIES
Numerous challenges remain as barriers to the acceptance of
resonant clocking in most designs. The following are some of the
challenges of resonant clocking:
• increased resource consumption by inductors and decoupling
capacitances,
• integration with existing clock gating techniques,
• difficulty in wide-range frequency scaling for DVFS,
• increased clock slew,
• difficulty generating mid-frequency (1−2GHz) clocks,
• and integration into existing design tools and methodologies.
However, solutions to the above problems are becoming apparent
in recent years. This tutorial discusses each of the prior issues and
presents interim solutions to many of them.
In particular, the skepticism of designers to accept a multi-
disciplinary methodology that requires knowledge of analog/RF
circuits, digital design, and computer-aided design is problematic.
Because of this, research into design tools and methodologies is
the biggest challenge in the full-scale adoption of these technologies
in mainstream ICs. The synthesis of a resonant clock distribution
network to synchronize an entire system, complete with a global
and local clock networks, is a gargantuan task, but is of the utmost
importance for future low-power needs and the adoption of such a
promising technique.
ACKNOWLEDGMENTS
This work was supported in part by the National Science Foun-
dation under grant CCF-1053838.
REFERENCES
[1] M. R. Guthaus, G. Wilke, and R. Reis, “Non-uniform clock meshoptimization with linear programming buffer insertion,” in ACM/IEEE
Design Automation Conference (DAC), June 2010, pp. 74–79.
[2] M. Guthaus, X. Hu, G. Wilke, G. Flache, and R. Reis, “High-performance clock mesh optimization,” ACM Transactions on Design
Automation of Electronic Systems (TODAES), vol. 17, no. 3, pp. 33:1–33:17, June 2012.
[3] M. Guthaus, G. Wilke, and R. Reis, “Revisiting automated physicalsynthesis of high-performance clock networks,” ACM Transactions on
Design Automation of Electronic Systems (TODAES), vol. 18, no. 2, pp.31:1–31:27, March 2013.
[4] F. O’Mahony, C. Yue, M. Horowitz, and S. Wong, “Design of a 10GHzclock distribution network using coupled standing-wave oscillators,” inACM/IEEE Design Automation Conference (DAC), June 2003, pp. 682–687.
[5] V. Chi, “Salphasic distribution of clock signals for synchronous sys-tems,” IEEE Transactions on Computers, vol. 43, no. 5, pp. 597 – 602,May 1994.
[6] J. Wood, T. C. Edwards, and S. Lipa, “Rotary traveling-wave oscil-lator arrays: A new clock technology,” IEEE Journal of Solid-State
Circuits (JSSC), vol. 36, no. 11, pp. 1654–1664, November 2001.
[7] A. Drake, K. Nowka, T. Nguyen, J. Burns, and R. Brown, “Resonantclocking using distributed parasitic capacitance,” IEEE Journal of Solid-
State Circuits (JSSC), vol. 39, no. 9, pp. 1520 – 1528, September 2004.
[8] S. Chan, P. Restle, K. Shepard, N. James, and R. Franch, “A 4.6GHzresonant global clock distribution network,” in IEEE International
Solid-State Circuits Conference (ISSCC), February 2004, pp. 342 – 343.
[9] S. C. Chan, K. L. Shepard, and P. J. Restle, “Design of resonantglobal clock distributions,” in International Conference on Computer
Design (ICCD), October 2003, pp. 248–253.
[10] C. Ziesler, S. Kim, and M. Papaefthymiou, “A resonant clock generatorfor single-phase adiabatic systems,” in IEEE International Symposium
on Low Power Electronics and Design (ISLPED), August 2001, pp.159–164.
[11] J.-Y. Chueh, M. Papaefthymiou, and C. Ziesler, “Two-phase resonantclock distribution,” in IEEE International Symposium on Very Large
Scale Integration (ISVLSI), May 2005, pp. 65–70.
[12] J. Rosenfeld and E. Friedman, “Design methodology for global resonanth-tree clock distribution networks,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 15, no. 2, pp. 135–148, February2007.
[13] M. R. Guthaus, “Distributed LC resonant clock tree synthesis,” in IEEE
International Symposium on Circuits and Systems (ISCAS), May 2011,pp. 1215–1218.
[14] X. Hu and M. R. Guthaus, “Distributed resonant clock grid synthesis(ROCKS),” in ACM/IEEE Design Automation Conference (DAC), June2011, pp. 516–521.
[15] X. Hu, W. Condley, and M. R. Guthaus, “Library-aware resonantclock synthesis (LARCS),” in ACM/IEEE Design Automation Confer-
ence (DAC), June 2012, pp. 145–150.
[16] X. Hu and M. Guthaus, “Distributed LC resonant clock grid synthesis,”IEEE Transactions on Circuits and Systems I (TCAS-I), 2012.
[17] W. Condley, X. Hu, and M. Guthaus, “A methodology for local resonantclock synthesis using LC-assisted local clock buffers,” in International
Conference on Computer-Aided Design (ICCAD), November 2011, pp.503–506.
[18] V. Sathe, S. Arekapudi, C. Ouyang, M. Papaefthymiou, A. Ishii, andS. Naffziger, “Resonant clock design for a power-efficient high-volumex86-64 microprocessors,” in IEEE International Solid-State Circuits
Conference (ISSCC), February 2012, pp. 68–70.
[19] M. Guthaus and B. Taskin, “High-performance, low-power resonantclocking: Embedded tutorial,” in IEEE/ACM International Conference
on Computer-Aided Design (ICCAD), 2012, pp. 742–745.
[20] R. M. Senger, E. D. Marsman, M. S. McCorquodale, F. H. Gebara,K. L. Kraver, M. R. Guthaus, and R. B. Brown, “A 16-bit mixed-signal microsystem with integrated CMOS-MEMS clock reference,” inACM/IEEE Design Automation Conference (DAC), June 2003, pp. 520–525.
[21] E. D. Marsman, R. M. Senger, M. S. McCorquodale, M. R. Guthaus,R. A. Ravindran, G. S. Dasinka, S. A. Mahlke, and R. B. Brown, “A 16-bit low-power microcontroller with monolithic MEMS-LC clocking,” inIEEE International Symposium on Circuits and Systems (ISCAS), 2005,pp. 624–627.
[22] H. Skinner, X. Hu, and M. R. Guthaus, “Harmonic resonant clocking,”in IFIP/IEEE International Conference on Very Large Scale Integra-
tion (VLSI-SoC), 2012, pp. 59–64.
[23] V. Sathe, S. Arekapudi, A. Ishii, C. Ouyang, M. Papaefthymiou, andS. Naffziger, “Resonant-clock design for a power-efficient, high-volumex86-64 microprocessor,” IEEE Journal of Solid-State Circuits (JSSC),vol. 48, no. 1, pp. 140–149, 2013.
[24] http://www.cyclos-semi.com.
381
top related