
DESIGN METHODOLOGIES AND TOOLS FOR VERTICALLY INTEGRATED CIRCUITS

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy

in the Faculty of Science and Engineering

2017

By Charalampos (Harry) Kalargaris

School of Computer Science

Contents

Abstract
Declaration
Copyright
Acknowledgements

1 Introduction
1.1 Challenges in Modern Integrated Circuits
1.2 Vertically Integrated Circuits
1.3 Contributions and Publications
1.4 Thesis Organization

2 Emerging Technologies for Integrated Circuits
2.1 2.5-Dimensional Integration
2.1.1 Interposer Technologies
2.1.2 Silicon vs Glass Interposers
2.2 Three-Dimensional Integration
2.2.1 Types of 3-D Integration
2.2.2 TSV Characteristics and Bonding Styles
2.2.3 Benefits of TSV Based 3-D ICs
2.2.4 Challenges in 3-D ICs
2.3 Low Power Techniques
2.3.1 Power Consumption of Integrated Circuits
2.3.2 Power Supply Voltage Scaling
2.4 Conclusions

3 Interconnect Design and Analysis for Interposer Technologies
3.1 Modelling Wires on Silicon and Glass Interposers
3.2 Interconnect Analysis on Interposers
3.3 Design Guidelines
3.4 Conclusions

4 Design Flow for TSV Based 3-D Circuits
4.1 EDA Tools and Flows for Designing Circuits
4.2 STA-Compatible Backend Design Flow
4.3 Results and Discussion
4.3.1 Physical Characteristics of 2-D and 3-D Designs
4.3.2 Timing Analysis of 3-D Circuits
4.3.3 Power Analysis of 3-D Circuits
4.4 Conclusions

5 Voltage Scaling in 3-D Circuits
5.1 Motivation
5.2 Related Work of Voltage Scaling in 3-D ICs
5.3 Voltage Scaling Opportunities in 3-D ICs
5.4 Determine Whether Voltage Reduction is Applicable
5.4.1 Interconnect and Voltage Aware Timing Model
5.4.2 Timing-Slack Voltage Reduction Methodology for 3-D ICs
5.5 Design Flow Extension for Voltage Reduction in 3-D ICs
5.6 Results
5.6.1 Applicability of Voltage Scaling to 3-D ICs
5.6.2 Quantifying Power Gains
5.7 Conclusions

6 An Interface Circuit for Systems with Multiple Voltage Domains
6.1 Additional Circuitry in Systems with Multiple Voltage Domains
6.2 By-Pass Circuit Design
6.3 Results
6.3.1 Performance Analysis
6.3.2 Power Analysis
6.3.3 Area Analysis
6.4 Conclusions

7 Conclusions and Future Work
7.1 Summary of the thesis
7.2 Future Research Topics

Bibliography

Word Count: 32763


List of Tables

2.1 Brief overview of 3-D technologies.
2.2 Performance and power improvements of 3-D over 2-D logic circuits [54].
2.3 Summary of different approaches for splitting homogenous circuits in three dimensions based on previous works.

3.1 Interconnect physical characteristics for the 65 nm PTM process node [96].
3.2 Electrical characteristics of interconnects on glass and silicon interposers for minimum pitch.
3.3 Power consumption ([mW]) of interconnects on glass and silicon interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.
3.4 Interconnect delay ([ns]) on glass and silicon interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.
3.5 Power-delay product ([pJ]) of wires on glass and silicon interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.

4.1 A summary of previous works on 3-D EDA tools.
4.2 Physical and electrical characteristics of different TSV technologies.
4.3 Benchmark circuits and the respective number of cells.
4.4 Physical characteristics of benchmark circuits by utilizing a 2-D and the proposed 3-D flow.
4.5 Supported clock period by each of the benchmark circuits.
4.6 Average power consumption of benchmark circuits.
4.7 Power for application-specific testbenches.

5.1 A summary of previous works on voltage scaling and voltage domains for 3-D circuits.
5.2 Benchmark circuits.
5.3 Area, wirelength, and number of TSVs for the benchmark circuits designed both in two and three dimensions.
5.4 Supported clock period of the benchmark circuits for the same operating voltage as in the 2-D design.
5.5 Reduction of the operating voltage for the benchmark circuits due to the added timing slack produced by the 3-D stacking.
5.6 Simulated scenarios for evaluating power consumption.
5.7 Breakdown of average power to its components.
5.8 Breakdown of total (cycle-accurate) power to its components and peak power for application-specific testbenches.

6.1 Operation modes of the proposed by-pass circuit.
6.2 Simulated scenarios for the proposed circuit.
6.3 Delay [ps] of the proposed by-pass circuit compared to traditional level-down shifters and isolation cells.
6.4 Maximum latency for the investigated paths at high performance mode.
6.5 Power dissipation [µW] of the proposed by-pass circuit compared to different combinations of by-passed cells.
6.6 Area [µm²] comparison of the proposed by-pass circuit compared to different combinations of by-passed cells.
6.7 Overhead in area of additional cells used for the MVS technique on industrial circuits. Results are normalized to the total area of the design without these additional cells.


List of Figures

1.1 Vertical integration technologies.

2.1 ITRS prediction of 15× increase in logic-to-I/O ratio by 2020 [16].
2.2 Schematic of an interposer based integrated system.
2.3 X-ray cross-section of interposer and its components [22], [23].
2.4 Silicon interposers versus conventional package-level interconnects [25].
2.5 Comparison of silicon and glass characteristics used as interposer substrates in 2.5-D integrated systems [26], [27].
2.6 Manhattan wirelength of (a) a 2-D circuit, (b) a two tier 3-D circuit and (c) an n tier 3-D circuit.
2.7 Monolithic 3-D ICs; (a) complementary MOS device in different layers [2] and (b) 3-D fin-FET CMOS inverter [38].
2.8 Contactless 3-D ICs based on (a) capacitive [40] and (b) inductive coupling [2].
2.9 3-D integrated system with TSV and µbumps.
2.10 Main fabrication steps of different TSV technologies.
2.11 (a) Footprint and (b) cross-section of a TSV structure.
2.12 Different bonding styles of tiers.
2.13 Floorplan of x86 microprocessor (a) 2-D and (b) 3-D [56].
2.14 Memory stacking options: (a) 4 MB baseline and (b) 8 MB stacked for a total of 12 MB [56].

3.1 A motivating example of a system with a CPU and a Memory with interposer traces.
3.2 Electrical model of three wires on interposers, interconnecting a descriptive example of an interposer system with a CPU and a memory.
3.3 Cross-section of two interconnect structures on glass and silicon interposers, respectively.
3.4 (a) Power, (b) delay, and (c) noise simulations of the interconnect structure shown in Fig. 3.2 for minimum interconnect pitch, wirelength of 10 mm, and frequency of 200 MHz.
3.5 Far-end voltage noise for interconnects on silicon and glass interposers with wirelength of 10 mm and frequency of 200 MHz.
3.6 Power-delay product for interconnects on silicon and glass interposers for different pitches with wirelength of 10 mm, and frequency of 200 MHz.
3.7 Power-delay product of interconnects in low density interposers with increasing width and fixed spacing at 1.50 µm at wirelength of 10 mm and frequency of 200 MHz.
3.8 Comparison of power reduction by increasing the space or decreasing the width of wires on different interposer materials for high interconnect densities (pitch < 2 µm) with wirelength of 10 mm and frequency of 200 MHz.
3.9 Crosstalk and PDP simulations of interconnects on silicon and glass interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.

4.1 The typical backend of a flow for designing 2-D circuits. Light grey rectangles depict the steps of the flow, whereas the primary intermediate files produced by each of these steps are illustrated by white rectangles.
4.2 Proposed backend of a design flow for TSV based three-dimensional integrated circuits.
4.3 Merging of net segments from two tiers for a 3-D net.
4.4 Breakdown of paths delay in in2reg, reg2reg, and reg2out for benchmark circuits with 2-D and 3-D integration.
4.5 Timing slack histogram of inter-tier paths for B04 circuit.
4.6 Comparison of timing slack histograms of inter-tier paths for the LDPC circuit in two and three dimensions, respectively.
4.7 Power trace of the LDPC circuit by utilizing a standard 2-D and the proposed 3-D design flow.

5.1 A typical path composed of gates and interconnects.
5.2 Delay reduction due to vertical integration for paths with different number of gates (N) and the same global and intermediate wire segments, where wirelength changes with n according to (5.10).
5.3 (a) Delay and (b) sensitivity of logic gates to voltage reduction while driving a minimum size inverter.
5.4 Change in delay of paths with different path sensitivity where voltage is gradually reduced.
5.5 Stages of a 3-D design flow where commercial EDA tools with standard file formats are utilized. At iso-performance operation as compared to the 2-D circuit, the same timing constraints and operating voltage are used as inputs.
5.6 Methodology to evaluate the voltage reduction in 3-D circuits.
5.7 Total average power consumption at the same speed D2−D = D3−D.
5.8 Power trace of the LDPC circuit for different scenarios.
5.9 Power savings of application-specific testbenches for the 3-D investigated circuits (scenarios S2 and S3, see Table 5.6) as compared to the 2-D implementation of the circuits (scenario S1).

6.1 Circuit schematic of a feedback based level-up shifter [141].
6.2 Proposed circuit at the block interface to by-pass level conversion and/or isolation/retention cells used to support multiple voltage domains.
6.3 Delay of the proposed by-pass circuit compared to traditional level-up shifters.
6.4 Delay of the proposed circuit as compared when isolation cells are connected in series with level-up shifters.
6.5 Power dissipation of the proposed by-pass circuit compared to traditional level-up shifters.


List of Abbreviations

2-D Two-Dimensional

2.5-D Two and half-Dimensional

3-D Three-Dimensional

AVS Adaptive Voltage Scaling

B2B Back-to-Back

BEOL Back-end-of-the-line

CAD Computer-Aided Design

CDN Clock Distribution Network

CMPs Chip Multi Processors

D2D Die-to-Die

D2W Die-to-Wafer

DVFS Dynamic Voltage and Frequency Scaling

EDA Electronic Design Automation

ESD Electrostatic Discharge

F2B Face-to-Back

F2F Face-to-Face

FEOL Front-end-of-the-line

FLS Feedback-based Level-up Shifter

FP Floorplan

FU Functional Unit

GPU Graphics Processing Unit

I/Os Input/Output terminals


IBIS Input/Output Buffer Information Specification

IC Integrated Circuit

IPC Instructions Per Cycle

ISO Isolation cell

ITRS International Technology Roadmap for Semiconductors

KGD Known Good Die

LEF Layout Exchange Format

LSDN Level down shifter

MTA Manchester Thermal Analyzer

MVD Multiple Voltage Domains

NoC Network-on-Chip

ODT On-Die-Termination

PCB Printed Circuit Board

PDN Power Distribution Network

PMU Power Management Unit

PnR Place and Route

PoP Package-on-Package

PTM Predictive Technology Model

RDL Re-distribution Layer

SI Signal Integrity

SiP System-in-Package

SLC Surface Laminate Circuit

SOI Silicon-On-Insulator


SoP System-on-Package

SPEF Standard Parasitics Exchange Format

STA Static Timing Analysis

TG Transmission Gate

TGV Through Glass Vias

TSV Through Silicon Vias

VF Via-First

VL Via-Last

VM Via-middle

W2W Wafer-to-Wafer

WLP Wafer-Level-Packaging

Abstract

Design Methodologies and Tools for Vertically Integrated Circuits

Charalampos (Harry) Kalargaris

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy, 2017

Vertical integration technologies, such as three-dimensional integration and interposers, are technologies that support high integration densities while offering shorter interconnect lengths as compared to planar integration and other packaging technologies. To exploit these advantages, however, several challenges lie across the design, manufacturing, and testing stages of integrated systems. Considering the high complexity of modern microelectronic devices and the diverse features of vertical integration technologies, this thesis sheds light on the circuit design process. New methodologies and tools are offered in order to assess and improve traditional objectives in circuit design, such as performance, power, and area, for vertically integrated circuits. Interconnects on different interposer materials are investigated, demonstrating the several trade-offs between power, performance, area, and crosstalk. A backend design flow is proposed to capture the performance and power gains from the introduction of the third dimension. Emphasis is also placed on the power consumption of modern circuits due to the immense growth of battery-operated devices in the last fifteen years. Therefore, the effect of scaling the operating voltage in three-dimensional circuits is investigated, as it is one of the most efficient techniques for reducing power while considering the performance of the circuit. Furthermore, a solution to eliminate timing penalties from the use of voltage scaling at finer circuit granularities is also presented in this thesis.


Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.


Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s policy on presentation of Theses.


Acknowledgements

First of all, I would like to express my sincere gratitude to my main supervisor Dr. Vasilis F. Pavlidis. His guidance and constant encouragement during my PhD continuously motivated me to make progress. He spent countless hours patiently discussing and explaining all manner of things to me and for this I am immensely grateful. His solid knowledge of integrated circuits and of research practices helped me evolve as a researcher and prepare myself for a successful professional career. In addition, his fairness and kind personality not only made our collaboration a pleasant experience but also made him a role model for me, both professionally and in life.

Furthermore, I would like to thank my co-supervisor Dr. Jim Garside, whose office door has always been open to me. I would like to express my very great appreciation to John Goodacre for introducing me to the industrial world and providing me with useful advice during my PhD so as to perform cutting-edge research. A special thank you to Jeff Pepper for all his technical support throughout my PhD. I would also like to thank the head of the APT group, Professor Steve Furber, and all the APT members for creating a nice and productive working environment.

A big thank you to my close colleagues Ioannis Papistas, Thanos Stratikopoulos, Scott Ladenheim, Yi-Chung Chen, and Przemyslaw Mroszczyk for making my everyday life in the office a pleasant experience.

In addition, I would like to thank my brother Ioannis Kalargaris and my friends from my undergraduate years in Greece, Ios Kotsogiannis and Andreas Andriotis. Due to the difficult times in Greece, we all started pursuing our PhD degrees in different places around the world. Our everyday chats helped me overcome the stressful times of my PhD as we share common experiences and backgrounds.

Furthermore, I would like to thank my girlfriend Milica Mladenovic for her support and patience during the period of my PhD.


I would also like to thank the Engineering and Physical Sciences Research Council (EPSRC), ARM Ltd, and President’s Doctoral Scholar Award for financially supporting me during my PhD.

Finally, I would like to thank my parents for being supportive and such great mentors in my life. This thesis is dedicated to them.


Chapter 1

Introduction

1.1 Challenges in Modern Integrated Circuits

Microelectronic systems have become essential parts of our daily activities. Integrated circuits (ICs) are omnipresent, from devices demanding high performance, such as servers and workstations, to battery operated devices requiring low power, including smart phones, tablets, and other wearable electronics. The semiconductor industry has been able to keep up with the increasing need for integration by scaling the size of transistors and evolving packaging techniques in order to fit more ICs into a system. However, economic and especially technical issues are slowing down the scaling effort [1]. In addition, packaging technologies need to improve to support high density of integration, particularly for mobile systems where small form factor is highly important [2].

In traditional planar integration, transistors are placed next to each other forming a two-dimensional (2-D) circuit. As predicted by Moore’s Law [3], the semiconductor industry has been able to profitably double the density of 2-D circuits by shrinking the physical dimensions of transistors every 18 months, over the past five decades. The lower capacitance of smaller transistors increases the performance and reduces the power of integrated circuits. Moreover, this miniaturization increased the functionality per unit area within a die, allowing devices to provide enhanced computing capabilities. However, in the deep submicrometer era, where the pace of scaling has slowed, the approach of embedding more functions into a circuit is constrained by the size of the device.

At system level, advanced architectural techniques are adopted to improve the performance and power of these complex circuits. However, design issues arise as more silicon area and routing resources are required by these techniques. For example, significant effort has been devoted over the last decade to incorporating as many processor units as possible into a circuit. This approach, however, has caused signals to require more than one cycle to propagate across cores due to the long interconnects [4]. Considering that the scaling of wires is not proportional to the scaling of transistors, the performance of modern integrated circuits is constrained by the delay of interconnects rather than the intrinsic delay of the gates [5], [6]. In addition, over 50% of the dynamic power of a microprocessor is dissipated on interconnects [7]. Hence, power issues due to the long interconnects are important in modern integrated circuits, limiting the efficiency of battery operated devices.
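To illustrate why long interconnects constrain performance, the following is a minimal sketch based on the standard first-order estimate for the 50% delay of an unbuffered distributed RC wire, approximately 0.38·r·c·L², where r and c are the resistance and capacitance per unit length. This estimate and the per-unit-length values used here are assumptions chosen for illustration only; they are not taken from this thesis. The point of the sketch is the quadratic dependence on length: halving the wirelength reduces the wire delay to roughly one quarter.

```python
def distributed_rc_delay(r_per_m, c_per_m, length_m):
    """First-order 50% delay (in seconds) of an unbuffered distributed RC wire.

    Uses the textbook approximation t ~= 0.38 * r * c * L^2.
    """
    return 0.38 * r_per_m * c_per_m * length_m ** 2


# Hypothetical per-unit-length parameters (illustrative, not from the thesis).
R_PER_M = 2.0e5    # ohm per metre
C_PER_M = 2.0e-10  # farad per metre

full_length = 10e-3            # a 10 mm wire
half_length = full_length / 2  # the same wire shortened to 5 mm

delay_full = distributed_rc_delay(R_PER_M, C_PER_M, full_length)
delay_half = distributed_rc_delay(R_PER_M, C_PER_M, half_length)

print(f"delay at 10 mm: {delay_full:.3e} s")
print(f"delay at  5 mm: {delay_half:.3e} s")
print(f"ratio: {delay_full / delay_half:.1f}")
```

In practice long wires are buffered, which makes the delay grow closer to linearly with length, so the sketch should be read as an upper-level intuition rather than a timing model.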

Another source of advancement in integrated circuits comes from packaging solutions, where discrete ICs are incorporated into a package to form a system. The invention of the surface laminate circuit (SLC) technology by IBM 25 years ago [8] was a key factor in enabling the packaging of circuits at higher densities than on printed circuit boards (PCB). This development led to the formation of System-on-Package (SoP), where different integrated circuits are placed and interconnected on a package substrate. However, the packaging density supported by this technology is inadequate to meet the requirements of modern systems in terms of input/output terminals (I/Os) [9]. In addition, the wires employed to interconnect the ICs in the package are orders of magnitude longer than the on-chip wires, resulting in power and performance issues [10].

An alternative approach to tackle these issues is to stack the integrated circuits within a package. Various packaging approaches have been proposed and have been in production for several years, reducing the length of interconnects between dice on a package substrate. These approaches include System-in-Package (SiP), Package-on-Package (PoP), and Wafer-Level-Packaging (WLP). Wire bonding and ball-grid arrays are broadly used to provide vertical connections between ICs within a package. However, wire bonding suffers from scalability issues and is limited to the periphery of the chip [2], thus not providing a significant increase in interconnect density. In addition, ball-grid arrays are less cost-efficient compared to wire bonding [11]. Moreover, even though WLP offers higher interconnect density as compared to SiP and PoP technologies, this density is limited by the small number and large size of bond-pads [11].

As a result, communication bottlenecks in terms of power and bandwidth between dice within a package appear in modern devices due to the interconnects. Therefore, a new set of solutions is required to overcome the interconnect related problems of modern microelectronic devices at both the circuit and package level.

1.2 Vertically Integrated Circuits

Vertical integration is a class of system integration technologies that has emerged to provide small form factor and high density of integration for modern circuits. An example system using these technologies is depicted in Fig. 1.1. 2.5-dimensional integration by means of interposer technology is a new layer in the packaging hierarchy for integrated systems, interconnecting the conventional package substrate and the host dice into a system. With three-dimensional (3-D) integration, circuit dice are placed one over the other and interconnected vertically, typically with through-silicon vias (TSVs).

Figure 1.1: Vertical integration technologies. (Labels in the figure: Interposer, 2.5-D ICs, 3-D ICs, Die 1 to Die 4.)

The main advantage of these technologies is the increased manufacturing yield¹. Traditional large planar dice can be divided into smaller ones and interconnected either vertically or with an interposer. This yield enhancement is enabled by these vertical integration technologies due to the support of short and fast connections between dice. Interposer technology offers short off-chip interconnect lengths, thereby resulting in improved communication bandwidth and power consumption as compared to older packaging technologies. Alternatively, three-dimensional integration offers shorter on-chip interconnect lengths. As a result, the total wirelength within a circuit reduces and traditional timing and power bottlenecks due to the interconnects can be alleviated [2].

¹ In semiconductor manufacturing, yield is the percentage of integrated circuits in a fully processed wafer that pass all electrical and functional tests.
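The yield argument can be made concrete with a small sketch. It assumes the simple Poisson yield model, Y = exp(−A·D0), where A is the die area and D0 the defect density; both the model and the numbers below are illustrative assumptions, not data from this thesis, and bonding or assembly losses are ignored. Under this model a small die is far more likely to be defect-free than a large one, and because small dice can be tested individually before assembly (known good dice), fewer defective square centimetres of silicon end up in finished systems.

```python
import math


def poisson_yield(area_cm2, defect_density_per_cm2):
    """Fraction of defect-free dice under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defect_density_per_cm2)


# Hypothetical values (illustrative, not from the thesis).
D0 = 0.5           # defects per square centimetre
LARGE_AREA = 4.0   # one monolithic 4 cm^2 die
SMALL_AREA = 1.0   # the same logic split into four 1 cm^2 dice

yield_large = poisson_yield(LARGE_AREA, D0)
yield_small = poisson_yield(SMALL_AREA, D0)

# Fraction of fabricated silicon area that ends up in a working die:
# with individually tested small dice, only the defective 1 cm^2 die is discarded.
print(f"yield of one 4 cm^2 die : {yield_large:.3f}")
print(f"yield of each 1 cm^2 die: {yield_small:.3f}")
```

Note that the probability of four untested small dice all being good equals the yield of the large die under this model, so the benefit comes from testing and discarding dice at the smaller granularity.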


However, as with any emerging technology, new challenges arise. Systems with vertical integration technologies comprise a number of dice of different materials, sizes, and functionalities. Therefore, mechanical stability during fabrication processes and reliable electrical connections between dice are two major challenges for these systems. In addition, due to the complexity of these structures, new testing protocols are required for identifying faulty dice without increasing the dedicated circuitry for test. The increased packaging densities of active circuitry offered by these technologies lead to increased power densities and, consequently, potential thermal issues. Therefore, hotspots should be carefully managed in these systems so as neither to degrade the performance of the individual dice nor to speed up the wear-out process of the whole system.

In this thesis, emphasis is placed on the effects of vertical integration technologies on the circuit design process. The diverse characteristics and configurations of vertically integrated circuits are investigated to evaluate and exploit the benefits of these technologies for modern integrated systems. In addition, new methodologies and tools are proposed to assess traditional objectives in circuit design, such as performance and area, and particularly to improve power, for vertically integrated circuits. The main contributions of my research in this exciting area are presented in the following section.

1.3 Contributions and Publications

My contributions and publications are summarized as follows:

• H. Kalargaris and V. F. Pavlidis, “Interconnect Design Tradeoffs for Silicon and Glass Interposers,” Proceedings of the IEEE International New Circuits and Systems Conference, pp. 77-80, June 2014.
In this work, design objectives for interconnects on different interposer materials are investigated. Simulations demonstrate that wires on glass interposers are a superior alternative to wires on silicon interposers in terms of power and delay. However, design parameters, such as the width and spacing between wires, are shown to vary considerably between the two materials when targeting the same objective and/or design metric. Therefore, design guidelines for interconnects on glass and silicon interposers are determined that satisfy power, delay, area, and crosstalk constraints.


• H. Kalargaris, Yi-Chung Chen, and V. F. Pavlidis, “STA Compatible Backend Design Flow for TSV-based 3-D ICs,” Proceedings of the IEEE International Symposium on Quality Electronic Design, pp. 186-191, March 2017.
In this work, a novel backend design flow is presented which enables design space exploration for 3-D circuits with through-silicon vias. The design experience of using this flow is similar to a 2-D flow, as commercial 2-D EDA tools and a public academic 3-D tool are utilized in all of the stages. New steps are added to support the introduction of the third dimension and the broad gamut of TSV technologies and bonding styles. Due to these new steps, this flow enables the performance of the 3-D circuit to be evaluated by seamlessly performing STA with mature EDA tools instead of considering the longest inter-tier delay and wirelength prediction models. Furthermore, this is the first design flow for 3-D ICs with TSVs that is compatible with multi-mode power analysis while considering the electrical characteristics of vertical interconnects. Simulation results from several benchmark circuits are offered to demonstrate the effectiveness of the proposed flow.

• H. Kalargaris and V. F. Pavlidis, “Voltage Scaling for 3-D ICs: when, how, and how Much?,” Microelectronics Journal, Vol. 69, pp. 35-44, November 2017.
In this work, a new approach is presented to decrease power in three-dimensional circuits by combining the innate traits of 3-D integration with standard low power methods for integrated circuits, such as voltage scaling. An enhanced timing model for circuit paths based on logical effort is presented to address the tradeoff between voltage and interconnect length reduction. In addition, guidelines are offered to identify early in the design process if two-dimensional circuits can benefit from voltage reduction with three-dimensional integration without compromising performance. A methodology for applying and evaluating voltage reduction in 3-D ICs by utilizing EDA tools is also presented. Simulation results from several benchmark circuits are offered to support the effectiveness of the proposed approach and methods.

• H. Kalargaris, J. Goodacre, and V. F. Pavlidis, “Advanced Circuit Interface for Systems with Multiple Voltage Domains,” Proceedings of the IEEE International Ph.D. Research in Microelectronics and Electronics Conference, June 2016. In this work, an advanced interface circuit for systems with multiple voltage domains (MVD) is presented. This circuit supports by-passing of the additional circuitry required by low power techniques, such as voltage scaling and power gating, in both 2-D and 3-D circuits. Traditional timing bottlenecks at the boundaries of the blocks are alleviated in high performance conditions for MVD systems by employing the proposed circuit. Moreover, simulation results are offered for various design objectives, such as performance, power, and area.

• S. Ladenheim, Y.-C. Chen, H. Kalargaris, M. Mihajlovic, and V. F. Pavlidis, “Computationally Efficient Standard-Cell FEM-based Thermal Analysis,” Proceedings of the ACM/IEEE International Conference on Computer-Aided Design, (accepted). The aim of this work is to provide accurate gate level thermal analysis with low computational cost. An advanced mesh generator for gate level floorplans is introduced in this work to capture power/thermal variations in fine circuit granularities. Novel algorithms are proposed to decrease the runtime of the simulations, based on the cell location and/or power density information. Simulation results demonstrate significant runtime speedup of thermal analysis while maintaining high accuracy by utilizing the methods described in this work. This work is beyond the scope of my PhD, therefore it is not discussed in this thesis.

1.4 Thesis Organization

The remainder of the thesis is organized as follows:

• In Chapter 2, the fundamental concepts and design challenges of vertical integration technologies are presented. Moreover, as this thesis emphasizes the gains in power for 3-D ICs, power issues and related low power methods are also reviewed in this chapter.

• In Chapter 3, my work [12] on interconnect design for different interposer materials is presented.


• In Chapter 4, a backend design flow for three-dimensional circuits with through-silicon-vias is presented, based on my work in [13].

• In Chapter 5, a new approach and the necessary means to reduce power in 3-D ICs by combining the innate traits of 3-D integration with voltage scaling are presented, based on my work in [14].

• In Chapter 6, a new circuit is described for interfacing blocks with different power supplies, based on my work in [15].

• In Chapter 7, the thesis is concluded and future research topics are discussed.

Chapter 2

Emerging Technologies for Integrated Circuits

The semiconductor industry has been able to increase the performance and decrease the power of integrated circuits by scaling the transistors for the past several decades [3]. This approach has led to increased integration densities in planar (2-D) circuits demanding longer interconnects. As a result, power and performance bottlenecks due to the long interconnects have become a pressing issue in modern circuits [2]. Vertical integration technologies have been introduced as a promising candidate to improve circuit power and performance by reducing the interconnect length. In this chapter, background literature on technologies for vertically integrated circuits and low power techniques is presented. An overview of 2.5-dimensional (2.5-D) integration is offered in Section 2.1. Three-dimensional (3-D) integration technologies are presented in Section 2.2. As this thesis emphasizes the gains in power for 3-D ICs, power issues and related low power methods are also reviewed in this chapter. The power consumption of digital circuits and the low power technique of voltage scaling are discussed in Section 2.3. Finally, a summary is offered in Section 2.4.

2.1 2.5-Dimensional Integration

Technology scaling to smaller nodes favors logic density. Interconnecting high density circuits, such as a processor and a memory, is crucial and requires a large number of input/output terminals (I/Os). This situation is due to the fact that the communication bandwidth is dictated by the number of I/Os, since a higher number of I/Os leads to higher bandwidth. However, as the International Technology Roadmap for Semiconductors (ITRS) predicts, the capacity gap between logic gates and I/Os will grow over the years (see Fig. 2.1) [16]. In addition, for mobile systems, a 10× increase in power over the next decade is expected while 30-50% of this power will originate from the I/Os [17]. Therefore, this situation can severely constrain high density circuits.

[Figure 2.1 plot: capacity relative to 2005 (logarithmic axis, 10^0 to 10^2) versus year (2005 to 2020) for gates, bumps, and package pins.]

Figure 2.1: ITRS prediction of 15× increase in logic-to-I/O ratio by 2020 [16].

In order to alleviate this situation, 2.5-dimensional integration, realized by interposer technologies, has emerged to support higher integration densities as compared to organic packages and printed circuit boards (PCB). In Section 2.1.1, the characteristics and advantages of interposer technologies are presented. Different materials for interposers are discussed in Section 2.1.2.

2.1.1 Interposer Technologies

A stepping stone to full 3-D integration as predicted by experts and semiconductor foundries is 2.5-D integration based on silicon or glass interposers. In late 2013, Xilinx and TSMC jointly announced the production release of the Virtex-7 HT family of FPGAs by employing an interposer technology [18]. In addition, AMD and NVIDIA released interposer based graphics processing units (GPUs) in the third quarter of 2015 and 2016, respectively [19], [20]. The basic concept of interposer technologies is illustrated in Fig. 2.2.

[Figure 2.2: diagram with labels Die 1, Die 2, Die 3, Interposer Technology, Package Substrate, and Printed Circuit Board (PCB).]

Figure 2.2: Schematic of an interposer based integrated system.

Interposer technologies introduce a new layer in the packaging hierarchy for integrated systems, interconnecting the package substrate and the hosted dice in a system. The main advantage of this technology is the increased yield. Interposers enable the integration of several small dice at advanced process nodes by supporting fast connections among these dice, which leads to a higher overall yield as compared to a single die of considerably larger area [21]. In addition, the utilization of this technology enables semiconductor foundries to reduce the time-to-market since integrated circuits can be fabricated separately and interconnected later on interposers [17].
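The yield argument can be illustrated with a first-order sketch. The Poisson defect model used below, Y = exp(-A·D0), is a common textbook approximation and is an assumption here rather than the model used in [21]; the die area, defect density, and function name are hypothetical.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Expected fraction of defect-free dice under the Poisson model Y = exp(-A * D0)."""
    if area_cm2 < 0 or defect_density_per_cm2 < 0:
        raise ValueError("area and defect density must be non-negative")
    return math.exp(-area_cm2 * defect_density_per_cm2)


# Hypothetical numbers, chosen only for illustration.
TOTAL_AREA_CM2 = 6.0   # logic that would otherwise occupy one large die
DEFECT_DENSITY = 0.2   # defects per cm^2
NUM_DICE = 4           # the same logic split into four equal dice

large_die_yield = poisson_yield(TOTAL_AREA_CM2, DEFECT_DENSITY)
small_die_yield = poisson_yield(TOTAL_AREA_CM2 / NUM_DICE, DEFECT_DENSITY)

print(f"single large die yield:  {large_die_yield:.3f}")
print(f"yield of each small die: {small_die_yield:.3f}")
```

Since each small die can be tested before assembly, a defect in this example discards a quarter of the silicon instead of the entire large die. Assembly losses on the interposer are ignored in this sketch.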

The cross-section of an interposer with its components is depicted in Fig. 2.3. High density integration is supported with this technology, as smaller interconnects, vias, and bumps are utilized compared to older package technologies. Microbumps (µbumps) employed between the I/Os of the hosted dice and the interposer are an order of magnitude smaller than the C4 bumps from the package substrate [17]. Furthermore, vias in interposers follow the same trend, resulting in denser integration.

Interposers typically include a number of horizontal (planar) metal layers, called redistribution layers (RDLs), interconnecting the hosted dice (see Figs. 2.2 and 2.3). The RDLs are similar to Back-End-of-Line (BEOL)¹ interconnects. An older interconnect technology is employed for these layers as compared to the on-chip interconnects of the hosted dice due to the lower resistance and cost [24].

Figure 2.3: X-ray cross-section of interposer and its components [22], [23].

The high interconnection densities supported by this technology result in higher bandwidth and lower power consumption as compared to traditional packages. A useful metric to quantify this improvement is the power efficiency of interconnects, defined as the ratio of power over bandwidth (measured in units of W/Gbps), where a lower value is better. M. A. Karim et al. [25] showed that the utilization of a silicon interposer improves the power efficiency of package-level interconnects by 4× as compared to standard printed circuit boards (PCB). In addition, an improvement of 2× over Package-on-Package (POP) technology is observed (see Fig. 2.4). These results demonstrate the effectiveness of interposer interconnects to transfer data faster and with less power than traditional package technologies.

Interposers can be utilized as a passive package technology containing no active circuits. This characteristic enables the utilization of non-semiconductor materials as interposers. An alternative candidate material to silicon is glass. In the following section, the main characteristics of silicon and glass interposers are described.

¹ BEOL includes the fabrication steps where the metal layers of an integrated circuit are deposited.


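[Figure 2.4 plot: bar chart of power efficiency in mW/Gbps (axis from 0 to 20) for interposer, POP, and PCB package-level interconnects; memory labels include DDR3, LPDDR2, and Wide I/O.]

Figure 2.4: Silicon interposers versus conventional package-level interconnects [25].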

2.1.2 Silicon vs Glass Interposers

Xilinx, AMD, and NVIDIA employ silicon interposers in the Virtex-7 FPGA family, the Fiji and Vega GPU series, and the Pascal and Volta GPU series of products, respectively [18]-[20]. An alternative to silicon is the glass interposer, where vias of similar diameter and pitch and high density wiring can be supported. In Fig. 2.5, the benefits of using glass instead of silicon interposers are summarized [26], [27].

Reusability of existing fabrication tools and processability of crucial components, such as the vias, drive the present production line in favor of silicon (wafer-Si) interposers compared to glass (see Fig. 2.5). Silicon interposer technologies are appealing since silicon wafer processing tools are reused, resulting in passive silicon interposers that are cheaper and easier to fabricate. However, this process is available only up to 300 mm wafers, making it costly for packaging applications due to the limited manufacturing throughput [27]. On the contrary, glass, a well-known material in the display industry, can be processed in both 300 mm and 450 mm wafers and also in panels, making it an attractive alternative [27]. In addition, glass appears as a promising solution for high density applications considering the lower cost per I/O than silicon.

Thermal management is another important issue in packaging technologies. Silicon has higher thermal conductivity (148 W/mK) as compared to glass (1.14 W/mK). This characteristic of silicon enables heat to spread more effectively than in glass. As a result, the circuit temperature can be decreased by 4-5 degrees centigrade with silicon interposers [17]. However, potential thermal issues on glass interposers can be tackled adequately by utilizing a large number of copper vias to spread heat [28].

Figure 2.5: Comparison of silicon and glass characteristics used as interposer substrates in 2.5-D integrated systems [26], [27].

From an electrical perspective, the properties of glass, such as low loss and high resistivity, make glass an excellent candidate for low power and high signal integrity circuits. Silicon as a semiconductor material exhibits high relative permittivity (εr = 11.9) [17]. Employing copper through-silicon-vias (TSVs) in such interposers requires an additional process step of lining with silicon dioxide (SiO2). In addition, due to the high self and coupling capacitance of TSVs, signal propagation exhibits high delay and power and overall poor signal integrity [17]. Alternatively, glass, as an insulator, has a very low permittivity (εr = 3.4) [29]. This characteristic allows the formation of through-glass-vias (TGVs) without the need of an insulator lining. As a result, propagating signals through vias in glass interposers is a superior alternative to silicon for high density applications demanding high signal integrity [29].

The most wire-dominant part of an interposer is the redistribution layer. This situation is demonstrated in the Xilinx interposer-based products, where more than 10,000 horizontal wires are employed in the RDL to interconnect the hosted dice [30]. This high density wiring alongside the long length of these planar wires (orders of magnitude longer than vias) significantly affects the power and performance of the entire design [25]. Therefore, investigating the effect of different interposer materials on the performance of wires in the RDL is an important task. My published work [12] sheds light on the different design tradeoffs for interconnects in the RDL due to the different material characteristics of silicon and glass interposers. Guidelines for designing interconnects on glass and silicon interposers that satisfy area, power, delay, and crosstalk constraints are offered. This work is presented in Chapter 3.

2.2 Three-Dimensional Integration

Interposer technologies offer a large number of advantages over traditional packaging technologies, such as shorter off-chip interconnect lengths, increased packaging efficiency, and higher I/O density. These advantages provide significant power and performance improvements at package level integration. However, the increasing power and delay due to the long on-chip interconnects in modern circuits is only partially alleviated with 2.5-dimensional integration. Therefore, the advantages of vertical integration are not fully exploited.

A promising solution and an important next step in the evolution of microelectronics and, especially, in the area of interconnects is three-dimensional integration, where circuits are placed one over the other. The primary gain of three-dimensional integration is the considerably shorter interconnect length. A typical metric to determine the interconnect length is based on the Manhattan distance, where the longest interconnect is taken as twice the side length of the circuit [31], [32]. This situation is depicted in Fig. 2.6 through a simple example (from [2]) of a circuit where wires are only routed in the x and y directions. A square circuit with area $A$ has maximum Manhattan wirelength $L_{max} = 2\sqrt{A}$, but partitioning the same circuit into two tiers reduces the wirelength to $L'_{max} = \sqrt{2A} + l_z$, where $l_z$ is the length of the vertical connection between tiers. In general, $n$ tiers lead to a maximum Manhattan wirelength of $L''_{max} = 2\sqrt{A/n} + (n-1) l_z$. As the term $l_z$ is on the order of tens of micrometers (µm), while the $2\sqrt{A}$ factor is on the order of millimeters (mm), $L''_{max}$ will be significantly shorter than $L_{max}$ (see Fig. 2.6). Despite the simplicity of this example, the decrease in wirelength in 3-D ICs has been demonstrated in [33] to follow a similar behavior.
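The wirelength estimate for an n-tier stack, 2·sqrt(A/n) + (n − 1)·l_z, can be evaluated numerically with a minimal sketch. The circuit area and the inter-tier length below are hypothetical values chosen to match the orders of magnitude mentioned in the text, and the function name is an illustrative choice.

```python
import math


def max_manhattan_wirelength(area: float, n_tiers: int = 1, l_z: float = 0.0) -> float:
    """Longest Manhattan wire of a square circuit of total area `area`
    split evenly over `n_tiers` tiers: 2*sqrt(A/n) + (n-1)*l_z."""
    if n_tiers < 1:
        raise ValueError("n_tiers must be at least 1")
    return 2.0 * math.sqrt(area / n_tiers) + (n_tiers - 1) * l_z


AREA_MM2 = 100.0  # hypothetical 10 mm x 10 mm planar circuit
L_Z_MM = 0.05     # hypothetical 50 um vertical connection between adjacent tiers

for tiers in (1, 2, 4, 8):
    length = max_manhattan_wirelength(AREA_MM2, tiers, L_Z_MM)
    print(f"{tiers} tier(s): longest wire = {length:.3f} mm")
```

With these assumed values the vertical term (n − 1)·l_z stays well below a millimeter, so the square-root term dominates and each additional tier yields a smaller reduction than the previous one.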


[Figure 2.6 panels: (a) area $A$, $L_{max} = 2\sqrt{A}$; (b) two tiers, $L'_{max} = \sqrt{2}\sqrt{A} + l_z$; (c) $n$ tiers, $L''_{max} = 2\sqrt{A/n} + (n-1) l_z$.]

Figure 2.6: Manhattan wirelength of (a) a 2-D circuit, (b) a two-tier 3-D circuit, and (c) an n-tier 3-D circuit.

There are several types of three-dimensional integrated circuits, depending on the supported vertical connections between the stacked circuits. These forms of 3-D ICs are presented in Section 2.2.1. In this thesis, emphasis is placed on 3-D circuits where through-silicon-vias are utilized to interconnect and bond the tiers. The different TSV technologies and bonding styles are discussed in Section 2.2.2. The benefits and challenges of TSV based three-dimensional integrated circuits are presented in Sections 2.2.3 and 2.2.4, respectively.

2.2.1 Types of 3-D Integration

3-D ICs can be realized with a sequential or parallel manufacturing process. In the former case, the devices are grown in layers on a single semiconductor wafer, resulting in a purely monolithic system. Alternatively, the ICs are prepared separately before the bonding process and are then bonded to form a multi-tier (polylithic) 3-D system. In this case, vertical connectivity between tiers is ensured either contactlessly or with TSVs. These diverse types of three-dimensional integration are discussed in the following paragraphs. In addition, an overview of 3-D technologies is listed in Table 2.1.

Monolithic 3-D ICs

The main two types of monolithic circuits are stacked 3-D ICs and fin-FETs. Layers of planar devices successively grown on traditional CMOS or silicon-on-insulator (SOI) wafers lead to stacked 3-D ICs. The main advantage of this technology is the reduction of the footprint area of the gate, as the devices that comprise a logic gate can be located on different layers. Considering a CMOS technology in a 3-D stacked system, the PMOS device is often placed in the bottom device layer, whereas the complementary device (NMOS) is placed in the upper device layer [34], [35] (see Fig. 2.7(a)). In addition, there is no need for aligning, thinning, and bonding as there is only one substrate. The main limitation of this integration approach is the quality of the grown transistors on the upper device layers, which need to exhibit satisfactory electrical properties, such as leakage currents, field mobility, and threshold voltage [36]. In addition, the manufacturing process should not degrade the electrical characteristics of the bottom layer due to high temperatures [2].

Table 2.1: Brief overview of 3-D technologies.

| Manufacturing Process | Technology | Benefits | Limitations |
|---|---|---|---|
| Monolithic | Stacked 3-D ICs | Reduced gate area; no aligning, thinning, or bonding issues | Quality of transistors; heat issues |
| Monolithic | 3-D fin-FET | Reduced routing congestion; reduced gate area | Ability to control channel; complex ESD methods |
| Polylithic | Contactless | Die detachability; low cost | Additional circuitry; crosstalk noise; power overhead |
| Polylithic | TSV-based 3-D ICs | High density; reduced wirelength; variety of TSV and bonding technologies | Non-negligible TSV capacitance; silicon area overhead; heat issues |

An alternative approach for 3-D circuits is the utilization of all-around-gates, such as 3-D fin-FETs based on quasi-planar fin-FETs [37]. A 3-D fin-FET CMOS inverter is illustrated in Fig. 2.7(b) [38], [39]. The main advantage of this technology is the reduction in gate area by 50% as compared to the traditional planar CMOS integration [39]. In addition, due to the vertical stacking of the MOSFETs, the routing resources to connect the devices within the gate are decreased, resulting in low power and high performance [38].

Figure 2.7: Monolithic 3-D ICs; (a) complementary MOS devices in different layers [2] and (b) 3-D fin-FET CMOS inverter [38].

Contactless 3-D ICs

In a polylithic 3-D IC, tiers are fabricated separately and then interconnected vertically. One way to provide communication between circuits located in different tiers is through coupling of the electric or magnetic field. In signaling with capacitive coupling, small on-chip parallel plate capacitors are utilized as depicted in Fig. 2.8(a) [40]. The transmitter consists of a buffer driving the capacitor. The receiver circuit is more complex, as it is required to be sufficiently sensitive and fast to detect and respond to the voltage swing being transferred through the coupling capacitors. The size of the capacitor is highly important, as it affects the inter-tier interconnect density. In addition, the inter-tier distance and the dielectric constant of the material between the tiers are two other factors which determine the efficiency of this integration scheme.

Figure 2.8: Contactless 3-D ICs based on (a) capacitive [40] and (b) inductive coupling [2].


Inductive coupling is another technology for contactless 3-D circuits where communication between tiers is supported through magnetic field coupling. This magnetic field is enabled by placing inductors in different tiers at the same horizontal coordinates (see Fig. 2.8(b)). In addition, specialized circuitry is required for both the transmitter and receiver [41]. The transmitter differentiates a signal and the receiver amplifies the transmitted current or voltage pulses to a full swing signal. The main limitation of this integration scheme stems from the large size of the inductors, which leads to a low inter-tier interconnect density. Other issues relate to the power dissipated by the additional circuitry and crosstalk noise from adjacent on-chip inductors [42].

TSV Based 3-D ICs

In this integration scheme, vertical connectivity between tiers is provided by through-silicon-vias (TSVs) and microbumps (µbumps). A schematic representation of this technology is illustrated in Fig. 2.9. Each tier of a TSV based 3-D system can be manufactured separately, resulting in a polylithic structure. This characteristic supports the integration of circuits with different processes and on different types of wafers. Therefore, ICs can be diversified more than if they were grown on a single wafer. Moreover, components with incompatible manufacturing processes can be combined within a single 3-D IC and interconnected with TSVs. In addition, as each tier of the 3-D stack is fabricated individually, the yield of each IC can be high and the overall manufacturing cost can be reduced.

[Figure 2.9 labels: Tier 1, Tier 2, Tier 3, wiring (BEOL), transistors (FEOL), TSVs, microbumps.]

Figure 2.9: 3-D integrated system with TSV and µbumps.


The most attractive feature of this technology is the high density of short and thin TSVs, placed at any point across the circuit not occupied by transistors. The usage of these high density vertical connections offers the greatest possible reduction in wirelength with three-dimensional integration. The formation of these inter-tier wires is an important issue for designing low power and/or high performance 3-D ICs, as TSVs affect the surrounding transistors and horizontal wires. Various TSV technologies and wafer/die bonding styles are available to address this issue. In the following section, the different bonding styles and characteristics of TSVs are discussed. In addition, for the rest of this thesis the terms “3-D stack”, “TSV based 3-D ICs”, and “3-D ICs” will be used interchangeably to refer to three-dimensional integration with TSVs and/or microbumps.

2.2.2 TSV Characteristics and Bonding Styles

As mentioned in the previous section, through-silicon-vias are important components for three-dimensional circuits. These vias, as conductive materials, electrically connect the tiers within a 3-D stack. A variety of TSV technologies is available for interconnecting vertical circuits. A common classification of TSVs is based on the fabrication process during which the TSVs are formed [11], [43]. In Fig. 2.10, various TSV technologies are illustrated based on this classification.

[Figure 2.10 panels: (a) Via-First, (b) Via-Middle, (c) Via-Last; labels in each panel: TSV, FEOL, BEOL.]

Figure 2.10: Main fabrication steps of different TSV technologies.


Via-first (VF) TSVs are typically formed before the Front-End-of-Line (FEOL), where transistors are fabricated (see Fig. 2.10(a)). In via-middle (VM) TSVs, the transistors (FEOL) are fabricated first and the TSV formation process follows (see Fig. 2.10(b)). The main advantage of these two processes (VF and VM) is that a high density of vertical connections is supported due to the small size of these vias [2]. However, area otherwise used for transistors is occupied, resulting in an overhead in area [11]. Via-last (VL) TSVs are formed after the BEOL, where the horizontal interconnects are manufactured (see Fig. 2.10(c)). Via-last TSVs require fewer steps to be formed, leading to lower cost as compared to VF and VM [11], [44]. However, these vias interfere with both the transistors and horizontal wires, resulting in overhead in area and increased routing congestion [45].

The main purpose of these vias is to provide connections between tiers while exhibiting low parasitics, such as resistance and capacitance. In order to quantify these properties, electrical models are necessary to accurately describe the interconnect power and speed. Through-silicon-vias are formed by etching the substrate and filling the etched regions with metal, such as tungsten (W) or copper (Cu) [2]. Furthermore, a thin lining with silicon dioxide (SiO2, or some other material) is required due to the copper diffusivity into silicon [46], [47]. As a result, a cylindrical metal-oxide-semiconductor (MOS) capacitor is formed with the silicon substrate acting as the bulk and the TSV metal acting as a gate (see Fig. 2.11). G. Katti et al. observed this situation and provided a closed-form expression for calculating the capacitance of TSVs [48]. In addition, a closed-form expression for the resistance of a TSV is provided in [48] based on cylindrical wires.

[Figure 2.11 labels: (a) substrate (P-Si), metal (Cu/W), insulator (SiO2); (b) TSV length, TSV diameter, liner thickness.]

Figure 2.11: (a) Footprint and (b) cross-section of a TSV structure.
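As a first-order illustration of how the TSV geometry in Fig. 2.11 maps to electrical parameters, the sketch below uses the textbook formulas for the DC resistance of a solid cylinder and the capacitance of a coaxial dielectric shell. These are simplifying assumptions and not necessarily the expressions of [48]: the depletion region of the MOS structure and frequency-dependent effects are ignored, so the capacitance is an upper bound set by the liner alone. All dimensions and material constants are hypothetical example values.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def tsv_resistance(length_m: float, diameter_m: float, resistivity_ohm_m: float) -> float:
    """DC resistance of a solid cylindrical conductor: R = rho * L / (pi * r^2)."""
    radius = diameter_m / 2.0
    return resistivity_ohm_m * length_m / (math.pi * radius ** 2)


def tsv_liner_capacitance(length_m: float, diameter_m: float,
                          liner_thickness_m: float, liner_eps_r: float) -> float:
    """Coaxial capacitance of the liner: C = 2*pi*eps0*eps_r*L / ln(r_outer / r_inner)."""
    r_inner = diameter_m / 2.0
    r_outer = r_inner + liner_thickness_m
    return 2.0 * math.pi * EPS0 * liner_eps_r * length_m / math.log(r_outer / r_inner)


# Hypothetical copper TSV with a silicon dioxide liner.
LENGTH = 50e-6            # 50 um
DIAMETER = 5e-6           # 5 um
LINER = 0.1e-6            # 100 nm
RHO_CU = 1.68e-8          # ohm*m, bulk copper
EPS_R_SIO2 = 3.9          # relative permittivity of SiO2

r = tsv_resistance(LENGTH, DIAMETER, RHO_CU)
c = tsv_liner_capacitance(LENGTH, DIAMETER, LINER, EPS_R_SIO2)
print(f"R = {r * 1e3:.1f} mOhm, liner C = {c * 1e15:.1f} fF")
```

The two functions make the geometric tradeoff explicit: a wider via lowers the resistance quadratically but increases the liner capacitance, while a thicker liner lowers the capacitance.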


A plethora of TSVs with significantly different aspect ratios² and thereby different electrical parameters has been reported [2], [11]-[44], [49]-[51]. Via-last TSVs are typically large in diameter and have a high aspect ratio. These TSVs exhibit low resistance and high capacitance and usually are utilized in interposers and image sensor circuits [11]. Alternatively, via-first and via-middle TSVs are shorter and thinner than VL TSVs, favoring high density vertical integration [2]. Moreover, microbumps, utilized as contacts between tiers, can be fabricated following the same small sizes of VF/VM TSVs with negligible impedance as compared to TSVs [11].

Increasing the number of tiers in a 3-D system is a typical objective for three-dimensional integration [2]. The tier stacking step follows the formation of TSVs and transistors in a single wafer. This step proceeds with either wafer-to-wafer (W2W), die-to-wafer (D2W), or die-to-die (D2D) bonding. W2W bonding requires fewer process steps than D2W and D2D [11]. However, this approach can result in low yield, as faulty dice are not excluded. In order to avoid this situation, a wafer can be diced and each die tested separately, providing only known good dice (KGD) for bonding. After this step, which increases the turnaround time [2], dice are bonded to wafers (D2W) or to dice (D2D) in order to form the 3-D stack.
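The yield penalty of W2W bonding can be sketched with a simple compound yield model. The model is an assumption made for illustration and is not taken from the cited works: die failures are treated as independent, every bonding step succeeds with the same probability, and the yield values are hypothetical.

```python
def w2w_stack_yield(die_yield: float, bond_yield: float, n_tiers: int) -> float:
    """Stack yield when untested dice are bonded: every tier must be good
    and every one of the (n_tiers - 1) bonding steps must succeed."""
    return die_yield ** n_tiers * bond_yield ** (n_tiers - 1)


def kgd_stack_yield(bond_yield: float, n_tiers: int) -> float:
    """Stack yield when only known good dice are bonded (D2W or D2D):
    faulty dice are screened out, so only the bonding steps can fail."""
    return bond_yield ** (n_tiers - 1)


DIE_YIELD = 0.80   # hypothetical yield of each tier before bonding
BOND_YIELD = 0.98  # hypothetical yield of each bonding step

for tiers in (2, 3, 4):
    w2w = w2w_stack_yield(DIE_YIELD, BOND_YIELD, tiers)
    kgd = kgd_stack_yield(BOND_YIELD, tiers)
    print(f"{tiers} tiers: W2W = {w2w:.3f}, KGD = {kgd:.3f}")
```

Under these assumptions the W2W yield falls geometrically with the number of tiers, which is the cost that pre-bond testing avoids at the expense of a longer turnaround time.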

In Fig. 2.12, different bonding styles are illustrated. The most common bonding style is face-to-back (F2B), where the BEOL (face) of the bottom die is attached to the substrate (back) of the top die (see Fig. 2.12(a)) [11]. This approach allows the integration of a virtually unlimited number of tiers in a 3-D stack. In addition, all types and sizes of TSVs can be used to interconnect the circuit tiers, resulting in high density integration [52]. In face-to-face (F2F) bonding, the BEOL of each die is aligned and bonded utilizing only microbumps (see Fig. 2.12(b)). The main advantage of this bonding style is that TSVs are not required, thus there is no overhead in area [53]. However, F2F bonding can be utilized only if two tiers are integrated [2]. Back-to-back bonding, where the substrates of the tiers are attached to each other, is also possible (see Fig. 2.12(c)). However, this approach results in longer TSVs [2].

2.2.3 Benefits of TSV Based 3-D ICs

Three-dimensional integration by means of TSVs and microbumps can provide significant benefits over the traditional planar integration. The most important enhancement of this technology is the considerable reduction of wirelength due to the introduction of the third dimension. This advantage has been shown to improve the performance and power of digital circuits, such as processors and memories [54]-[61].

² Aspect ratio is defined as the TSV length to diameter ratio (see Fig. 2.11).

[Figure 2.12 panels: (a) Face-to-Back, (b) Face-to-Face, (c) Back-to-Back; labels: BEOL, FEOL.]

Figure 2.12: Different bonding styles of tiers.

Splitting the logic blocks of a processor into several tiers is a meaningful way to improve the performance and power by reducing the wirelength within each block. For example, considering the architecture of a processor, the instruction scheduler constrains the highest clock frequency and dissipates considerable power [62]. Folding this block in two tiers reduces the delay and power by 44% and 16%, respectively [54]. Furthermore, arithmetic units and logarithmic shifters can be folded into different tiers. Delay and power improvements for Brent-Kung [63] and Kogge-Stone [64] adders are demonstrated in Table 2.2. However, the savings saturate with the number of the tiers. Therefore, the number of the tiers must be carefully determined to sustain performance improvement and low cost.

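Table 2.2: Performance and power improvements of 3-D over 2-D logic circuits [54].

| Tiers | Kogge-Stone Adder, 16-bits, Delay | Kogge-Stone Adder, 16-bits, Power | Kogge-Stone Adder, 32-bits, Delay | Brent-Kung Adder, 32-bits, Delay |
|---|---|---|---|---|
| 2 tiers | 20.23% | 8% | 9.6% | 13.3% |
| 3 tiers | 23.60% | 15% | 20.0% | 18.1% |
| 4 tiers | 32.70% | 22% | 20.0% | 21.7% |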
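The saturation of the savings can be read directly from Table 2.2. The short sketch below takes the reported delay improvements of the 32-bit Brent-Kung adder (13.3%, 18.1%, and 21.7% for two, three, and four tiers) and computes how much each additional tier contributes; the variable names are illustrative.

```python
# Delay improvement over 2-D for the 32-bit Brent-Kung adder (Table 2.2), in percent.
brent_kung_delay_gain = {2: 13.3, 3: 18.1, 4: 21.7}

marginal_gain = {}
previous = 0.0  # a single-tier (2-D) circuit is the baseline with 0% improvement
for tiers in sorted(brent_kung_delay_gain):
    total = brent_kung_delay_gain[tiers]
    # Round to one decimal to match the precision of the tabulated data.
    marginal_gain[tiers] = round(total - previous, 1)
    previous = total

for tiers, gain in marginal_gain.items():
    print(f"adding tier {tiers}: +{gain} percentage points")
```

The first split into two tiers provides most of the improvement, while the third and fourth tiers each add less than five percentage points.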

Alternatively, another approach simply stacks the circuit blocks and reduces shareable wires. Black and Intel™ developed a 3-D version of the Intel™ Pentium® 4 in two tiers [55] with face-to-face bonding and microbumps. This structure requires only 50% of the original footprint and reduces inter-block interconnects. A typical worst case path is when the data travels from the far edge of the data cache (D$) to the farthest functional unit (FU). This path, in the 3-D version of the Pentium® 4, contains half the routing distance since the data only traverses half of the data cache and half of the functional units, effectively eliminating one clock cycle from the load execution delay [55]. This situation and the reduction of the original footprint are illustrated in Fig. 2.13. Furthermore, this structure eliminates 25% of the pipeline stages in the 2-D architecture and improves the performance by 15% [55]. In addition, the power dissipation is reduced by 15%. A similar approach for the Alpha 21364 [65] processor with face-to-back bonding and TSVs resulted in a 7.3% and 10.3% increase in the instructions per cycle (IPC) for two and four tiers, respectively [57].


Figure 2.13: Floorplan of x86 microprocessor (a) 2-D and (b) 3-D [56].

Three-dimensional integration can also offer several enhancements for cache memories. The first advantage of vertical integration for cache memories stems from increasing the total memory by adding more memory on the upper tiers of a 3-D stack. This technique results in fewer cache misses due to the increase in total memory. This situation is illustrated in Fig. 2.14, where a new tier of L2 cache memory was introduced, resulting in an increase in the total memory by 8 MB [56]. Another advantage of manipulating cache memories in 3-D integration is derived by partitioning the existing cache into two or more tiers. Although this method does not have a significant impact on the miss rate, a reduction of global wires, such as the clock network, is obtained. Furthermore, in this case, other circuit blocks can be rearranged differently, resulting in a reduction in wire delay throughout the overall system [56] (see Fig. 2.13).

Three-dimensional integration with TSVs also supports faster cache memories. This objective can be achieved by constructing novel cache architectures with shorter interconnects inside these blocks [58], [59]. These architectures are based on splitting the memory block at different levels of abstraction into more than one tier. The memory can be partitioned into different tiers at the circuit block, macrocell, and transistor level, where the circuit block is considered in this case to be equivalent to a memory subarray. Vertical integration at extra fine granularity, such as at the transistor level, has been shown to have a negative effect on the total area of the memory, since the size of a TSV is typically larger than the area of an SRAM cell [66]. Therefore, the most common partitioning for 3-D cache memories with TSVs consists of splitting either the word line or the bit line of cache memories [60]. In general, word line partitioning in 3-D results in a smaller delay but not necessarily larger power savings [58]. Greater savings in both delay and power are noticeable in high performance caches in 3-D [59]. Alternatively, bit line partitioning is more efficient for low power memories. When the memory is designed for low power, bit line partitioning decreases the power by approximately 14% as compared to the reduction achieved by word line partitioning [59]. This behavior can be explained by considering the original 2-D design of the cache memory. High performance memories favor wide arrays, which entail longer word lines, while low power memories exhibit a greater height, resulting in longer bit lines [2].

[Figure 2.14 panels: (a) Core #1 and Core #2 with a 4 MB cache; (b) Core #1 and Core #2 with a 4 MB cache and a stacked 8 MB SRAM cache.]

Figure 2.14: Memory stacking options: (a) 4 MB baseline and (b) 8 MB stacked for a total of 12 MB [56].

SMART-3D [61] is another technique which re-architects the memory hierarchy for 3-D circuits. This method re-architects the L2 cache and the cache interface to the 3-D stacked DRAM to improve latency by exploiting the high density bandwidth of TSV between the last level cache of the processor and the 3-D DRAM. The maximum usable memory bandwidth is achieved with only a few hundred through-silicon-vias between the processor and memory tiers (e.g., 512 TSVs for a 64-byte L2 cache line) [61]. For single-threaded memory-intensive applications, this implementation increases the speed by a factor of 1.53 to 2.14 compared to a conventional 2-D architecture and by a factor of 1.27 to 1.72 compared to other 3-D stacked memory techniques [61]. A summary of different approaches for splitting homogeneous circuits in three dimensions based on previous works is listed in Table 2.3.

Table 2.3: Summary of different approaches for splitting homogeneous circuits in three dimensions based on previous works.

Work          3-D approaches
[54]          Folding logic blocks
[55], [57]    Stacking logic blocks
[56]          Adding cache memory dies
[58], [59]    Re-architecting cache memories
[61]          Re-architecting memory hierarchy
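The 512-TSV figure quoted above for SMART-3D is consistent with transferring a whole cache line in parallel, since a 64-byte line is 512 bits wide. A minimal sketch of this arithmetic follows. It assumes one data TSV per bit and ignores control, power, and redundant TSVs; the function name and the `bits_per_tsv` parameter are illustrative and are not taken from [61].

```python
def data_tsvs_for_cache_line(line_bytes: int, bits_per_tsv: int = 1) -> int:
    """Data TSVs needed to move one cache line in a single parallel transfer.

    Assumes one bit per TSV per transfer by default (an illustrative assumption).
    """
    if line_bytes <= 0 or bits_per_tsv <= 0:
        raise ValueError("line_bytes and bits_per_tsv must be positive")
    line_bits = line_bytes * 8
    # Ceiling division, in case the line width is not a multiple of bits_per_tsv.
    return -(-line_bits // bits_per_tsv)


# A 64-byte L2 cache line, one bit per TSV.
print(data_tsvs_for_cache_line(64))
```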

2.2.4 Challenges in 3-D ICs

Three-dimensional integration with through-silicon-vias offers integrated circuits with smaller form factors. However, new design and manufacturing techniques in these complex stacked systems are required to realize and exploit the full potential of this technology. This situation makes three-dimensional integration by means of TSV an exciting research area. The main challenges and potential limitations of 3-D ICs are described in the following paragraphs.

EDA Design Tools

Modern digital circuits exhibit the same or even greater design complexity than building a skyscraper or an airplane [67]. This situation has created the necessity of utilizing computer-aided design (CAD) tools to advance to higher circuit integration densities. Problems in the design process of a circuit, such as logic synthesis, partitioning and floorplanning, place and route, and power and timing analysis, are efficiently handled by design tools. These tools are incorporated into a design flow to automate the design process of a circuit. A typical IC design flow starts from the behavioral description of the circuit and finishes with the layout, which is provided to the semiconductor foundries for fabrication.


The complexity of three-dimensional TSV based integration is not a prohibitive factor for software tools. However, existing electronic design automation (EDA) tools need to be extended to manage the vertical dimension of integration. For example, excessive usage of TSV can lead to increased wirelength as compared to planar circuits due to the large silicon area these structures occupy [68]. Therefore, the number, size, and location of TSV are crucial factors affecting the partitioning and floorplanning steps of the circuit into three dimensions. Moreover, verifying the alignment of TSVs and microbumps between adjacent tiers is another important factor [69]. In the course of placing and routing standard cells³, electrical connections need to be validated for signals spanning multiple tiers. In addition, the RC parasitics of TSV and microbumps must also be considered during the power and timing analysis of 3-D stacks. The state-of-the-art tools for 3-D circuits are discussed further in Section 4.1.

Integrating all these tools in a design flow, however, is not a straightforward task. Consequently, no complete design automation flow exists to support the design of TSV based 3-D ICs. Furthermore, considering the diverse TSV and bonding technologies, the lack of a complete design flow limits the design exploration of 3-D circuits. In my published work [13], a novel design automation flow is proposed which enables design space exploration for TSV based 3-D ICs. The design experience is similar to a 2-D flow, as commercial 2-D EDA tools alongside an academic open source 3-D tool are utilized in the proposed flow. In addition, in order to support the broad gamut of TSV and bonding technologies, novel steps are introduced in this flow. For the first time, the RC parasitics of vertical connections are considered to accurately determine the power and performance of the whole 3-D stack. This work is presented in Chapter 4.

Manufacturing and Thermal Issues

The realization of 3-D systems requires mature manufacturing processes to construct these complex 3-D circuit structures. In the development of 3-D systems, dissimilar technologies may give rise to fabrication issues relating to the reliable bonding of several ICs [53]. Furthermore, the performance and reliability of the individual tiers should not be degraded due to the vertical (heterogeneous) integration. Reliable and small vertical interconnects to propagate signals and power throughout the tiers of a 3-D system need to be developed with the fewest possible manufacturing steps [2]. This issue is due to the fact that each additional fabrication step increases the cost and potentially reduces the yield [70], [71]. Therefore, packaging solutions that accommodate these complicated 3-D structures need to be developed for 3-D ICs to reach volume manufacturing.

³ Logic or standard cells are low level logic functions such as AND, OR, INVERT, flip-flops, latches, and buffers. Logic libraries are collections of these cells, provided by semiconductor foundries for the design process of the circuit.

Thermal effects are another crucial challenge for 3-D ICs. Stacking of tiers increases the number of active circuits per unit area as compared to 2-D systems. This situation leads to increased power densities for 3-D circuits. As a result, heat issues, in the form of hotspots, arise for 3-D systems. Degraded performance and accelerated wear-out of the circuit are two major effects due to the increased heat in 3-D ICs. Moreover, exploiting the performance benefits of vertical integration while mitigating thermal effects is a difficult task [2]. New packaging solutions and more effective heat sinks are two approaches to alleviate thermal effects. In addition, considering these thermal issues, research on reducing the power of 3-D ICs becomes even more important.

Limited Power Savings

Power consumption has traditionally been a primary challenge for 2-D integrated circuits [72]. Moreover, considering the immense growth of battery operated devices (5× increase over the last 5 years [73]), such as mobile phones, smart watches, and tablet PCs, three-dimensional circuits should exhibit low power characteristics in order to successfully enter the market. In addition, considering the thermal issues in three-dimensional integration, reducing power is imperative for these circuits.

Power savings in 3-D ICs stem from the decreased interconnect capacitance due to the reduction of the wirelength. However, through-silicon-vias exhibit non-negligible parasitic capacitance [48]. This characteristic can lead to only moderate power savings as compared to planar integration. In addition, due to this characteristic of TSV, buffer insertion before a TSV is often inevitable, increasing power [74], [75]. Consequently, the decrease in power offered by vertical integration can be severely constrained if only wirelength reduction is considered.


Based on this observation, my work [14] follows a different yet efficient way to decrease power by combining the innate traits of 3-D integration with standard low power methods for integrated circuits, such as voltage scaling. This new method is presented in Chapter 5. In the following section, the main components of power in modern integrated circuits alongside advanced low power techniques are surveyed to allow the reader to better follow the material presented in the following chapters.

2.3 Low Power Techniques

Devices, such as mobile phones, tablets, wearables, and other emerging smart devices, are battery-operated. Furthermore, these devices require sufficient performance to enable fast data processing and high bandwidth communication. This situation creates the necessity of delivering performance while consuming a minimal amount of energy to achieve a longer battery lifetime [77]. However, this tradeoff between power and performance is not new for the semiconductor industry [76]. Traditionally, power optimization is one of the most important constraints in integrated circuits and obtaining the best performance possible within the available power budget is often a challenge. In order to provide low power 3-D circuits, the main characteristics of power consumption in integrated circuits need to be carefully explored. For this reason, the power components of integrated circuits alongside various low power techniques are discussed in Section 2.3.1. In addition, emphasis is placed on the voltage scaling technique in Section 2.3.2.

2.3.1 Power Consumption of Integrated Circuits

The power dissipated in a modern system is defined as the sum of dynamic and static power

$$P_{total} = P_{dynamic} + P_{static}. \tag{2.1}$$

The dynamic power arises from the charge and discharge of the capacitance of hundreds of millions of gates in modern circuits [72]. This component of power is

$$P_{dynamic} = \alpha C V_{dd}^{2} f, \tag{2.2}$$


where $\alpha$ is the fraction of gates switching, $V_{dd}$ is the supply voltage, and $f$ is the operating frequency of the circuit. The total switched capacitance of a circuit is denoted by $C$, including the capacitance of gates and interconnects. Three-dimensional integration typically reduces the power of ICs by decreasing the interconnect capacitance [2].
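As an illustration of (2.2), the sketch below evaluates the dynamic power for one operating point and shows that, with all other terms fixed, a reduction in switched capacitance translates linearly into a reduction in dynamic power. All numerical values, including the 20% capacitance reduction attributed to 3-D integration, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(alpha: float, c_switched: float, vdd: float, freq: float) -> float:
    """Dynamic power in watts, following (2.2): P = alpha * C * Vdd^2 * f."""
    return alpha * c_switched * vdd ** 2 * freq


# Hypothetical operating point (not taken from the thesis).
ALPHA = 0.1       # fraction of gates switching
C_TOTAL = 1.0e-9  # total switched capacitance [F]
VDD = 1.0         # supply voltage [V]
FREQ = 1.0e9      # operating frequency [Hz]

p_2d = dynamic_power(ALPHA, C_TOTAL, VDD, FREQ)
# Assume, for illustration only, that 3-D integration removes 20% of C.
p_3d = dynamic_power(ALPHA, 0.8 * C_TOTAL, VDD, FREQ)

print(f"2-D: {p_2d:.3f} W, 3-D: {p_3d:.3f} W, saving: {1 - p_3d / p_2d:.0%}")
```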

The static power (or off-state leakage) is due to the current that leaks through transistors when inactive and is described as

$$P_{static} = V_{dd} I_{leak}, \tag{2.3}$$

where

$$I_{leak} = I_{sub} + I_{ox}. \tag{2.4}$$

As described in (2.4), the source of static power consumption is a combination of subthreshold ($I_{sub}$) and gate-oxide ($I_{ox}$) leakage current. These components of static power are given by [78],

$$I_{sub} = K_{1} W e^{-V_{th}/(n V_{\theta})} \left(1 - e^{-V_{dd}/V_{\theta}}\right), \tag{2.5}$$

$$I_{ox} = K_{2} W \left(\frac{V_{dd}}{T_{ox}}\right)^{2} e^{-a T_{ox}/V_{dd}}, \tag{2.6}$$

where the terms of interest are the oxide thickness $T_{ox}$ and the threshold voltage $V_{th}$ of a transistor.
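The sensitivity of (2.5) and (2.6) to $V_{th}$ and $T_{ox}$ can be explored with a short numerical sketch. The interpretation of $V_{\theta}$ as the thermal voltage $kT/q$, the treatment of $K_1$, $K_2$, $n$, and $a$ as fitting constants, and every numerical value below are assumptions made for illustration; none of them are taken from this thesis, and the resulting currents are in arbitrary units.

```python
import math

BOLTZMANN = 1.380649e-23           # [J/K]
ELECTRON_CHARGE = 1.602176634e-19  # [C]


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Thermal voltage kT/q in volts (assumed meaning of V_theta)."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_leakage(k1: float, width: float, vth: float, vdd: float,
                         n: float = 1.5, temp_kelvin: float = 300.0) -> float:
    """Subthreshold leakage following (2.5); k1 and n are hypothetical constants."""
    v_theta = thermal_voltage(temp_kelvin)
    return (k1 * width * math.exp(-vth / (n * v_theta))
            * (1.0 - math.exp(-vdd / v_theta)))


def gate_oxide_leakage(k2: float, width: float, vdd: float, tox: float,
                       a: float) -> float:
    """Gate-oxide leakage following (2.6); k2 and a are hypothetical constants."""
    return k2 * width * (vdd / tox) ** 2 * math.exp(-a * tox / vdd)


# Raising Vth by 100 mV reduces I_sub exponentially (hypothetical values).
ratio_sub = (subthreshold_leakage(1.0, 1.0, vth=0.4, vdd=1.0)
             / subthreshold_leakage(1.0, 1.0, vth=0.3, vdd=1.0))

# A thicker oxide reduces I_ox (tox in arbitrary length units, hypothetical a).
ratio_ox = (gate_oxide_leakage(1.0, 1.0, vdd=1.0, tox=1.5, a=5.0)
            / gate_oxide_leakage(1.0, 1.0, vdd=1.0, tox=1.2, a=5.0))

print(f"I_sub ratio for +100 mV Vth: {ratio_sub:.3f}")
print(f"I_ox ratio for thicker oxide: {ratio_ox:.3f}")
```

Both ratios are below one, which is the qualitative behavior exploited by high threshold voltage cells.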

The reduction of total power during the past four decades was primarily achieved by decreasing the dynamic component of power and introducing new device technologies (e.g., switching from bipolar to CMOS technology). However, smaller geometries exacerbate leakage current, such that static power has become a noticeable power component in microprocessors [78], [79]. Therefore, gate-oxide ($I_{ox}$) and subthreshold ($I_{sub}$) leakage currents should be constrained as these currents do not contribute to any useful work.

The optimization of power consumption can be achieved at different levels of the design abstraction. Reducing the complexity of algorithms at the application and operating system level is a useful way to reduce power [80]. Moreover, power is decreased at the architectural level with techniques such as pipelining, data encoding, and parallelism of instructions [81]. Alternatively, a broad gamut of low power techniques at the physical level are available which can lead to great power savings.

Downsizing cells, clock gating, and power gating are three methods aiming to reduce the power of a circuit [82]. Reducing the size of a cell (width of the transistor) leads to decreased gate capacitance. The utilization of this method in a variety of circuits has demonstrated 15% power reduction on average [83]. However, reducing the strength of gates leads to performance degradation [82]. Clock gating reduces the dynamic power dissipation by disabling the clock signal in portions of the circuitry such that the related flip-flops do not switch. Depending on the circuit and operating mode, this technique can offer up to 40% power reduction [84]. Moreover, during power gating, blocks of inactive circuitry are turned off to reduce the leakage power [85].

More advanced methods, such as multi-voltage-threshold cells and voltage scaling, consider both the power and performance of the circuit. The supply voltage ($V_{dd}$) and the voltage threshold ($V_{th}$) relate to the delay of a circuit as

$$\mathrm{Delay} \propto \frac{V_{dd}}{(V_{dd} - V_{th})^{\beta}}, \tag{2.7}$$

where $\beta$ is an experimentally derived constant specific to the manufacturing technology ($1 < \beta < 2$) [86]. Employing high voltage threshold cells (thick $T_{ox}$) is a meaningful way to reduce the leakage current of a circuit considering (2.4)-(2.6). This method has been shown to reduce the static power of a circuit by up to 82% [87]. In addition, the speed of a circuit can be increased by utilizing low voltage threshold cells as described in (2.7).

Voltage scaling is one of the most efficient techniques to decrease the power in a circuit while considering the speed. As described in (2.7), increasing the operating voltage leads to a superlinear decrease in delay. Alternatively, significant power savings are achieved by reducing the supply voltage [82] due to the quadratic dependence of the dynamic power on voltage (see (2.2)) and the linear dependence of the leakage power on voltage (see (2.3)). The following section is dedicated to the voltage scaling method due to the importance of this technique in modern circuits.
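The power and delay sides of this tradeoff can be put next to each other with a short sketch based on (2.2), (2.3), and (2.7). The sketch assumes that $\alpha$, $C$, and $f$ are held fixed, and that $I_{leak}$ does not change with $V_{dd}$, which is a simplification since (2.5) and (2.6) both depend on the supply voltage. The values of $V_{th}$ and $\beta$ are hypothetical.

```python
def relative_delay(vdd: float, vth: float, beta: float) -> float:
    """Delay up to a constant factor, following (2.7)."""
    if vdd <= vth:
        raise ValueError("vdd must exceed vth")
    return vdd / (vdd - vth) ** beta


def scaling_summary(vdd_old: float, vdd_new: float, vth: float, beta: float) -> dict:
    """Ratios (new / old) of power and delay when the supply voltage is changed."""
    return {
        # (2.2) with alpha, C, and f fixed: quadratic in Vdd.
        "dynamic_power_ratio": (vdd_new / vdd_old) ** 2,
        # (2.3) with I_leak assumed constant (simplification): linear in Vdd.
        "static_power_ratio": vdd_new / vdd_old,
        # (2.7): delay grows as Vdd approaches Vth.
        "delay_ratio": relative_delay(vdd_new, vth, beta)
                       / relative_delay(vdd_old, vth, beta),
    }


# Hypothetical example: scale the supply from 1.2 V down to 1.0 V.
summary = scaling_summary(vdd_old=1.2, vdd_new=1.0, vth=0.3, beta=1.3)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the dynamic power falls faster than the static power, while the delay ratio exceeds one, which is the penalty that any voltage scaling strategy has to account for.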

2.3.2 Power Supply Voltage Scaling

Reducing the supply voltage can lead to significant power gains in modern circuits [82]. A tradeoff, however, exists between power and performance, as a reduction in the supply voltage proportionally increases the latency of the circuit. Consequently, voltage scaling is always considered along with performance. This situation leads to different strategies of voltage scaling: i) reducing the operating voltage when the power budget is constrained and ii) increasing the operating voltage for a given frequency target when speed is the primary objective.

The technique of voltage scaling can be applied to different circuit granularities. At the system level, voltage changes globally across a circuit. Alternatively, voltage can be scaled at the block level, individually adapting the voltage of the functional circuit blocks composing a system (e.g., in chip multiprocessors, CMPs) [86]. In both cases, the same strategies of voltage scaling are applicable in terms of voltage and speed.

Another categorization of voltage scaling schemes is based on the mechanism for altering the voltage both at the system and block level. In multi-voltage scaling (MVS) schemes, few and fixed voltage levels are supported and the power management unit (PMU) is responsible for managing these levels [82]. Similarly, dynamic voltage and frequency scaling (DVFS) is an extension of the MVS scheme where a larger number of voltage levels are supported by the PMU and are dynamically switched to follow the dynamic behavior of the workload. In an adaptive voltage scaling (AVS) scheme, a control loop is used to adjust the voltage level [88].
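The difference between a scheme with few voltage levels and one with many can be sketched as a table lookup: for a required frequency, the lowest supported voltage that still sustains that frequency is selected. The operating points, the class, and the selection function below are hypothetical illustrations and do not describe any particular PMU; real voltage and frequency tables are characterized per design.

```python
from typing import NamedTuple, Sequence


class OperatingPoint(NamedTuple):
    vdd: float       # supply voltage [V]
    max_freq: float  # highest frequency sustainable at this voltage [Hz]


def select_operating_point(points: Sequence[OperatingPoint],
                           required_freq: float) -> OperatingPoint:
    """Return the lowest-voltage point that still meets the required frequency."""
    feasible = [p for p in points if p.max_freq >= required_freq]
    if not feasible:
        raise ValueError("no operating point meets the required frequency")
    return min(feasible, key=lambda p: p.vdd)


# Hypothetical tables: few fixed levels (MVS-like) versus many levels (DVFS-like).
MVS_POINTS = [OperatingPoint(0.9, 0.6e9), OperatingPoint(1.2, 1.0e9)]
DVFS_POINTS = [
    OperatingPoint(0.8, 0.4e9),
    OperatingPoint(0.9, 0.6e9),
    OperatingPoint(1.0, 0.8e9),
    OperatingPoint(1.1, 0.9e9),
    OperatingPoint(1.2, 1.0e9),
]

demand = 0.7e9  # required frequency for the current workload phase [Hz]
print("MVS :", select_operating_point(MVS_POINTS, demand))
print("DVFS:", select_operating_point(DVFS_POINTS, demand))
```

With the coarser table the workload is served at a higher voltage than strictly needed, which illustrates why a larger number of levels can follow the workload more closely.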

Furthermore, schemes of finer granularity include voltage islands/domains, which are an extension of voltage scaling combined with the power gating technique, where blocks are switched off if inactive [89]. Blocks are physically grouped to form islands with the same voltage levels and switching activities. This extension is useful for circuits with voltage scaling enabled at the block level as the power distribution network (PDN) is simplified [90]. In addition, thermal issues are mitigated as power relates to heat dissipation [91].

However, several challenges arise in the design process for circuits with multiple voltage domains (MVS), notably at the boundaries of the blocks. The primary difficulty in utilizing these techniques in a system is the necessity of additional circuitry at the interfaces between blocks which operate at different supply voltages. Signals propagate between blocks that utilize different power rails. Therefore, additional circuits are required to scale up/down the voltage level of signals and retain the previous state or clamp signals to a specific state when a circuit block is powered down [82]. The main disadvantage of utilizing this additional circuitry between blocks is that the additional delay of these cells hinders the timing closure of the circuit. In addition, this problem applies to both 2-D and 3-D integrated circuits [82], [92]. Therefore, in my published work [15], an advanced circuit interface is offered that bypasses these cells, thereby alleviating timing penalties. This work is presented in Chapter 6.

Furthermore, by considering the effectiveness of the voltage scaling technique on the well-known power-speed interplay, a methodology to reduce power in 3-D ICs by decreasing the supply voltage but without degrading speed is highly important. In my work [14], emphasis is placed on reducing the operating voltage in 3-D circuits, as power savings from the reduction of the wirelength are limited due to the non-negligible parasitic capacitance of the through-silicon-vias that vertically interconnect the tiers [74], [75]. This work alongside related research efforts in 3-D ICs is presented in Chapter 5.

2.4 Conclusions

Vertical integration can be accomplished by the usage of interposer technologies, 3-D technologies, or both. 2.5-dimensional integration, realized by (passive) interposer technology, is an advanced packaging technology supporting high density integration. This technology, based on either silicon or glass, adds a new layer to the packaging hierarchy interconnecting the package substrate and the host dice into a system, offering shorter off-chip wires. The most wire dominant component of an interposer is the redistribution layer. However, little effort has yet been devoted to studying the effect of different interposer materials, such as silicon and glass, on these planar wires.

Circuit stacking leads to three-dimensional integrated circuits. Depending upon the manufacturing process, 3-D integration can result in monolithic or polylithic systems. In a monolithic 3-D IC, the devices are grown on a single wafer. Alternatively, in a polylithic structure, circuits are prepared separately and communication between tiers is ensured either contactlessly or with through-silicon-vias. Three-dimensional integration with TSV and microbumps supports high density circuits due to the short and thin vertical connections between tiers. In addition, various TSV technologies (VF, VM, and VL) and bonding styles (F2B, F2F, B2B) are available providing high integration density.


The main enhancement of TSV based integration is the considerable wirelength reduction. This reduction of the interconnect RC parasitics has been shown to improve the performance and power of a variety of digital circuits, such as processors and memories. However, EDA tools and design flows are required to support TSV based integration so as to fully exploit the potential of 3-D integration. Furthermore, the increased power densities in 3-D ICs can result in unacceptably high temperatures. Therefore, reducing the power is imperative for 3-D circuits, as power and heat are intertwined. However, power savings due to the wirelength reduction in 3-D ICs can be limited by the non-negligible capacitance of TSVs. Therefore, new ways should be explored to reduce power in 3-D ICs while considering the innate traits of 3-D integration.

Various techniques are available to decrease power in modern circuits, such as gate level power optimization, clock and power gating, multi-threshold cells, and voltage scaling. Voltage scaling is one of the most efficient techniques for decreasing power due to the quadratic dependency of dynamic power on the supply voltage. In addition, power and performance are considered in tandem in this technique. However, this technique is not yet fully explored for 3-D circuits. Furthermore, voltage islands are an extension of voltage scaling combined with power gating in order to reduce power. The primary difficulty in exploiting these techniques in a system (either 2-D or 3-D) is the requirement of additional circuitry at the interfaces between blocks which operate at different voltages. Therefore, there is an imperative need for a circuit interface that bypasses this additional circuitry under specific operating conditions in order to minimize timing penalties.

Chapter 3

Interconnect Design and Analysis for Interposer Technologies

Interposer technologies have emerged as the demand for interconnection density has significantly increased in high performance systems. This demand towards higher interconnect density on-chip as well as off-chip makes interconnect analysis and design a crucial aspect of the circuit design process, greatly affecting system performance. Interposer based systems, such as the Xilinx Virtex 7 and AMD Fiji GPU, interconnect multiple dice on a silicon interposer, offering significant power reduction and increase in bandwidth [18], [19]. These improvements originate from the high density horizontal wiring in the redistribution layer of the interposer.

The power efficiency of interconnects in the RDL of a silicon interposer was investigated in [25]. However, other important design objectives are not considered in [25]. In addition, miniaturization of packages and systems necessitates the use of high density interposers or packages made of potentially different materials [93]. An alternative to silicon is the glass interposer, which supports similar via diameter, via pitch, and high density wiring. However, the impact of different interposer materials, such as silicon and glass, on the horizontal wires of the RDL has not been investigated.

In this chapter, glass interposers are also considered as an alternative to silicon interposers. Design objectives other than power efficiency, including crosstalk noise, area, and power-delay product, are explored to better ascertain the merits relating to the two types of interposers. A comparison between interconnects for data transmission on glass and silicon interposers is presented. Traditional design parameters, such as spacing between wires and width of interconnects, are determined for diverse objectives considering the different electrical properties of both materials. These design parameters are shown to change considerably between the two materials when targeting the same design objective and/or metric.

This chapter is organized as follows. In Section 3.1, the technological characteristics and models of both interposer materials are described. In Section 3.2, the physical behavior and performance of interconnects on glass and silicon interposers are investigated and possible tradeoffs are addressed. Design guidelines for interconnects on both materials targeting high density interconnections are offered in Section 3.3, aiming at the reduction of crosstalk, delay, and power. Conclusions are offered in Section 3.4.

3.1 Modelling Wires on Silicon and Glass Interposers

In this section, the traits of interposer interconnects are described. Without loss of generality, the discussion below will be based on a descriptive example of a system employing an interposer technology, such as a processor and memory on interposer. A DDR2 SDRAM is assumed as the memory module, which requires a bandwidth of 400 MT/s [25], [94]. Moreover, a dual-core Blackfin processor [95] is assumed for the CPU module. This processor supports connection to the DDR2 interface of the memory module [95]. This motivating example of a system with interposer is illustrated in Fig. 3.1.

[Figure 3.1 labels: CPU and Memory dice on an interposer, connected by a trace in the RDL; passivation layer.]

Figure 3.1: A motivating example of a system with a CPU and a memory with interposer traces.

Glass and silicon are explored as the two candidate interposer materials. In both cases, interconnections between the hosted dice (CPU and memory) are supported by the redistribution layer (RDL) on top of the interposer substrate (see Fig. 3.1). Dimensions of the RDL wires are based on the selected wiring technology. For our descriptive example of a CPU and a memory on interposers, a 65 nm BEOL technology is employed as the RDL, the same as the state-of-the-art wiring on interposers from [18] and [25]. The dimensions of interconnects according to the Predictive Technology Model (PTM) [96] for the chosen process node are listed in Table 3.1. Typically, the global interconnects of a BEOL technology are utilized on passive interposers, as the local and intermediate wires are short interconnects utilized to interconnect logic gates [17], [18]. Therefore, the top two global metal (copper, Cu) layers with a minimum pitch¹ of 0.9 µm, a wire thickness of 1.2 µm, and a dielectric height between metal layers of 0.2 µm are used to interconnect the CPU and memory module. In addition, polyimide with $\varepsilon_r = 3.4$ [29], [97] is assumed as the dielectric in the passivation layer for both silicon and glass interposers.

Table 3.1: Interconnect physical characteristics for the 65 nm PTM process node [96].

Layer          Width [nm]   Space [nm]   Thickness [nm]   Height [nm]
Local          100          100          200              200
Intermediate   140          140          350              200
Global         450          450          1200             200

Memory and CPU are assumed to be connected with 10 mm long interconnects on the interposer, considering the I/O pin locations of these modules and typical wire lengths of other interposer based implementations [18], [19]. In addition, the RDL traces on the interposer are modeled as distributed interconnects with 100 π-type RLC segments. Self and mutual inductances between wires are also considered, yet at 400 MT/s the interconnects do not exhibit inductive behavior. Consequently, the performance of the interconnects is primarily determined by the RC characteristics of the wires.
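The discretization described above can be sketched as follows: the per-unit-length R, L, and C of a wire are divided over a number of identical π-type segments, each with a series R and L and half of the segment capacitance at either end. The sketch covers only a single isolated wire and omits coupling capacitance and mutual inductance between neighbors. The class, the function name, and the per-millimetre values are placeholders chosen for illustration, not the values used in this chapter.

```python
from typing import List, NamedTuple


class PiSegment(NamedTuple):
    r_series: float      # series resistance [Ohm]
    l_series: float      # series inductance [H]
    c_shunt_near: float  # shunt capacitance at the near end [F]
    c_shunt_far: float   # shunt capacitance at the far end [F]


def build_pi_ladder(r_per_mm: float, l_per_mm: float, c_per_mm: float,
                    length_mm: float, n_segments: int = 100) -> List[PiSegment]:
    """Split a uniform wire into n identical pi-type RLC segments."""
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    seg_len = length_mm / n_segments
    r_seg = r_per_mm * seg_len
    l_seg = l_per_mm * seg_len
    c_seg = c_per_mm * seg_len
    # In a pi-type segment, the capacitance is split equally between both ends.
    return [PiSegment(r_seg, l_seg, c_seg / 2.0, c_seg / 2.0)
            for _ in range(n_segments)]


# Placeholder per-unit-length values for a 10 mm wire (illustrative only).
ladder = build_pi_ladder(r_per_mm=40.0, l_per_mm=2.0e-9, c_per_mm=300e-15,
                         length_mm=10.0, n_segments=100)

total_r = sum(s.r_series for s in ladder)
total_c = sum(s.c_shunt_near + s.c_shunt_far for s in ladder)
print(f"{len(ladder)} segments, R = {total_r:.1f} Ohm, C = {total_c * 1e12:.2f} pF")
```

The segment totals recover the lumped R and C of the whole wire, which is a quick sanity check before such a ladder is exported to a circuit simulator.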

In addition to the wire parasitics, electrostatic discharge (ESD) capacitors are also included. Microbumps, modeled as RLC elements, are included for each driver and receiver [25]. On-die termination (ODT) is not used. IBIS² models for the I/O drivers of the CPU [95] and DDR2 SDRAM memory at 400 MT/s [94] are utilized. Clock alignment and data recovery are typically performed by circuitry inside the CPU and memory modules before the I/O pins. Hence, the dissipated power is based on the energy supplied by the CPU I/O drivers, considering signals propagating from the CPU to memory. In Fig. 3.2, the electrical model of three adjacent interconnect wires, connecting the previously mentioned CPU and memory circuits, is illustrated.

¹ Pitch = W + S, where W is the width of the wire and S is the space between two adjacent wires (see Fig. 3.3).

² Input/output Buffer Information Specification (IBIS) models are generally used to perform various board level signal integrity (SI) simulations and timing analyses for the I/O pins of a chip without revealing confidential information about the circuits.

Closed-form expressions from [98] are utilized to describe the interconnect capacitance. In Fig. 3.3, the various components of capacitance are shown for both materials. Due to the low permittivity of the glass substrate ($\varepsilon_{r,glass} = 3.4$ [29]), the electric field lines terminate on the neighboring wires in metal layer 7 and on the metal layer above. This situation leads to strong coupling capacitance to the adjacent interconnects. Capacitance expressions for parallel lines with one ground plane are employed for interconnects on a glass interposer (see left hand side of Fig. 3.3). Alternatively, the silicon substrate behaves as a ground plane due to the high permittivity of $\varepsilon_{r,sil} = 11.9$ [29]. Hence, closed-form expressions for the capacitance of parallel lines between two ground planes are used for this structure (see right hand side of Fig. 3.3). Closed-form expressions for the resistance and inductance are employed from [96].

3.2 Interconnect Analysis on Interposers

The baseline structure for identifying the design tradeoffs is presented in this section. The behavior of glass and silicon interposers is investigated for a 65 nm BEOL technology. The minimum pitch for this process node is 0.9 µm [96], as mentioned in the previous section. For high density systems, smaller pitches can be required [22], [30].

The interconnects depicted in Fig. 3.2 are simulated with HSPICE [99] for different switching scenarios. To measure the power dissipation and signal delay of the middle interconnect in Fig. 3.2, wire B switches while A and C are quiet at the ground level (nominal case). In addition, to capture the voltage noise at the far-end of wire B, wire B is silent while A and C switch together. In Table 3.2, the RLC characteristics of the interconnects on glass and silicon interposers are listed for minimum pitch.

[Figure 3.2 labels: three adjacent wires A, B, and C, each modeled as RLC segments with C_coupling and mutual inductance between neighbors. Annotated elements per wire: V_pulse source at 400 MT/s, R_pre-driver = 100 Ω, CPU and DDR2 memory models with DC 1.8 V supplies, R_out = 1 MΩ, R_pkg, C_ESD = 50 fF, and microbumps with R_microbump = 0.095 Ω, L_microbump = 0.053 nH, and C_microbump = 5.4 fF.]

Figure 3.2: Electrical model of three wires on interposers, interconnecting a descriptive example of an interposer system with a CPU and a memory.


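[Figure 3.3 labels: cross-sections of wires in M7 below M8 for a glass substrate ($\varepsilon_{r,glass} = 3.4$) and a silicon substrate ($\varepsilon_{r,sil} = 11.9$), annotated with wire width W, spacing S, thickness T, dielectric height H, and the capacitance components C_coupling and C_gnd.]

Figure 3.3: Cross-section of two interconnect structures on glass and silicon interposers, respectively.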

Table 3.2: Electrical characteristics of interconnects on glass and silicon interposers for minimum pitch.

Interconnect impedance characteristics   Glass    Silicon
R [Ω/mm]                                 40.7     40.7
L [nH/mm]                                1.98     1.98
Cgnd [fF/mm]                             121.5    222.7
Ccoupling [fF/mm]                        191.4    105
Ctotal [fF/mm]                           321.9    327.7
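The per-unit-length values in Table 3.2 can be turned into two quick first-order indicators for the 10 mm wires: the fraction of capacitance that couples to neighbors, and a rough RC delay estimate. The sketch below is not the HSPICE analysis reported in this chapter. The delay expression is the common first-order approximation for a lumped driver followed by a distributed RC line, and the use of the 100 Ω pre-driver resistance annotated in Fig. 3.2 as the driver resistance is an assumption. Inductance, ESD and microbump parasitics, and the IBIS driver behavior are ignored.

```python
# Per-unit-length values copied from Table 3.2 (R in Ohm/mm, C in F/mm).
TABLE_3_2 = {
    "glass":   {"r": 40.7, "c_gnd": 121.5e-15, "c_coupling": 191.4e-15,
                "c_total": 321.9e-15},
    "silicon": {"r": 40.7, "c_gnd": 222.7e-15, "c_coupling": 105.0e-15,
                "c_total": 327.7e-15},
}
LENGTH_MM = 10.0
R_DRIVER = 100.0  # assumed driver resistance [Ohm], see lead-in


def coupling_fraction(entry: dict) -> float:
    """Share of the listed ground plus coupling capacitance that is coupling."""
    return entry["c_coupling"] / (entry["c_gnd"] + entry["c_coupling"])


def rc_delay_estimate(r_driver: float, r_wire: float, c_wire: float) -> float:
    """First-order 50% delay: lumped driver (0.69) plus distributed wire (0.38)."""
    return 0.69 * r_driver * c_wire + 0.38 * r_wire * c_wire


results = {}
for material, entry in TABLE_3_2.items():
    r_wire = entry["r"] * LENGTH_MM
    c_wire = entry["c_total"] * LENGTH_MM
    results[material] = {
        "coupling_fraction": coupling_fraction(entry),
        "delay_s": rc_delay_estimate(R_DRIVER, r_wire, c_wire),
    }
    print(f"{material}: coupling fraction = "
          f"{results[material]['coupling_fraction']:.2f}, "
          f"delay estimate = {results[material]['delay_s'] * 1e9:.2f} ns")
```

Under these simplifications, the wire on glass shows the larger coupling fraction and a marginally smaller delay estimate, which only indicates the direction of the trends; the quantitative comparison relies on the simulations.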

Results illustrated in Fig. 3.4 show that interconnects of the same length (e.g., 10 mm) on the glass interposer are slightly faster and more power efficient than on the silicon interposer at minimum pitch. Alternatively, interconnects on glass interposers suffer from higher crosstalk, which can lead to signal integrity problems. Consequently, the same pitch or, alternatively, the same width and space (for fixed pitch) should not be used where different interposers are utilized.

Figure 3.4: (a) Power, (b) delay, and (c) noise simulations of the interconnect structure shown in Fig. 3.2 for minimum interconnect pitch, wirelength of 10 mm, and frequency of 200 MHz.

The IBIS models used for the memory I/O receivers at the far-end of the wires recognize a logic zero for voltage values between 0 V and 0.65 V and a logic one between 1.125 V and 1.8 V [94]. Wires on glass interposers suffer from twice as much crosstalk as wires on silicon, as depicted in Fig. 3.4. This effect is more pronounced on a glass interposer because the capacitive coupling between adjacent wires is higher than on silicon. This strong capacitive coupling of interconnects on glass interposers can potentially lead to incorrect signal transmission. In Fig. 3.5, the peak amplitude noise at the far-end of interconnects on glass and silicon interposers, respectively, is illustrated for varying pitch. Crosstalk noise is mitigated by increasing the space between wires [100]. According to Fig. 3.5, crosstalk of interconnects on glass interposers decreases by 50% for spacing larger than 1.2 µm as compared to the upper voltage limit of logic zero (Vmax zero). Conversely, noise on wires on silicon interposers is rather insignificant (<0.55×Vmax zero) for minimum spacing. Therefore, this behavior of interconnects on glass interposers leads to a design tradeoff between area and noise, in particular, for those systems that require high density interconnections.
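The receiver thresholds quoted above can be expressed as a small check that classifies a far-end voltage and relates a noise peak to the upper limit of logic zero. This is a minimal sketch: only the threshold values are taken from the text, while the function names and the sample voltages are hypothetical.

```python
# Receiver thresholds quoted in the text for the IBIS models [94].
V_ZERO_MAX = 0.65   # upper voltage limit of logic zero [V]
V_ONE_MIN = 1.125   # lower voltage limit of logic one [V]
VDD = 1.8           # upper voltage limit of logic one [V]


def classify_level(voltage_v):
    """Map a far-end voltage to '0', '1', or 'undefined'."""
    if 0.0 <= voltage_v <= V_ZERO_MAX:
        return "0"
    if V_ONE_MIN <= voltage_v <= VDD:
        return "1"
    return "undefined"


def noise_relative_to_zero_limit(peak_noise_v):
    """Express a peak noise amplitude as a fraction of V_ZERO_MAX."""
    return peak_noise_v / V_ZERO_MAX


# Hypothetical noise peaks on a quiet victim wire held at 0 V.
for peak in (0.30, 0.70):
    print(peak, classify_level(peak), round(noise_relative_to_zero_limit(peak), 2))
```

A noise peak that leaves the logic-zero window while the victim wire is held low is the situation described above as potentially incorrect signal transmission.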

Increasing the interconnect pitch also changes the loop inductance of the investigated structure. The overall effect, however, is not significant due to the low speed of DDR2 [94], [101]. For interconnects with faster switching speeds, noise effects from mutual inductance can also contribute considerably to the overall noise [100], further increasing the importance of this area and noise tradeoff.

Figure 3.5: Far-end voltage noise for interconnects on silicon and glass interposers with wirelength of 10 mm and frequency of 200 MHz.

Propagation delay and power generally form a design tradeoff, and the power-delay product (PDP) is an efficient way to describe this tradeoff. The power-delay product of interconnects on glass interposers is significantly lower than on silicon, as illustrated in Fig. 3.6. The traits of the interconnects on silicon interposers cause the PDP to increase twice as fast as on glass. As the silicon substrate is modeled as a second ground plane (see Fig. 3.3), interconnects on silicon interposers have twice the ground capacitance of those on glass interposers. Thus, glass interposers are a better option for systems that target low power and delay, assuming a relaxed pitch or, equivalently, relaxed area constraints to avoid crosstalk problems.
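For reference, the PDP used in this discussion is simply the product of power and delay, and with power in mW and delay in ns the result is directly in pJ. The short sketch below illustrates the unit handling only; the function name and the sample operating point are hypothetical and are not taken from the simulations.

```python
def power_delay_product_pj(power_mw, delay_ns):
    """PDP in pJ: 1 mW * 1 ns = 1e-3 W * 1e-9 s = 1e-12 J = 1 pJ."""
    return power_mw * delay_ns


# Hypothetical operating point, chosen only to illustrate the units.
example_pdp_pj = power_delay_product_pj(power_mw=2.0, delay_ns=0.5)
print(f"PDP = {example_pdp_pj:.2f} pJ")
```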

Alternatively, for small pitches, which support high density interconnects, especially where spacing between interconnects is less than 1.50 µm, the PDP for glass increases. This increase is due to the increase in coupling capacitance, which reduces the PDP difference between the two types of interposers, as depicted in Fig. 3.6 for W and S smaller than 1 µm. Therefore, a design tradeoff between PDP and area is formed for high density interconnects on glass interposers. As illustrated in Fig. 3.5, predominantly for spacing larger than 1.50 µm, crosstalk noise drops considerably for interconnects on glass interposers. Hence, for relaxed pitch, ground capacitance is the primary capacitance component. As this component is lower for glass than for silicon, the PDP for glass interposers does not increase with wire width as fast as for silicon interposers.

Figure 3.6: Power-delay product for interconnects on silicon and glass interposers for different pitches with wirelength of 10 mm and frequency of 200 MHz.

Based on these observations, interconnects on glass and silicon interposers behave differently for disparate objectives, such as crosstalk and PDP. Moreover, this behavior varies for different area constraints. Therefore, determining the appropriate design parameters, such as spacing and width, for high density designs is an important task, as the resulting performance can vary considerably between the different interposers.

3.3 Design Guidelines

In this section, design guidelines are proposed for interconnects on glass and silicon interposers. Different area constraints, such as large pitches and high density interconnects, are considered.

As mentioned in the previous section, for large spacing (>1.50 µm), the coupling capacitance of adjacent interconnects on a glass interposer is greatly reduced. Hence, ground capacitance becomes dominant. Therefore, interconnect width is the most important design parameter for both materials in low density interposers (large pitches). For the investigated 65 nm process node, this guideline applies for pitches larger than 2 µm. Wires on a silicon interposer exhibit twice as high ground capacitance as compared to glass. Consequently, increasing the wire width on a silicon interposer leads to a considerably higher PDP than on a glass interposer, as depicted in Fig. 3.7. Thus, wider interconnects can be employed on glass interposers than on silicon. Consequently, glass interposers are a superior choice for low power and delay constrained designs where area constraints are relaxed.

Figure 3.7: Power-delay product of interconnects in low density interposers with increasing width and fixed spacing at 1.50 µm at wirelength of 10 mm and frequency of 200 MHz.

Alternatively, high density designs with strict area constraints require different design guidelines due to the different behavior of interconnects. For systems with stringent power and area budgets, minimum interconnect width has to be applied to both materials so that the capacitance to ground is minimum. For fixed pitch interconnect structures, this situation is listed in Table 3.3, where the minimum width (W = 0.45 µm) leads to the smallest power consumption for both structures. However, for small pitches, coupling capacitance increases. This situation is more severe for glass. Therefore, for fixed pitch, minimum interconnect width and maximum space should be selected, particularly for glass interposers. The results shown in Fig. 3.8 demonstrate this situation. Decreasing the width of wires is the primary means to reduce the power of interconnects in both interposers (see yellow lines in Figs. 3.8(a) and 3.8(b)). However, for interconnects on glass interposers, increasing the space leads to non-negligible power savings due to the strong coupling capacitance (see purple line in Fig. 3.8(b)).

Table 3.3: Power consumption ([mW]) of interconnects on glass and silicon interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.

Space [µm] / Width [µm]    Glass    Silicon
1.5/0.45                   1.883    2.064
1.35/0.60                  1.967    2.248
1.20/0.75                  2.099    2.432
1.05/0.90                  2.215    2.595
0.9/1.05                   2.329    2.786
0.75/1.20                  2.510    2.984
0.60/1.35                  2.674    3.178
0.45/1.5                   2.896    3.467
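The entries of Table 3.3 can be checked programmatically. The sketch below, written here for illustration, confirms that every space/width pair sums to the fixed pitch of 1.95 µm, locates the minimum-power sizing for each material, and computes the relative power saving of glass over silicon per row. Only the table values come from this work; the variable names are arbitrary.

```python
# Rows of Table 3.3: (space_um, width_um, power_glass_mw, power_silicon_mw).
TABLE_3_3 = [
    (1.50, 0.45, 1.883, 2.064),
    (1.35, 0.60, 1.967, 2.248),
    (1.20, 0.75, 2.099, 2.432),
    (1.05, 0.90, 2.215, 2.595),
    (0.90, 1.05, 2.329, 2.786),
    (0.75, 1.20, 2.510, 2.984),
    (0.60, 1.35, 2.674, 3.178),
    (0.45, 1.50, 2.896, 3.467),
]
PITCH_UM = 1.95

# Every row must respect the fixed pitch (space + width).
pitch_is_fixed = all(abs(s + w - PITCH_UM) < 1e-9 for s, w, _, _ in TABLE_3_3)

# Sizing with the lowest power for each material.
best_glass = min(TABLE_3_3, key=lambda row: row[2])
best_silicon = min(TABLE_3_3, key=lambda row: row[3])

# Relative saving of glass with respect to silicon, in percent, per row.
glass_saving_pct = [
    (s, w, 100.0 * (p_si - p_gl) / p_si) for s, w, p_gl, p_si in TABLE_3_3
]

print("fixed pitch:", pitch_is_fixed)
print("min power (glass) at S/W =", best_glass[:2])
print("min power (silicon) at S/W =", best_silicon[:2])
for s, w, saving in glass_saving_pct:
    print(f"S={s:.2f} W={w:.2f}: glass saves {saving:.1f}%")
```

Both minima fall on the row with minimum width and maximum space, consistent with the guideline stated above.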

Increasing the width of wires is a usual means to reduce delay. However, for interconnects on glass interposers, the effect of crosstalk on delay is not negligible, in particular, for the worst case scenario where the A and C wires switch opposite to B (see Fig. 3.2). In this scenario, the interconnect delay measured for the design parameters which result in minimum delay for the nominal case increases by 26% for the glass and 7% for the silicon substrate, respectively. As listed in Table 3.4, different width and space have to be employed for interconnects on glass and silicon interposers such that the delay is minimum. The reduction of delay for interconnects on glass interposers from minimum pitch (S = W = 0.45 µm) to a pitch equal to 1.95 µm, where S = 1.5 µm, W = 0.45 µm and S = 0.45 µm, W = 1.5 µm, is 30% and 44%, respectively. For wires on silicon interposers, the decrease in delay is 18% and 48%, respectively. Thus, simply increasing the width for a fixed pitch does not necessarily improve delay. Hence, increasing wire spacing on glass interposers is a more efficient way to reduce delay so that the contribution of the coupling capacitance to the delay decreases. Alternatively, reducing interconnect delay on silicon interposers is primarily achieved by increasing the wire width, which reduces the interconnect resistance.
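The stronger sensitivity of glass to the switching pattern of the neighbouring wires can be illustrated with the common Miller-factor approximation, in which the coupling capacitance seen by the victim is scaled by 0, 1, or 2 depending on whether the aggressors switch in the same direction, remain quiet, or switch in the opposite direction. The sketch below applies this approximation to the minimum-pitch values of Table 3.2. It is an assumption-laden illustration: Ccoupling is treated as the total coupling to both neighbours, both neighbours are assumed to behave identically, and the resulting capacitance ratios are not meant to reproduce the simulated delay increases quoted above.

```python
# Minimum-pitch capacitances from Table 3.2 [fF/mm].
CAPS_FF_PER_MM = {
    "glass": {"gnd": 121.5, "coupling": 191.4},
    "silicon": {"gnd": 222.7, "coupling": 105.0},
}

# Miller factors (a standard approximation, assumed here; not from the thesis).
MILLER_FACTOR = {"same_direction": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_cap_ff_per_mm(material, scenario):
    """Effective victim capacitance for a neighbour switching scenario."""
    caps = CAPS_FF_PER_MM[material]
    return caps["gnd"] + MILLER_FACTOR[scenario] * caps["coupling"]


worst_to_nominal = {
    material: effective_cap_ff_per_mm(material, "opposite")
    / effective_cap_ff_per_mm(material, "quiet")
    for material in CAPS_FF_PER_MM
}

for material, ratio in worst_to_nominal.items():
    print(f"{material}: worst-case / nominal capacitance = {ratio:.2f}")
```

Under these assumptions the effective capacitance grows by a larger factor on glass than on silicon when the neighbours switch in the opposite direction, which is the qualitative behavior described in the text.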


Figure 3.8: Comparison of power reduction by increasing the space or decreasing the width of wires on different interposer materials for high interconnect densities (pitch < 2 µm) with wirelength of 10 mm and frequency of 200 MHz. (a) Silicon interposer. (b) Glass interposer.


Table 3.4: Interconnect delay ([ns]) on glass and silicon interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.


For the power-delay product, the previous guidelines have to be combined. As listed in Table 3.5, the minimum PDP for interconnects on both glass and silicon interposers results for the same design parameters³ (S = 1.2 µm and W = 0.75 µm). This result emphasizes the different behavior of interconnects on glass and silicon interposers. The PDP for interconnects on silicon interposers is minimum for very small widths, e.g., W = 0.75 µm, due to the low ground capacitance, which is the primary capacitive component. Conversely, large spacing, S = 1.2 µm, results in minimum PDP for glass interposers due to the reduction of capacitive coupling between adjacent interconnects. Furthermore, the difference of the minimum PDP between the two materials in Table 3.5 decreases from 35% to 9% for the worst case switching scenario, mainly due to the higher effect of coupling capacitance on interconnects on glass interposers than on silicon interposers.

Finally, interconnects on glass interposers are more prone to crosstalk noise than on silicon. Therefore, for voltage noise specifications around 0.25 V and fixed pitch (1.95 µm), interconnects on glass interposers have to be placed farther apart (S/W > 3) in order to reduce the coupling capacitance. Alternatively, for interconnects on silicon, any selection of width and spacing complies with this noise constraint. This situation is depicted with blue lines in Fig. 3.9.

³ Slightly different values can result for a finer sizing step.

Table 3.5: Power-delay product ([pJ]) of wires on glass and silicon interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.

Figure 3.9: Crosstalk and PDP simulations of interconnects on silicon and glass interposers for fixed pitch at 1.95 µm with wirelength of 10 mm and frequency of 200 MHz.

However, as interconnects on glass interposers require large spacing (S/W > 3) to ensure this noise constraint is satisfied, a non-negligible increase in PDP by 16% is observed as compared to the minimum PDP at S/W = 1.6 (see black lines in Fig. 3.9). Alternatively, for interconnects on silicon interposers, the design parameters (width and spacing) resulting in minimum PDP (S/W = 1.6) are allowed, as crosstalk noise is not a limiting factor. Thus, the difference in PDP between silicon and glass decreases by 10% when crosstalk noise is considered. In spite of this decrease, however, the PDP for glass remains lower than for silicon. Small pitches are applicable for glass interposers, as long as the crosstalk constraints are not over-restrictive (i.e., Vmax noise < 0.14×Vdd).
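The sizing guideline discussed above amounts to a small constrained selection: among all width and space combinations at a fixed pitch, the combination with the lowest PDP is chosen from those that satisfy the noise limit. The following sketch shows this selection in Python. The data structure, the function name, and all numerical values in the example are hypothetical and serve only to illustrate the procedure; they are not simulation results from this chapter.

```python
from typing import NamedTuple, Optional, Sequence


class SizingPoint(NamedTuple):
    space_um: float
    width_um: float
    noise_v: float   # peak far-end crosstalk noise
    pdp_pj: float    # power-delay product


def pick_sizing(points: Sequence[SizingPoint],
                max_noise_v: float) -> Optional[SizingPoint]:
    """Return the lowest-PDP point whose noise does not exceed the limit."""
    feasible = [p for p in points if p.noise_v <= max_noise_v]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.pdp_pj)


# Hypothetical sweep at a fixed pitch of 1.95 um (illustrative values only).
glass_points = [
    SizingPoint(0.75, 1.20, 0.45, 1.10),
    SizingPoint(1.20, 0.75, 0.35, 0.90),
    SizingPoint(1.50, 0.45, 0.22, 1.04),
]

unconstrained = pick_sizing(glass_points, max_noise_v=float("inf"))
constrained = pick_sizing(glass_points, max_noise_v=0.25)
print("without noise limit:", unconstrained)
print("with 0.25 V limit:  ", constrained)
```

In this illustrative data set the noise limit moves the selected point from the PDP minimum to a wider spacing, which mirrors the PDP penalty described for glass interposers.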

3.4 Conclusions

Interposer technologies offer high density, high performance interconnects for integrated systems, resulting in smaller form factors and improved system performance as compared to traditional packages. In this chapter, an extended version of my published work [12] is presented. This work offers insight into the different design tradeoffs which result from the usage of silicon and glass interposers due to the different material characteristics. Emphasis is placed on the redistribution layers (RDLs) of the interposer rather than the vertical vias due to the long length of these wires.

Design guidelines for interconnects on glass and silicon interposers that satisfy area, power, delay, and crosstalk constraints are determined. Emphasis is placed on BEOL at the 65 nm technology node, which is the state-of-the-art wiring for interposers [18], [25]. Interconnects on glass interposers are a superior alternative to silicon interposers in terms of power and latency. On the contrary, interconnects on glass are more prone to crosstalk effects than on silicon for small spacing. This situation requires a different treatment for sizing the interconnects, as there is a tradeoff between area and noise for the glass interposers. Furthermore, the minimum pitch does not result in minimum power, delay, and crosstalk. Finally, increasing the wire width on a silicon interposer leads to higher power consumption than on glass for the same width. Consequently, glass interposers are a better solution for low-power systems under the same latency constraints.

Chapter 4

Design Flow for TSV Based 3-D Circuits

Interposer technology offers short off-chip interconnect lengths in order to improve communication bandwidth and power consumption as compared to older packaging technologies. As demonstrated in the previous chapter, emphasis on improving interposer technologies should be placed on investigating the interconnects rather than the gates of the design, as passive interposers do not utilize active circuitry, such as gates. Alternatively, three-dimensional integration with TSVs offers shorter on-chip interconnect lengths [2], [102]. This situation requires significant effort to be placed on both the on-chip wires and the gates of a circuit in order to design 3-D circuits.

However, considering the complexity of these 3-D structures, design flows are required to fully exploit the advantages of vertical integration. Furthermore, the diverse TSV and bonding technologies make the design exploration of 3-D circuits, in terms of power and performance, a primary requirement. Therefore, in this chapter, a novel design automation flow, compatible with static timing analysis (STA), for exploring the timing and power of TSV based 3-D ICs is presented. The design experience of using this flow is similar to a 2-D flow, as commercial 2-D EDA tools along with a publicly available academic 3-D tool are utilized. In addition, various TSV and bonding technologies are supported in order to capture the sheer gains of 3-D integration.

This chapter is organized as follows. A brief discussion of circuit design flows and related work on design methodologies and flows tailored to 3-D ICs is offered in Section 4.1. In Section 4.2, my proposed design flow for TSV based 3-D circuits [13] is presented. Design space exploration of timing and power by applying the proposed flow on several benchmark circuits is presented in Section 4.3. Conclusions are offered in Section 4.4.

4.1 EDA Tools and Flows for Designing Circuits

Advanced EDA tools have been developed to simplify and speed up the design process of a circuit [103]. These tools are typically incorporated into a design flow consisting of several frontend and backend steps, enabling the design exploration of a system at different levels of abstraction. At the frontend of the design flow, emphasis is placed on the architecture of the system and functional verification through simulations. At the backend of the flow, the physical characteristics of the circuit components, such as the gates and interconnects, are considered. In Fig. 4.1, the typical steps of the backend of a design automation flow to accurately determine the power, performance, and area of 2-D circuits are illustrated.

[Flow diagram: steps Synthesis, Floorplanning, Place & Route, Formality, and STA + Power Analysis; intermediate files RTL.v, 2-D_Syn.v, 2-D.def, 2-D_PnR.v, SPEF, and 2-D_Frm.v.]

Figure 4.1: The typical backend of a flow for designing 2-D circuits. Light grey rectangles depict the steps of the flow, whereas the primary intermediate files produced by each of these steps are illustrated by white rectangles.

In the first step of the design flow, the behavioral description of a design (in Verilog or VHDL) is synthesized to a netlist of logic gates. During the floorplan step, the logic gates described by the synthesized netlist are arranged (within a fixed or non-fixed outline) and then placed and routed (PnR) in the following step. Note that the steps of floorplanning and PnR can be simultaneously handled by advanced EDA tools such as Cadence Encounter [104]. The main outputs of the PnR stage are the placed netlist of the circuit and the standard parasitic exchange format (SPEF) file, which contains information relating to the impedance of interconnects. A formal equivalence check is performed to ensure that after PnR, the resulting netlist is the same as the initial synthesized netlist. Finally, static timing analysis (STA) and power analysis are performed for the whole circuit while considering the delay and power of both the gates and interconnects.

Three-dimensional integration with TSVs primarily affects the physical characteristics of a circuit. Therefore, for the past few years, several methods and tools that address specific steps of the backend of design flows have been developed for 3-D ICs. A summary of previous works on 3-D EDA tools is listed in Table 4.1, at the end of this section. W.-L. Hung et al. proposed a floorplanning technique to address the tradeoff between interconnect power consumption and thermal dissipation for 3-D microprocessors [105]. Moreover, M.-K. Hsu et al. proposed an analytical TSV aware placement method for 3-D ICs [106] while considering the overhead of TSVs in the wirelength of the circuit. However, the proposed techniques in both [105] and [106] are not compatible with the standard exchange file formats of standard 2-D EDA tools, thereby restricting the evaluation of the performance and power of a 3-D circuit.

Cong and Luo from the EDA lab of UCLA implemented a floorplan and placement tool (3D-Craft) for 3-D circuits that minimizes the wirelength [107]. 3D-Craft takes as input a synthesized netlist of a circuit and outputs the design exchange format (DEF) files for each tier of the 3-D system. However, STA and power analysis cannot be performed with the output of this tool, as TSVs are modeled as pseudo cells instead of wires, without any timing and power information. In addition to these physical design techniques, plenty of TSV models have been proposed ([48], [108]) due to the diverse fabrication processes and bonding styles for 3-D circuits. Integrating these methods and models in a design flow, however, is not a straightforward task and consequently limits the design exploration of 3-D circuits.

Several 3-D prototype circuits have been explored in the past few years [109], [110], [111]. The design flow of these circuits combines existing 2-D tools with considerable design effort to properly adapt these tools and ensure that more than one tier can be managed. However, these approaches were developed for specific circuits and a specific TSV technology and bonding style. An early effort for a 3-D design flow was developed by NCSU, providing a process design kit for multi-tier circuits based on Cadence tools [112]. However, important steps of this flow, such as 3-D placement, are not supported. Recently, Lim et al. developed a complete flow for monolithic 3-D circuits [113]. Since monolithic 3-D circuits do not use TSVs, adjusting commercial 2-D tools is sufficient for developing a design flow for this type of 3-D circuit. This situation is due to the monolithic inter-layer vias, which can be treated as common metal contacts. Alternatively, TSV based 3-D circuits cannot be managed directly by standard 2-D tools. In addition, to enable design exploration of 3-D ICs with TSVs, any design flow should support the integration of different TSV technologies and corresponding electrical models. In the following section, my proposed design flow for TSV based 3-D integration, which addresses these issues, is presented.

Table 4.1: A summary of previous works on 3-D EDA tools.

Work                  Emphasis                      Remarks/Comments
[105]                 Floorplanning                 • Optimizing interconnect power and thermal dissipation
                                                    • Not compatible with 2-D EDA tools
[106]                 Placement                     • TSV aware placement
                                                    • Not compatible with 2-D EDA tools
[107]                 Partitioning, Floorplanning   • TSV aware Part. & Fp
                                                    • LEF/DEF compatible
                                                    • Not compatible for STA and power analysis
[48], [108]           TSV RC                        • Useful for Spice simulations
[112]                 Monolithic 3-D flow           • Fully compatible with 2-D EDA tools
                                                    • Not compatible for TSV-based 3-D ICs
[109], [110], [111]   Prototype Circuits            • Demonstration that 3-D is applicable
                                                    • Circuit specific flows


4.2 STA-Compatible Backend Design Flow

The proposed 3-D flow is depicted in Fig. 4.2, where the common steps with a 2-D flow are shown by light grey rectangles. The new stages are added as dark grey rectangles. The flow initially synthesizes the HDL description of a circuit to a network of logic gates. This step is the same as in a general 2-D flow and advanced EDA tools such as Synopsys Design Compiler [114] can be utilized.

[Flow diagram: steps Synthesis, Bonding Style + TSV size, 3-D Partitioner/Floorplanner, Place & Route, Update CDN + Cells, Merge SPEF, Formality, STA + Power Analysis, and 3-D Thermal Analysis; intermediate files RTL.v, Syn.v, 3-D_Syn.v and 3-D.def for each tier, 3-D_PnR.v, 3-D_Frm.v, and 3-D_powertracefile.]

Figure 4.2: Proposed backend of a design flow for TSV based three-dimensional integrated circuits.

Bonding Styles and TSV sizes

A system level decision is required about the number of tiers, the bonding style of tiers, and the TSV size. These parameters are selected by the designers rather than by the flow itself. This flow enables designers to explore these parameters and then select the appropriate 3-D integration for their specific design targets. In order to proceed to the following steps, appropriate models of TSVs are required. Compact TSV models consist of the physical layout exchange format (LEF) files of the TSV and the related electrical (RC) characteristics. The 3D-Craft tool along with the technology library freePDK45 3-D provide templates to create the TSV LEF files for the target TSV technology [107], [112]. There is no limitation on the utilized TSV model, and any model that describes the electrical characteristics of the TSV (e.g., R, C, and/or L) can be integrated into this flow. TSV models with parameterized physical characteristics, for example, the TSV diameter and length, can also be employed using a look-up table. In this work, the ITRS roadmap of vertical interconnects is used to produce a library of TSV technologies [49] supported by the flow. Moreover, closed-form expressions from [48] are utilized to determine the impedance characteristics of TSVs. In Table 4.2, the physical and electrical characteristics of a variety of TSV technologies are listed.

Table 4.2: Physical and electrical characteristics of different TSV technologies.

TSV technology    Diameter [µm]    Length [µm]    Capacitance [fF]    Resistance [mΩ]
Via-Last          50               300            780                 0.003
Via-Last          30               250            514                 7.3
Via-Last          20               200            339                 13.8
Via-Last          15               150            205                 19.5
Via-Middle        10               100            97                  33.4
Via-Middle        5                50             37                  66.8
Via-Middle        3                30             16                  111.4
Via-First         1                10             3                   334.2
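The look-up table approach mentioned above can be sketched as a small Python dictionary keyed by TSV diameter, populated with the entries of Table 4.2 and returning the values in SI units. The structure, the function name, and the choice of the diameter as the key are assumptions made for this illustration and do not describe the internal format used by the flow.

```python
# Entries of Table 4.2, keyed by TSV diameter in um:
# (technology, length_um, capacitance_fF, resistance_mOhm).
TSV_LIBRARY = {
    50: ("Via-Last", 300, 780, 0.003),
    30: ("Via-Last", 250, 514, 7.3),
    20: ("Via-Last", 200, 339, 13.8),
    15: ("Via-Last", 150, 205, 19.5),
    10: ("Via-Middle", 100, 97, 33.4),
    5: ("Via-Middle", 50, 37, 66.8),
    3: ("Via-Middle", 30, 16, 111.4),
    1: ("Via-First", 10, 3, 334.2),
}


def tsv_model(diameter_um):
    """Return the RC model of a TSV from the library, in SI units."""
    if diameter_um not in TSV_LIBRARY:
        raise ValueError(f"no TSV model for diameter {diameter_um} um")
    technology, length_um, c_ff, r_mohm = TSV_LIBRARY[diameter_um]
    return {
        "technology": technology,
        "length_um": length_um,
        "capacitance_f": c_ff * 1e-15,
        "resistance_ohm": r_mohm * 1e-3,
    }


print(tsv_model(5))
```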

3-D Partitioner/Floorplanner

With the synthesized netlist and the selected 3-D integration style (number of tiers, bonding style, and TSV size), the next step is to perform 3-D partitioning and floorplanning of the circuit. For these steps, the 3D-Craft [107] tool has been selected, as it is the only open-source tool available to date for partitioning the circuit in three dimensions that outputs the location of TSVs (in DEF files) and the RTL description of the circuit for each tier. Therefore, this tool is compatible with commercial IC design tools and, hence, can be integrated into a complete design flow. Note that the proposed flow is not limited to 3D-Craft as the partitioning/floorplanning tool. Other 3-D tools for these steps can be employed as long as they output the DEF and HDL (.v) files for each tier.

The main objective of this tool (3D-Craft) is the reduction of wirelength using a weighted TSV parameter. For a TSV weight selected by the user, it performs partitioning and floorplanning of the circuit into different tiers. The algorithms of the 3D-Craft tool are based on the half perimeter wirelength (HPWL) model while considering the overhead in area from TSVs. As this tool runs once for a selected TSV weight, we used it to sweep the TSV weight and obtain the area, wirelength, and imbalance factor¹ of the 3-D circuit. In order to provide a realistic analysis of three-dimensional integration, partitions with an imbalance factor between tiers greater than 10% are excluded from our analysis. Afterwards, we selected the partition with the maximum wirelength reduction as compared to the 2-D circuit. Typically, this objective is achieved at the min-cut partitioning (given an allowed imbalance factor), as the overhead in area due to the TSVs is minimized. In addition, the fragmentation that the partitioning tool induces in the circuit can be monitored through the imbalance factor and the number of TSVs between tiers. As our netlists are flattened (not hierarchical), the minimum fragmentation of the circuit in terms of connections between tiers is expected at the min-cut partitioning (where the smallest number of TSVs is selected). In terms of area/resource fragmentation, the imbalance factor between tiers is the important parameter that needs to be monitored. Both of these parameters are monitored in this flow, and our objective is to select a partition with the minimum number of TSVs, while the imbalance factor is constrained to an upper bound. However, as fragmentation issues may vary between different circuits, more analysis should be performed on that aspect. Therefore, this is considered a topic for future work.
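The selection procedure described above (sweep the TSV weight, discard partitions whose imbalance factor exceeds 10%, and keep the partition with the largest wirelength reduction) can be sketched as follows. This is a hypothetical post-processing script and not part of 3D-Craft: the record type, the function name, and every number in the example sweep are invented for illustration, and the imbalance factor is assumed to be already available as a fraction per run.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PartitionResult:
    tsv_weight: float      # weight passed to the partitioner for this run
    wirelength_um: float   # total wirelength reported for the 3-D circuit
    imbalance: float       # imbalance factor between tiers, as a fraction
    num_tsvs: int          # number of TSVs between tiers


def select_partition(results: Sequence[PartitionResult],
                     max_imbalance: float = 0.10) -> Optional[PartitionResult]:
    """Keep balanced partitions and return the one with least wirelength.

    For a fixed 2-D reference, the least 3-D wirelength is also the
    largest wirelength reduction.
    """
    feasible = [r for r in results if r.imbalance <= max_imbalance]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r.wirelength_um)


# Hypothetical sweep results (illustrative values only).
WIRELENGTH_2D_UM = 1.10e6
sweep = [
    PartitionResult(tsv_weight=0.5, wirelength_um=9.1e5, imbalance=0.18, num_tsvs=410),
    PartitionResult(tsv_weight=1.0, wirelength_um=9.4e5, imbalance=0.07, num_tsvs=250),
    PartitionResult(tsv_weight=2.0, wirelength_um=9.8e5, imbalance=0.04, num_tsvs=190),
]

best = select_partition(sweep)
reduction_pct = 100.0 * (WIRELENGTH_2D_UM - best.wirelength_um) / WIRELENGTH_2D_UM
print(best, f"wirelength reduction = {reduction_pct:.1f}%")
```

In this example the run with the shortest wirelength is rejected because of its imbalance, so the second-best run is selected.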

Place and Route

The following steps are the placement and routing of each tier. These steps are similar to a conventional 2-D flow with some additional information such as pre-defined pins and pseudo-cells for TSVs. Cadence Encounter [104] can be utilized for placing and routing the circuits in each tier. The locations of TSVs and I/Os are imported (as DEF files) from the previous step (partitioning/floorplanning) alongside the split, flattened netlists (.v files) of each tier. Then the PnR tool places and routes the designs with the main objective of minimizing the wirelength. Alignment of access points² between each tier is verified in this step by cross-checking the location of the pins and TSVs. Furthermore, information for the connections between tiers, such as the naming convention of pins and nets spanning across multiple tiers, is grouped to be utilized in the following steps. Moreover, the utilization of a commercial EDA tool at the PnR stage allows the designer to monitor the TSV density and routing congestion through a GUI and related commands supported by this tool. This situation enables the designer to alter the routing in specific cases for the 3-D circuit, in order to mitigate the congestion issues induced by the usage of TSVs. A related discussion on TSV routing congestion can be found in [45].

¹ Imbalance factor is defined as the ratio of the number (or area) of cells between each tier.

Clock Distribution Network

The subsequent steps merge the clock distribution network (CDN) and the corresponding cells (i.e., clock buffers) into the placed and routed tiers (indicated by the dashed dark grey rectangle in Fig. 4.2). The current flow does not synthesize a clock network. Instead, the CDN from a 2-D design is imported with the corresponding CDN cells to each tier. To generalize the use of this flow, academic CDN synthesizers for TSV based 3-D ICs [115] with a compatible standard file format can be utilized.

Merging SPEF files and Formality

A crucial step of the flow is the merging of the SPEF files of each tier into a global SPEF file and performing a design equivalence check through the Formality tool [116] across tiers. This task requires the resistance and capacitance of the TSV and the naming of pins of each tier collected from the previous steps of the flow. The key idea of merging the SPEF files is that a net can either be routed within one tier or can span multiple tiers (a 3-D net). In the first case, the impedance of these nets is provided by the output of the PnR tool. For the 3-D nets, which traverse multiple tiers, the RC sections of a 3-D net within each tier need to be unified into one RC tree, modeling the electrical characteristics of the entire net. This linkage of the RC sections is implemented by removing the pseudo-TSV cells and adding the RC characteristics of the interconnects for the chosen TSV technology. The resulting RC network for an example net is depicted in Fig. 4.3.

² Access point of each tier is defined as the point where electrical connectivity with the adjacent tier is established for propagating signals or power/ground.

Figure 4.3: Merging of net segments from two tiers for a 3-D net. (a) Net in a 2-D circuit. (b) Segments of an inter-tier net within each tier after PnR. (c) The complete inter-tier net resulting from merging the net segments in (b), where the inter-tier parasitics comprise Rmicrobump, Cmicrobump, RTSV, and two CTSV/2 capacitances.
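The inter-tier section of Fig. 4.3(c) can be approximated by a π-model in which the TSV capacitance is split equally between the two ends of the TSV resistance. The sketch below builds such a section and estimates the Elmore delay it adds towards a sink in the adjacent tier. It is an illustrative simplification: lumping the microbump resistance in series with the TSV and the microbump capacitance at the far end is an assumption made here, and the function names and the downstream load are hypothetical.

```python
def tsv_pi_section(r_tsv_ohm, c_tsv_f, r_bump_ohm=0.0, c_bump_f=0.0):
    """Return a pi-model of the inter-tier link.

    Assumption: the microbump R is in series with the TSV R and the
    microbump C is lumped with the far-end half of the TSV capacitance.
    """
    return {
        "c_near_f": c_tsv_f / 2.0,
        "r_series_ohm": r_tsv_ohm + r_bump_ohm,
        "c_far_f": c_tsv_f / 2.0 + c_bump_f,
    }


def added_elmore_delay_s(section, c_downstream_f):
    """Elmore delay contributed by the series resistance of the link.

    The series resistance charges the far-end capacitance plus all
    capacitance downstream of the link.
    """
    return section["r_series_ohm"] * (section["c_far_f"] + c_downstream_f)


# Via-Middle TSV with 5 um diameter from Table 4.2: 37 fF and 66.8 mOhm.
section = tsv_pi_section(r_tsv_ohm=66.8e-3, c_tsv_f=37e-15)
# Hypothetical 20 fF of wire and gate capacitance in the receiving tier.
delay_s = added_elmore_delay_s(section, c_downstream_f=20e-15)
print(section, delay_s)
```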

The routing information for a portion of a 3-D net in each tier is contained in the SPEF files and is used to create an undirected graph of the nodes of the net per tier. Starting from the tier where the driver is placed, a breadth-first search algorithm is employed to iteratively construct the RC tree of an inter-tier net until all the branches of this net within a tier are added to this tree. This process is repeated for each tier that the net spans, where the location of the TSV for each branch of the net is considered as the root of the RC tree in a tier. Note that multiple disconnected RC trees can be formed in each tier for a 3-D net, as these nets may have more than one TSV connection to an adjacent tier. This process terminates when all the sinks of a net are added to the tree and all of the TSV cells are replaced by the appropriate RC section. At the same time, an equivalence check is also performed with Formality [116] to ensure that after the several steps of the 3-D flow, the resulting netlist is the same as the initially synthesized network.
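The breadth-first construction of the RC tree within one tier can be sketched as follows. This is a minimal illustration and not the actual implementation of the flow: the edge list stands in for the resistive segments parsed from a SPEF file, and the node names and resistance values are hypothetical.

```python
from collections import deque


def build_rc_tree(edges, root):
    """Build a tree from an undirected RC graph by breadth-first search.

    edges: iterable of (node_a, node_b, resistance_ohm) segments.
    Returns a dict mapping each reachable node to (parent, resistance_ohm),
    with the root mapped to None.
    """
    adjacency = {}
    for node_a, node_b, resistance in edges:
        adjacency.setdefault(node_a, []).append((node_b, resistance))
        adjacency.setdefault(node_b, []).append((node_a, resistance))

    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour, resistance in adjacency.get(node, []):
            if neighbour not in parent:       # not yet part of the tree
                parent[neighbour] = (node, resistance)
                queue.append(neighbour)
    return parent


# Hypothetical segments of a 3-D net in the tier that contains the driver.
tier0_edges = [
    ("driver", "n1", 12.0),
    ("n1", "out1", 8.0),
    ("n1", "tsv_out", 5.0),
]
tier0_tree = build_rc_tree(tier0_edges, root="driver")
print(tier0_tree)
```

In the adjacent tier the same routine would be started from the TSV node as the root, and one call per TSV would cover the case of several disconnected trees in a tier.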

STA and Power Analysis

One of the main advantages of this flow is the support of accurate exploration of 3-D circuits in terms of power and performance. The flow enables the usage of commercial tools such as PrimeTime and PrimeTime PX [117] for evaluating the speed and power of 3-D ICs. This situation is due to the merging of the SPEF files of each tier of the 3-D stack into a global SPEF, where the parasitic impedance of vertical connections is included. Other timing and power analysis tools can also be used, since the SPEF file format is compatible with many timing analysis tools. The built-in functionalities of timing and power analysis tools along with the SPEF file of the 3-D IC provide deeper insight as to how 3-D integration affects the performance and power of ICs. As full timing analysis is supported by this flow for the entire 3-D stack, rather than only the delay of the longest (inter-tier) path, setup and hold time violations can be observed and corrected. In addition, the paths can be grouped into specific categories depending on their functionality. This grouping enables the monitoring of timing bottlenecks in a circuit and how these are affected by 3-D integration. Moreover, as the proposed flow supports STA for 3-D ICs, back-annotated circuit simulation with specific application testbenches is also available. This situation enables both average and transient power analysis of 3-D circuits with workloads from different applications.

Thermal Analysis

Exploring the thermal effects of 3-D circuits is beyond the scope of this work, as emphasis is placed on providing the accurate timing and power of 3-D circuits. Enabling the accurate power profiling of 3-D circuits can assist in analyzing the heat dissipation of 3-D circuits, as power and heat are related. However, some insight into performing thermal analysis for three-dimensional circuits is provided in the following paragraph.


Considering the importance of heat dissipation in 3-D circuits, this flow can incorporate a thermal analysis step. This step is enabled by the proposed flow, as it is the first time that accurate power profiling can be performed for the 3-D circuit, at the stage of “STA and Power Analysis”. At this stage, the power trace for application-specific benchmarks is created and can then be used for thermal analysis. One of the most advanced tools for thermal analysis is the Manchester Thermal Analyzer (MTA) [118], as this tool supports 3-D circuits. The required input files to this tool are the power trace file and the layout files (DEF) of each tier, which include the location of cells and TSVs. Moreover, to capture the heat dissipation of TSVs, thermal analysis should be performed at fine granularity (gate level). This granularity, however, can significantly increase the computational time. Therefore, my co-authored work [119] provides the means to perform accurate gate-level thermal analysis with low computational cost.

4.3 Results and Discussion

In this section, the timing and power evaluation of several benchmark circuits is presented using a typical 2-D flow and the proposed STA-compatible 3-D flow. The results presented in this section are based on the benchmark circuits listed in Table 4.3.

Table 4.3: Benchmark circuits and the respective number of cells.

Circuit       Technology       # of Cells
B04 [120]     TSMC 45 nm G            317
B19 [120]     TSMC 45 nm G         66,117
B20 [120]     TSMC 45 nm G         12,110
AVA [121]     TSMC 45 nm G         12,275
LDPC [122]    TSMC 65 nm LP        67,003

4.3.1 Physical Characteristics of 2-D and 3-D Designs

The area and wirelength of the benchmark circuits in a 2-D flow and the proposed 3-D flow are listed in Table 4.4. In addition, the number of TSVs resulting from the 3-D flow is reported in the last column of this table. A two-tier face-to-back implementation with via-first TSVs is assumed for the 3-D version of the circuits. A two-tier implementation was selected as a descriptive example of 3-D integration, considering that the benchmark circuits are too small to use more than two tiers. A discussion on the number of tiers selected for the 3-D stack based on wirelength prediction models can be found in [74]. Both flows are run with minimum optimization effort for each circuit to guarantee fairness. This choice is due to the lack of optimization objectives in 3D-Craft other than wirelength, whereas the PnR step performed in each tier by the 2-D tools is supplemented by optimization techniques with several design objectives. In general, the total area of a circuit resulting from the 3-D flow can be larger than that of a planar version due to the area of the TSVs, as also pointed out in [123].

Table 4.4: Physical characteristics of benchmark circuits by utilizing a 2-D and the proposed 3-D flow.

           2-D flow                  3-D flow
Circuit    Area       Wirelength    Area       Footprint   Wirelength   # TSVs
           [µm2]      [µm]          [µm2]      [µm2]       [µm]
B04          1,038        3,432       1,315        607         4,386      123
B19        155,304      919,586     166,502     83,251       795,328      745
B20         22,508      130,464      23,552     11,776       105,465      545
AVA         38,575      257,146      40,528     20,264       215,839      651
LDPC       469,593    6,329,037     479,770    239,886     4,645,759    4,384

For some benchmark circuits, the 3-D flow decreases the wirelength of the design. In others, the wirelength increases due to the TSVs (B04 in Table 4.4). For example, LDPC is a wire-dominant circuit; several long paths are therefore substituted by TSVs, resulting in a significant wirelength reduction of 26.6%. In contrast, B19 is a circuit composed of four modules without long wire connections, resulting in only a slight wirelength reduction from substituting nets within a module with TSVs. As the timing and power information of all paths is reported by the flow, specific design parameters of the circuit can be adapted to meet the performance and power requirements. Specific examples of design space exploration are presented in the following sections.
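The wirelength figures quoted above can be checked directly against Table 4.4. The following is a minimal sketch rather than part of the flow; the helper name `relative_change` is introduced here for illustration only.

```python
# Wirelength values [um] copied from Table 4.4: (2-D flow, 3-D flow).
WIRELENGTH_UM = {
    "B04": (3_432, 4_386),
    "B19": (919_586, 795_328),
    "B20": (130_464, 105_465),
    "AVA": (257_146, 215_839),
    "LDPC": (6_329_037, 4_645_759),
}


def relative_change(value_2d, value_3d):
    """Signed change of the 3-D value relative to the 2-D value, in percent."""
    return 100.0 * (value_3d - value_2d) / value_2d


# Negative values denote a wirelength reduction in three dimensions.
changes = {name: relative_change(*wl) for name, wl in WIRELENGTH_UM.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

A negative value for LDPC of about 26.6% reproduces the reduction stated in the text, whereas B04 is the only circuit with a positive (increased) value.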


4.3.2 Timing Analysis of 3-D Circuits

Most previous works evaluating 3-D circuits report the longest inter-tier delay as a performance metric, based on wirelength prediction models [124]. However, this approach offers limited insight into how TSVs affect the timing of 3-D ICs. By using the proposed flow, detailed timing information is obtained for all paths inside a 3-D circuit. The supported clock period of the benchmark circuits implemented in one tier (2-D) and two tiers (3-D), respectively, is listed in Table 4.5. The circuit B19 exhibits an interesting timing behavior: although the wirelength of the 3-D design is smaller than that of the 2-D design, the clock frequency is slightly lower. This result indicates that analysis based purely on wirelength is not sufficiently accurate for timing analysis or performance optimization.

Table 4.5: Supported clock period by each of the benchmark circuits.

Circuit    Tclk 2D [ps]    Tclk 3D [ps]
B04             849           1,326
B19           1,273           1,223
B20           1,158           1,173
AVA           1,541           1,329
LDPC          4,278           3,477

Considering only the path with the longest delay is also not practical for the timing analysis of a 3-D circuit, as this path may be a false path. The proposed 3-D flow, due to the additional step of merging the SPEF files of each tier, enables the utilization of standard EDA tools, such as PrimeTime, for evaluating the timing of the 3-D circuit. Thus, static timing analysis is performed similarly to a 2-D flow, while the built-in functions of PrimeTime are utilized to provide an in-depth understanding of the effect of TSVs on the delay of internal paths. To facilitate the timing analysis, a simple grouping of the paths into input to register (in2reg), register to register (reg2reg), and register to output (reg2out) paths is set during the configuration of PrimeTime.
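The grouping itself can be illustrated outside the STA tool. The sketch below is not PrimeTime syntax; it assumes a hypothetical list of timing paths, each labelled with the kind of its startpoint and endpoint, and reports the worst delay per group. The class `TimingPath` and both functions are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingPath:
    startpoint_kind: str  # "input" or "register" (hypothetical labels)
    endpoint_kind: str    # "register" or "output"
    delay_ns: float


def path_group(path):
    """Map a path to in2reg, reg2reg, or reg2out; anything else is 'other'."""
    groups = {
        ("input", "register"): "in2reg",
        ("register", "register"): "reg2reg",
        ("register", "output"): "reg2out",
    }
    return groups.get((path.startpoint_kind, path.endpoint_kind), "other")


def worst_delay_per_group(paths):
    """Return the longest path delay observed in each group."""
    worst = defaultdict(float)
    for path in paths:
        group = path_group(path)
        worst[group] = max(worst[group], path.delay_ns)
    return dict(worst)
```

Comparing the output of `worst_delay_per_group` for the 2-D and 3-D versions of a circuit gives the kind of per-group breakdown discussed next.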

In Fig. 4.4, the breakdown of path delays for the benchmark circuits is depicted. The following observations can be made for each circuit.

• For B04, 3-D integration results in increased delay of reg2reg paths. Although in2reg and reg2out paths exhibit a negligible decrease in delay, these benefits from 3-D integration are not useful, as reg2reg dominates the delay of the design in both two and three dimensions.

• For B19 and B20, 3-D integration offers only marginal improvements. Recalling that the wirelength of the two circuits is still reduced in three dimensions, this observation suggests that the performance of these circuits is dominated by the logic gate delay and therefore the insertion of TSVs degrades performance.

• AVA and LDPC demonstrate an improvement in the delay of all groups of paths in 3-D, which indicates that wire-dominant designs exhibit better performance in 3-D.

[Bar chart. Horizontal axis: benchmark circuits B04, B19, B20, AVA, LDPC. Vertical axis: Delay [ns], 0 to 4.5. Series: in2reg 2-D, in2reg 3-D, reg2reg 2-D, reg2reg 3-D, reg2out 2-D, reg2out 3-D.]

Figure 4.4: Breakdown of path delays into in2reg, reg2reg, and reg2out for benchmark circuits with 2-D and 3-D integration.

To better understand the timing behavior of 3-D circuits, the proposed 3-D flow supports monitoring the delay of all inter-tier paths and evaluating the delay overhead of TSVs. These features are enabled at the step of merging the SPEF files of each tier of the 3-D circuit, where all nets with TSVs (inter-tier nets) are stored in a list. This list is utilized by the STA tool to select and monitor the delay of the paths whose signals propagate through these inter-tier nets. In Fig. 4.5, the timing slack histogram of inter-tier paths for the B04 circuit is depicted. For this circuit, the parasitic impedance (RC) of the TSVs and the long wires routing distant TSVs increase the delay of the paths as compared to the planar circuit. Therefore, changing the bonding style, for example to face-to-face where no TSVs are employed, is an alternative solution to improve the delay of this circuit in three dimensions. Moreover, timing violations for specific paths are difficult to determine with earlier 3-D flows based on wirelength models, as only the delay of long paths is considered. For the LDPC circuit, the benefits of 3-D integration are depicted in Fig. 4.6, where the timing slack of inter-tier paths greatly increases as long wires are substituted with TSVs.
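The selection of inter-tier paths and the construction of a slack histogram can be sketched as follows. This is an illustrative post-processing script, not the mechanism of the STA tool; the dictionary keys `nets` and `slack_ps` and the net names are assumptions made for this example. Slack is kept in integer picoseconds so that bin edges are exact.

```python
from collections import Counter


def inter_tier_paths(paths, inter_tier_nets):
    """Keep the paths that traverse at least one net containing a TSV."""
    tsv_nets = set(inter_tier_nets)
    return [p for p in paths if tsv_nets.intersection(p["nets"])]


def slack_histogram(paths, bin_width_ps=100):
    """Count paths per slack bin; each key is the lower edge of a bin in ps."""
    counts = Counter()
    for p in paths:
        # Floor division also places negative slack in the correct bin.
        lower_edge = (p["slack_ps"] // bin_width_ps) * bin_width_ps
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def count_violations(paths):
    """Number of paths with negative slack."""
    return sum(1 for p in paths if p["slack_ps"] < 0)
```

Running `slack_histogram` on the output of `inter_tier_paths` for the 2-D and 3-D versions of a circuit yields histograms of the kind shown in Figs. 4.5 and 4.6.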

[Two histograms. Horizontal axis: Slack [ns], -0.5 to 0.5. Vertical axis: Number of Paths, 0 to 20. (a) B04 in 2-D: positive slack only. (b) B04 in 3-D: positive and negative slack.]

Figure 4.5: Timing slack histogram of inter-tier paths for B04 circuit.

4.3.3 Power Analysis of 3-D Circuits

In this subsection, power analysis of 3-D ICs is performed. The proposed flow extends previous analysis from first-order models that combine power information of logic gates and wirelength [124] to advanced 2-D commercial tools, where power is analyzed in both average and time-based modes. This extension is supported by merging the extracted RC of each tier within a 3-D circuit and adding the appropriate RC of the vertical interconnects. For a fair comparison between the power of 2-D and 3-D circuits, the same clock frequency is used in both cases for all the explored circuits. The operating voltage is assumed to be 0.9 V and 1.2 V for the 45 nm and 65 nm technologies, respectively.

[Overlaid histograms. Horizontal axis: Slack [ns], -0.5 to 2.5. Vertical axis: Number of Paths, 0 to 250. Series: LDPC 2-D, LDPC 3-D.]

Figure 4.6: Comparison of timing slack histograms of inter-tier paths for the LDPC circuit in two and three dimensions, respectively.

Average power analysis in modern circuits is useful for battery life considerations. For this analysis, the toggle rate of all nets and cells within each benchmark circuit is assumed to be 20%. The average power consumed by the circuits and a breakdown into the different power components are listed in Table 4.6. Power in B04 and B20 increases with 3-D integration by 67% and 34%, respectively, which may appear counterintuitive. For B04, this increase is due to the large effect of the TSVs on the area and total wirelength of the circuit (see Table 4.4). The increase in wirelength leads to larger interconnect capacitance, thereby increasing power. With the proposed flow, different bonding styles can be explored to mitigate this increase in power. For example, with face-to-face (F2F) bonding, no TSVs are employed and thus no area overhead exists. On the other hand, for circuit B20, the total wirelength is reduced in three dimensions but the power increases. This behavior demonstrates the non-negligible effect of TSV capacitance on the total power. Different TSV technologies can be utilized for this circuit to reduce the capacitance of the vertical interconnects.

The circuits B19, AVA, and LDPC consume less power in 3-D than the 2-D versions by 5%, 3.5%, and 14.6%, respectively. These power gains can be further enhanced by removing and/or downsizing cells. The proposed flow facilitates this optimization step. By combining the power analysis with the performance analysis discussed in Section 4.3.2, cells that are power-intensive and not timing-critical can be identified as potential candidates for downsizing.
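One way to shortlist such cells from the reports of the timing and power analysis is sketched below. The thresholds, the dictionary keys, and the function name are hypothetical; the flow itself does not prescribe a specific selection rule.

```python
def downsizing_candidates(cells, slack_margin_ns, power_share=0.01):
    """Names of cells that consume a notable share of power and have slack to spare.

    cells: list of dicts with hypothetical keys 'name', 'power_mw', and
           'worst_slack_ns' (worst slack of any path through the cell).
    slack_margin_ns: minimum slack a cell must have to be considered non-critical.
    power_share: minimum fraction of the total power a cell must consume.
    """
    total_power = sum(c["power_mw"] for c in cells)
    return [
        c["name"]
        for c in cells
        if c["worst_slack_ns"] > slack_margin_ns
        and c["power_mw"] >= power_share * total_power
    ]
```

The slack margin guards against turning a non-critical path into a critical one once the downsized cell becomes slower.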


Table 4.6: Average power consumption of benchmark circuits.

Power           B04                B19               B20               AVA               LDPC
components      2-D      3-D       2-D     3-D       2-D     3-D       2-D     3-D       2-D      3-D
Nets [mW]       0.00415  0.01350   1.7666  1.6033    0.1498  0.2729    0.4105  0.3851    20.49    16.2638
Cells [mW]      0.0101   0.0103    1.5348  1.5294    0.2153  0.2183    0.3334  0.3326    8.7645   8.6993
Leakage [µW]    0.007    0.007     1.110   1.110     0.160   0.160     0.244   0.244     0.024    0.024
Total [mW]      0.0142   0.0238    3.3025  3.1338    0.3653  0.4914    0.7441  0.7179    29.2545  24.9632


A distinct advantage of this flow is that the power of 3-D circuits is monitored on application-specific benchmarks by utilizing transient power analysis. This analysis is useful for evaluating the average and peak power consumption of circuits under specific workloads. To the best of the author’s knowledge, this is the first work that demonstrates a flow compatible with transient analysis while also considering the electrical and physical characteristics of the vertical interconnects. The power consumed by the AVA and LDPC circuits on real-time tasks, such as encoding/decoding messages, is listed in Table 4.7.

Table 4.7: Power for application-specific testbenches.

Power           AVA               LDPC
components      2-D     3-D       2-D     3-D
Nets [mW]       0.144   0.194     8.49    7.20
Cells [mW]      0.732   0.736     4.55    4.53
Leakage [µW]    0.244   0.244     0.025   0.025
Total [mW]      0.900   0.955     13.04   11.73

The total power in transient mode can differ significantly from the average power, as the switching of nets and cells is based on application-specific testbenches rather than on a fixed switching rate for all of the nets and cells, as in average mode. For example, for the AVA circuit, according to the average power analysis, three-dimensional integration results in a 3.5% power reduction as compared to the planar circuit (see Table 4.6). However, as only a small percentage of wires switches during a testbench for the AVA circuit, the power from executing a specific task is greater by 6.11% in the 3-D implementation than in two dimensions (see Table 4.7). In addition, these nets include TSVs, the high capacitance of which increases the power. This example demonstrates a well-known issue in traditional 2-D design processes, where power analysis based on the average power is not sufficiently accurate and transient analysis should be employed. The proposed flow enables this useful type of analysis for 3-D ICs.
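The reason the two modes can disagree is illustrated by the toy example below. It assumes the first-order switching power model P = α·C·Vdd²·f per net, with α the fraction of cycles in which the net makes a power-consuming transition. All capacitances, activities, and the clock frequency are hypothetical values chosen for illustration; they are not extracted from the benchmark circuits.

```python
def dynamic_power_w(net_caps_f, activity, vdd_v, f_hz):
    """Sum alpha * C * Vdd^2 * f over all nets (assumed first-order model)."""
    return sum(
        activity[net] * cap_f * vdd_v ** 2 * f_hz
        for net, cap_f in net_caps_f.items()
    )


# Hypothetical net capacitances [F]: in 3-D, net_a becomes shorter,
# while net_b crosses tiers and gains the capacitance of a TSV.
caps_2d = {"net_a": 100e-15, "net_b": 100e-15}
caps_3d = {"net_a": 60e-15, "net_b": 120e-15}

uniform = {"net_a": 0.2, "net_b": 0.2}    # fixed 20% toggle rate (average mode)
testbench = {"net_a": 0.0, "net_b": 0.3}  # hypothetical workload: only net_b switches

VDD_V = 0.9    # supply voltage used for the 45 nm benchmarks
F_HZ = 500e6   # illustrative clock frequency

avg_2d = dynamic_power_w(caps_2d, uniform, VDD_V, F_HZ)
avg_3d = dynamic_power_w(caps_3d, uniform, VDD_V, F_HZ)
tb_2d = dynamic_power_w(caps_2d, testbench, VDD_V, F_HZ)
tb_3d = dynamic_power_w(caps_3d, testbench, VDD_V, F_HZ)
```

With a uniform toggle rate the 3-D version appears to consume less power because its total capacitance is lower, yet under the workload that exercises only the net with the TSV the 3-D version consumes more, which mirrors the behavior observed for AVA.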

Furthermore, the power trace over the execution time of a decoding task for the LDPC circuit is depicted in Fig. 4.7. This case study demonstrates a 10% reduction in power for the 3-D circuit as compared to the 2-D circuit. This decrease is due to the fact that the LDPC circuit is wire-dominant and this testbench toggles a large portion of the nets. In addition, the peak power is reduced by 11.6%.


Considering the importance of thermal and reliability issues in 3-D ICs due to the increased power densities, accurate power analysis is critical to limit expensive overdesign. This flow addresses this requirement by enabling multi-mode power analysis.

[Line plot. Horizontal axis: 0 to 4 (execution time; unit not recoverable). Vertical axis: Power [mW], 9 to 16. Series: LDPC 2-D, LDPC 3-D.]

Figure 4.7: Power trace of the LDPC circuit by utilizing a standard 2-D and the proposed 3-D design flow.

4.4 Conclusions

In this chapter, a backend design flow is presented which enables timing and power analysis by primarily utilizing commercial tools such as PrimeTime and PrimeTime PX. This work is based on my published paper [13]. The design experience of using this flow is similar to that of a 2-D flow, as commercial 2-D EDA tools and a public academic 3-D tool are utilized in the majority of the stages. New steps are added to support the introduction of the third dimension and the broad gamut of TSV technologies and bonding styles. A crucial new step of the flow is the merging of the SPEF files of each tier into a global SPEF file that includes the impedance of the vertical interconnects. This new step enables the performance of the 3-D circuit to be evaluated by seamlessly performing STA with mature EDA tools, instead of considering the longest inter-tier delay and wirelength prediction models. In addition, this is the first design flow for 3-D ICs with TSVs that is compatible with multi-mode power analysis while considering the electrical characteristics of the vertical interconnects, due to the utilization of the global SPEF file created by this flow. Application of the flow to different benchmark circuits shows that, even with no optimization effort, a two-tier 3-D stack produced by the flow achieves up to 14.6% average power reduction, 18.7% performance improvement, and 49% footprint reduction as compared to the 2-D version of specific circuits.

Chapter 5

Voltage Scaling in 3-D Circuits

Three-dimensional integration with through-silicon-vias offers integrated circuits with smaller form factors and enhanced performance and power as compared to planar circuits. Speed improvements in TSV-based 3-D ICs originate from the reduction of the interconnect (RC) delay of the critical paths. The decrease in the power of a 3-D circuit is traditionally considered to result from the reduction of the interconnect capacitance and the number of repeaters due to the shorter wirelength. Since the power and performance of a circuit can differ significantly between two dimensions (planar circuits) and three dimensions (multi-tier circuits) at the same operating voltage, this situation provides a voltage headroom to further improve the speed or the power of the 3-D stack. Therefore, voltage scaling can be utilized to improve either the performance or the power of 3-D circuits. In this chapter, emphasis is placed on reducing the operating voltage, as power savings from the reduction of the wirelength are limited due to the non-negligible parasitic capacitance of the through-silicon-vias that vertically interconnect the tiers. In the proposed approach, the operating voltage is decreased by exploiting the additional slack that results from the shorter length of the critical nets, such that the performance of the circuit does not change between two and three dimensions.

The remainder of this chapter is organized as follows. In Section 5.1, the motivation and introduction of my approach are presented. Related work on voltage scaling in three-dimensional circuits is discussed in Section 5.2. The utilization of voltage scaling in 3-D ICs is discussed in Section 5.3. In Section 5.4, an enhanced timing model for paths and guidelines for identifying whether a circuit can benefit from this approach are described. A new methodology, integrated in the advanced EDA flow from the previous chapter, for globally applying voltage scaling to 3-D circuits is presented in Section 5.5. Related results for several benchmark circuits are discussed in Section 5.6. Conclusions are offered in Section 5.7.

5.1 Motivation

By considering the well-known power-speed interplay, a methodology to reduce power by decreasing the supply voltage is developed. The incurred increase in delay is compensated by exploiting the additional slack produced by the third dimension. This approach is based on several results which demonstrate that the traditional notion of decreasing power in 3-D ICs through the reduction of interconnect capacitance is not adequate, in particular if TSVs exhibit non-negligible parasitic capacitance [74], [75]. In other words, the noticeable decrease in RC delay, as the resistance of a TSV is considerably smaller than that of horizontal wires, is exploited to counterbalance the increase in the delay of logic gates that results from reducing the supply voltage. In this way, power is saved without degrading the performance.

However, depending on the characteristics of the paths within a circuit, the power savings from applying this technique to 3-D ICs can vary greatly. Consequently, the critical paths of a design should be carefully considered to evaluate where voltage reduction does not degrade the target performance of the system. Previous works do not offer a systematic analysis and instead emphasize the (critical) paths with the longest inter-tier nets [124], [125], [126]. Although intuitive, this practice may not lead to substantial savings in power since, as shown in this chapter, long wires are only one important constituent for the application of voltage scaling in 3-D ICs.

Starting from a 2-D circuit, a full-chip evaluation is utilized to assess whether voltage scaling is applicable, rather than focusing only on the limited set of paths that span more than one tier. As demonstrated in this chapter, these paths can exhibit misleading behavior, in particular if the speed of the circuit is dominated by the gate delay. In addition, the proposed technique enhances power reduction in 3-D ICs while considering the characteristics of TSV-based vertical integration. A timing model considering both the interconnect length and the voltage sensitivity of paths, together with guidelines to identify when two-dimensional circuits can benefit from this approach, is offered. Moreover, a methodology for applying and evaluating voltage reduction in 3-D ICs at the system level is presented. To quantify the potential power gains from this method, several benchmark circuits are investigated.

5.2 Related Work of Voltage Scaling in 3-D ICs

Several researchers [92], [126], [130]-[135] have investigated voltage scaling in terms of voltage island formation in 3-D ICs. This interest is due to the importance of voltage islands to issues relating to three-dimensional integration that require attention, such as thermal dissipation [127], power distribution network complexity [128], and process variations [129].

Zhu et al. explore several policies for task migration and DVFS to better manage heat in 3-D ICs [130]. Emphasis is placed on the micro-architecture level rather than the physical level. Xu et al. utilize a mixed integer linear programming (MILP) model for voltage-island generation, optimizing the heat distribution and routing resources of the power distribution network while maintaining the circuit performance in three dimensions [92]. Moreover, a voltage assignment method for minimizing power in 3-D ICs is proposed in [131], while considering thermal and timing overheads due to level shifters. However, in both [92] and [131], the voltage levels of each block are assumed to be known. Furthermore, in these works the effect of the parasitic impedance of the TSVs on the power, performance, and voltage level of the 3-D blocks is not considered.

Additionally, Zhan et al. proposed a partition-based algorithm for assigning modules at the floorplan level to reuse currents between voltage domains and minimize the power dissipated on the ground distribution network [132]. Kapadia and Pasricha proposed a synthesis framework and a methodology to optimize the power distribution network in MVS mesh-based 3-D networks-on-chip (NoC) [133], [134]. In both of these works, emphasis is placed on the effect of the power distribution network on the power of the 3-D NoC, rather than on how three-dimensional integration affects the operating voltage of a block/system considering the related speed and power.

Voltage assignment in 3-D ICs under process variations is considered in [126], [135]. Lee et al. proposed a grid-based multiple supply voltage method to statistically minimize power while considering spatially correlated process variations and thermal effects for 3-D ICs [126]. A first-order statistical timing model is utilized to capture the performance of the system based only on inter-block connections, without considering the RC parasitic impedance of TSVs. This assumption limits the accuracy of this method, as three-dimensional integration affects both intra-block and inter-block nets [55] and the voltage applied to a circuit can differ between two and three dimensions. Finally, a post-silicon tuning methodology for improving the parametric yield of 3-D ICs with tier-adaptive voltage scaling is described in [135]. A few synthetic paths are assumed in this analysis, and a generic statistical framework is utilized to evaluate the described approach rather than realistic benchmark circuits and accurate back-end timing information (STA level) of the circuits.

Table 5.1: A summary of previous works on voltage scaling and voltage domains for 3-D circuits.

Work            Emphasis              Remarks/Comments
[130]           DVFS                  • Managing heat dissipation
                                      • Architectural level
[92]            Voltage domains       • Optimizing heat distribution
                                      • Minimizing routing resources
                                      • No TSV RC
                                      • Predefined Vdd
[131]           Block-level voltage   • Minimizing power
                assignment            • No TSV RC
                                      • Predefined Vdd
[132]           Partitioning          • Algorithmic based floorplan
                                      • Reusing current from voltage domains
[133], [134]    NoC, PDN              • Minimizing power from PDN
                                      • Predefined Vdd
[126]           Process variations    • Block-level voltage assignment
                                      • Minimizing power
                                      • Managing heat distribution
                                      • Predefined Vdd
[135]           Process variations    • Post-silicon methodology
                                      • Tier adaptive voltage scaling
                                      • Synthetic paths evaluation
                                      • No design flow

Unlike prior works, in this chapter the voltage requirements are not constrained to be the same between the 2-D and 3-D implementations of a circuit. This notion is based on the fact that the power and performance of a 3-D stacked circuit differ from those of the planar circuit at the same operating voltage. Hence, this situation provides a voltage headroom which can be utilized to enhance either power or speed in three-dimensional circuits.

5.3 Voltage Scaling Opportunities in 3-D ICs

In a multi-tier circuit, tiers are stacked and vertically interconnected with through-silicon-vias. The salient feature of this technology is the considerably shorter interconnect length [2]. Therefore, the power and/or the performance of 3-D stacked circuits differ from those of planar circuits at the same operating voltage. This situation is demonstrated in several works, as presented in Section 2.2.3. The reduction of the RC parasitic impedance in critical paths leads to increased slack for 3-D circuits, thereby improving the performance. Alternatively, power reduction in 3-D ICs is achieved mainly due to the reduction of the total interconnect capacitance. The different origins of the power and speed enhancements constrain the available headroom for voltage scaling in the resulting 3-D ICs. Depending on how much the speed and power of the resulting 3-D system differ from the 2-D implementation at the same voltage, the voltage can be scaled accordingly to either enhance these objectives or compensate for any loss in them.

However, three-dimensional integration is not beneficial for all circuits, as excessive usage of TSVs can lead to increased wirelength [2], [74]. In this case, the introduction of the third dimension degrades the power and performance of the circuit, not allowing the voltage to be scaled. In the case where three-dimensional integration provides some power savings, these gains can be utilized to improve the speed of the system by increasing the operating voltage while ensuring that $P_{3\text{-}D} = P_{2\text{-}D}$. This approach is applicable irrespective of whether the third dimension leads to an increase in speed. However, the moderate power savings in 3-D circuits from wirelength reduction and the quadratic relation of power to voltage heavily restrain the available voltage headroom. Therefore, considering that dynamic power also depends linearly on frequency, the resulting increase in speed is marginal. In addition, the increase in voltage is limited due to the high thermal densities in 3-D circuits. Indeed, higher voltage levels lead to an increase in power and, as a result, to greater heat generation [91].
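The size of this headroom can be estimated with a short calculation. The sketch below assumes the first-order model in which dynamic power is proportional to Vdd² · f and ignores leakage; the function name and the `freq_exponent` parameter are introduced here for illustration. Setting `freq_exponent` to 0 keeps the clock frequency fixed, while a value of 1 is a rough assumption that the frequency is raised in proportion to the voltage.

```python
def voltage_scale_at_equal_power(power_saving, freq_exponent=0.0):
    """Return V'/V such that (1 - s) * (V'/V)**(2 + freq_exponent) == 1.

    power_saving: fractional power saving s of the 3-D circuit at equal
                  voltage and frequency (e.g. 0.146 for 14.6%).
    freq_exponent: how strongly the frequency is assumed to follow the voltage.
    """
    if not 0.0 <= power_saving < 1.0:
        raise ValueError("power_saving must be in [0, 1)")
    return (1.0 / (1.0 - power_saving)) ** (1.0 / (2.0 + freq_exponent))


# Largest average power saving reported in Table 4.6 (LDPC).
fixed_frequency = voltage_scale_at_equal_power(0.146)
scaled_frequency = voltage_scale_at_equal_power(0.146, freq_exponent=1.0)
print(f"V'/V at fixed frequency:  {fixed_frequency:.3f}")
print(f"V'/V with f following V: {scaled_frequency:.3f}")
```

Even for the largest saving in Table 4.6, the permitted voltage increase under this model is below 10%, and it shrinks further once the frequency is raised together with the voltage, which is consistent with the marginal speed gain noted above.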

Alternatively, reducing the operating voltage by exploiting the additional timing slack that three-dimensional integration provides is a meaningful way to enhance the power savings in 3-D ICs. This method, which exploits the speed improvements in 3-D circuits, offers an alternative means to reduce power in 3-D ICs compared to the traditional approach of decreasing the interconnect capacitance. However, the applicability and the gains of this technique vary according to the available timing slack of the critical paths. This slack, in turn, is strongly dependent upon the characteristics of the gates and wires composing these paths. A methodology that considers the timing behavior of the paths to identify which circuits can benefit from this approach is presented in the next section.

5.4 Determine Whether Voltage Reduction is Applicable

As discussed in the previous section, voltage scaling is not applicable to all 3-D ICs. In addition, evaluating the speed and power tradeoff in these circuits by considering only the inter-tier paths is not sufficient to determine whether voltage scaling is applicable, as demonstrated by the results (see Section 5.6). Consequently, the critical paths of a circuit should be carefully examined to evaluate early whether voltage reduction degrades the target performance. Therefore, an enhanced timing model for circuit paths is formulated in Subsection 5.4.1, including several parameters, such as the number and type of gates, the interconnect segments within the paths, and the gate delay sensitivity to voltage. In Subsection 5.4.2, this model is incorporated into a methodology to determine when circuits can exploit voltage reduction in three dimensions.

5.4.1 Interconnect and Voltage Aware Timing Model

The delay of a logic path depends both upon the number and type of gates and upon the wires interconnecting these gates. The method of logical effort [136] is a useful technique for estimating the gate delay in CMOS circuits. Based on the extension of this method in [137], where interconnects are also considered, the delay of an $N$-stage path (see Fig. 5.1), normalized to the delay of a minimum sized inverter ($\tau$), is

$$d_{path} = \sum_{i=1}^{N} \Big( g_i \cdot (h_i + h_{w_i}) + (p_i + p_{w_i}) \Big), \tag{5.1}$$


where $g_i$ and $p_i$ are, respectively, the logical effort and parasitic delay related to the characteristics of the logic gate. These parameters for different gates, considering simple layout styles, are obtained from [136]. The parameter $h_i$ is the electrical effort of the gate, defined as the ratio of the input capacitances of two successively connected gates ($C_{i+1}/C_i$), as depicted in Fig. 5.1 for gates $g_i$ and $g_{i+1}$. The capacitive effort $h_{w_i}$ and the resistive effort $p_{w_i}$ of the interconnects are, respectively [137],

$$h_{w_i} = l_i \cdot c_{w_i} / C_i, \tag{5.2}$$

$$p_{w_i} = l_i \cdot r_{w_i} \cdot (0.5 \cdot l_i \cdot c_{w_i} + C_{i+1}) / \tau, \tag{5.3}$$

where $l_i$ is the length of the wire and $r_{w_i}$, $c_{w_i}$ are, respectively, the resistance and capacitance per unit length, which depend on whether the wire is routed in local, intermediate, and/or global metal layers. However, in this model the sensitivity of the gate delay to voltage is not considered.

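[Schematic. Three successive gates $g_{i-1}$, $g_i$, $g_{i+1}$ with input capacitances $C_{i-1}$, $C_i$, $C_{i+1}$ and parasitic capacitances $C_{p_{i-1}}$, $C_{p_i}$, $C_{p_{i+1}}$; each gate drives a wire segment drawn with a resistance $R_w$ and two capacitances $C_w/2$.]

Figure 5.1: A typical path composed of gates and interconnects.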

The effects of voltage and temperature variations on logical effort are modeled in [138] for several process nodes and for different operating regions of MOSFETs, such as strong, moderate, and weak inversion. Considering that transistors in digital circuits typically operate in strong inversion, the logical effort is

$$g'_i = g_i \cdot g_u, \tag{5.4}$$

$$g_u = \frac{V_{dd}}{A(T) \cdot (V_{dd} - V_{th0} + k \cdot T)^{3/2}}, \tag{5.5}$$

where $g_u$ is a fitting function obtained from [138] to capture the effect of voltage and temperature on logical effort across different technologies. $A(T)$ is a second order polynomial function of temperature ($T$) depending upon the technology, $V_{th0}$ is the threshold voltage at 0°C, and $k$ is the slope of $V_{th}$ as a function of temperature. At the 45 nm process node, for $V_{th0} = 0.46$ V, $k = 6.32 \cdot 10^{-4}$, and $T = 25$°C [138],

$$g_u = \frac{V_{dd}}{2.413 \cdot (V_{dd} - 0.4442)^{3/2}}, \tag{5.6}$$

where for $V_{dd} = 1$ V, $g_u = 1$. Therefore, by rewriting (5.1) to include the voltage sensitivity, the delay of the path is

$$d_{path} = \sum_{i=1}^{N} \Big( g_i \cdot g_u \cdot (h_i + h_{w_i}) + (p_i + p_{w_i}) \Big). \tag{5.7}$$

With this model, the delay of a critical path $CP$ is described as a function of the number ($N$) and type of gates ($g_i$, $C_i$), the length of the interconnect segments ($l_i$), and the operating voltage ($V_{dd}$)

$$d_{path_{CP}} = f(N, g_i, C_i, l_i, V_{dd}). \tag{5.8}$$
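A minimal numerical sketch of (5.1)-(5.7) is given below for illustration. It uses the fitted function (5.6), which is valid for the 45 nm node at 25°C; the `Stage` container and all parameter names are introduced here and are not part of the model definition in [137], [138].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    g: float       # logical effort of the gate
    p: float       # parasitic delay of the gate
    c_in: float    # input capacitance C_i of the gate [F]
    c_next: float  # input capacitance C_{i+1} of the driven gate [F]
    length: float  # wire length l_i [m]
    r_w: float     # wire resistance per unit length [ohm/m]
    c_w: float     # wire capacitance per unit length [F/m]


def g_u(vdd):
    """Voltage fitting function of (5.6): 45 nm node, T = 25 C."""
    if vdd <= 0.4442:
        raise ValueError("model requires Vdd above 0.4442 V")
    return vdd / (2.413 * (vdd - 0.4442) ** 1.5)


def path_delay(stages, vdd, tau):
    """Path delay of (5.7), normalized to the minimum inverter delay tau [s]."""
    total = 0.0
    for s in stages:
        h = s.c_next / s.c_in                    # electrical effort
        h_w = s.length * s.c_w / s.c_in          # capacitive effort, (5.2)
        p_w = (s.length * s.r_w
               * (0.5 * s.length * s.c_w + s.c_next) / tau)  # resistive effort, (5.3)
        total += s.g * g_u(vdd) * (h + h_w) + (s.p + p_w)
    return total
```

Only the gate term is multiplied by $g_u$, so lowering the voltage penalizes gate-dominated paths more strongly than wire-dominated ones in this model.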

It is worth mentioning that the timing model proposed in this work can support temperature variations, as indicated by (5.5) and (5.7). This feature can be useful to explore paths in 3-D circuits while considering voltage, wirelength, and temperature variations. The increased temperatures of stacked dies can affect the delay of critical paths, as the interconnect resistance can increase. Therefore, reducing the operating voltage can mitigate heat issues and also have a smaller impact on the delay of the paths, as the wirelength is reduced for the 3-D circuit. This topic is considered as future work.

5.4.2 Timing-Slack Voltage Reduction Methodology for 3-D ICs

A standard design flow (assuming this flow can incorporate the third dimension) can provide all of the timing information across different operating voltages for a 3-D circuit and determine any useful voltage reduction. However, this process is highly time consuming. Therefore, several guidelines are presented in this subsection for identifying, early in the design process, which two-dimensional circuits allow for voltage reduction with vertical integration.

The objective of this analysis is to roughly estimate the slack of the paths in a 3-D circuit (e.g., $\Delta d_{2Dto3D} = d_{2D} - d_{3D}$) at the same operating voltage and utilize any increased slack (i.e., $\Delta d_{2Dto3D} > 0$) for voltage reduction such that $\Delta d_{2Dto3D} = 0$ at $V_{dd_{3D}} < V_{dd_{2D}}$. In order to identify if voltage reduction is possible for a circuit, the critical paths of a 2-D circuit are grouped and sorted in ascending order of slack (i.e., descending order of criticality). The post-routing timing information should be considered for these critical paths such that the RC parasitic impedance of the wires is included in the delay of the path. Afterwards, a user-defined threshold is applied to select the number of paths which will be analyzed. This threshold is effectively a knob trading off the accuracy in projecting the available slack and voltage reduction in 3-D ICs with computational time. Using few paths leads to a fast analysis. However, the accuracy is low, as some non-critical paths can exhibit a great increase in delay with voltage reduction and, therefore, become critical. Hence, representative paths with various slacks and logic depths should be selected and analyzed. The guidelines in this section facilitate this process.
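The path selection step can be illustrated with a short sketch. It assumes that the user-defined threshold is expressed as a fraction of the clock period (Section 5.6.1 uses paths with slack up to 20% of the clock period); the function name and the dictionary layout of a path are hypothetical.

```python
def select_paths(paths, clock_period, slack_fraction):
    """Keep paths whose slack is at most slack_fraction * clock_period.

    The result is sorted in ascending order of slack, i.e. the most
    critical path comes first.
    """
    limit = slack_fraction * clock_period
    chosen = [p for p in paths if p["slack"] <= limit]
    return sorted(chosen, key=lambda p: p["slack"])


# Hypothetical post-routing timing report (slack and period in ps).
report = [
    {"name": "p1", "slack": 300.0},
    {"name": "p2", "slack": 50.0},
    {"name": "p3", "slack": 150.0},
]
selected = select_paths(report, clock_period=1000.0, slack_fraction=0.2)
```

A larger `slack_fraction` admits more paths and improves the projection at the cost of analysis time, which is the tradeoff described above.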

An $N$-stage path comprises $N_{wires}$ wire segments with various lengths. These segments are grouped based on the metal layers used for the routing of each segment as

$$N_{wires} = N_{local} + N_{int} + N_{global}, \qquad (5.9)$$

where $N_{local}$, $N_{int}$, and $N_{global}$ are the numbers of wire segments laid out within the local, intermediate, and global metal layers, respectively. To determine how the length of the wire segments is affected by the third dimension, a function which describes the length of wires ($l_i$) in two- and three-dimensional circuits ($l_{i_{3D}} = f(l_{i_{2D}})$) is required. This function can be obtained from wirelength prediction models, such as [33], [74]. The delay of the critical paths in the 3-D circuit is projected by incorporating this function with the model described in Section 5.4.1. The integration of the wirelength and timing models intends to provide a pass or fail check, indicating whether wirelength reduction within the paths leads to useful voltage reduction with the introduction of the third dimension.

This model is applied to selected critical paths according to the chosen threshold. Without loss of generality, this analysis is carried out on a 45 nm technology, where the input capacitance of the gates is obtained from [139]. For different process nodes, the function $g_u$ can be fitted similarly to [138] and the input capacitance is obtained from the corresponding technology libraries. Moreover, a first-order wirelength model for 3-D circuits is used [140]

$$l_{i_{3D}} = \frac{l_{i_{2D}}}{\sqrt{n}}, \qquad (5.10)$$

where $n$ is the number of tiers of the 3-D stack. This function is applied to intermediate and global wires, whereas the length of the local interconnect segments is assumed not to change in the 3-D stack [140]. The inherent inaccuracy of wirelength models does not permit a highly accurate estimate of the additional slack. Rather, the precise increase in slack is determined by a complete design flow supporting voltage scaling (see Section 5.5). In general, (5.10) offers a loose upper bound for describing the decrease in wirelength, as the overhead of TSV area in wirelength is not captured; however, this model offers useful estimates, as demonstrated by the results in Section 5.6.
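As a minimal sketch of how (5.10) is applied per segment, the following helper scales intermediate and global segments and leaves local segments unchanged. The function name and the layer labels are illustrative assumptions.

```python
import math


def length_3d(length_2d, layer, tiers):
    """Projected 3-D length of one wire segment according to (5.10).

    layer is one of "local", "intermediate", "global"; local segments are
    assumed to keep their 2-D length.
    """
    if tiers < 1:
        raise ValueError("a stack needs at least one tier")
    if layer == "local":
        return length_2d
    return length_2d / math.sqrt(tiers)


# A 100 um global segment in a four-tier stack shrinks to 50 um.
global_len = length_3d(100.0, "global", 4)
local_len = length_3d(100.0, "local", 4)
```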

As observed by many interconnect prediction models, three-dimensional integration reduces the wirelength of a circuit by decreasing the length of long intermediate and global interconnects [2]. If these long wires belong to the critical paths of the circuit, the additional timing slack from the RC reduction can be utilized to reduce the voltage. Guideline 1: circuits where critical paths comprise only local interconnects are not expected to support voltage reduction in three dimensions. In addition, wirelength reduction in 3-D ICs depends on several factors, such as the size and number of the TSVs and the number of tiers [74]. Therefore, at the early design stages of a 3-D circuit, the wirelength prediction models should be employed to determine whether wirelength decreases for the target 3-D technology. Guideline 2: when the critical paths of a 2-D circuit comprise long intermediate and global wires and wirelength reduction is predicted for the 3-D technology, voltage reduction is applicable to this circuit.

Applying (5.7) to the critical paths of a planar circuit, together with the projected wirelength reduction within these paths from (5.10), an estimate for the added timing slack in a 3-D circuit is determined. The projected decrease in the delay of several paths based on this model for the benchmark circuits in Section 5.6 is depicted in Fig. 5.2. The increased timing slack from the introduction of the third dimension depends upon both the number of the gates and the length of the wire segments of the path. Consequently, as shown in this figure, paths with long global wires exhibit greater delay reduction as compared to paths with shorter intermediate wires due to the larger wirelength reduction from vertical integration. In addition, less gate-dominated paths exhibit a larger speedup than paths with more gates for the same wire RC reduction. Guideline 3: the higher the portion of the interconnect delay in a path, the greater the increase in the timing slack due to the vertical integration and thereby the larger the decrease in voltage.


[Figure 5.2 plot. x-axis: number of tiers, n (1 to 8); y-axis: normalized delay D3-D/D2-D (0.65 to 1); curves: N=10 with global net, N=10 with intermediate net, N=20 with global net, N=20 with intermediate net.]

Figure 5.2: Delay reduction due to vertical integration for paths with different number of gates (N) and the same global and intermediate wire segments, where wirelength changes with n according to (5.10).
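Guideline 3 can be illustrated with a deliberately simplified sketch. It assumes that the wire-related delay of the affected segments grows quadratically with length (consistent with a rate of change proportional to the length), so under (5.10) that part of the delay shrinks by a factor of $n$, while the gate-related part is left unchanged. Both the function and this split of the delay are illustrative assumptions and not the full model of (5.7).

```python
def normalized_delay(wire_fraction, tiers):
    """Rough D_3D / D_2D for a path at a fixed operating voltage.

    wire_fraction is the share of the 2-D path delay caused by intermediate
    and global wires. The wire part is assumed to scale as 1 / tiers.
    """
    if not 0.0 <= wire_fraction <= 1.0:
        raise ValueError("wire_fraction must lie in [0, 1]")
    if tiers < 1:
        raise ValueError("a stack needs at least one tier")
    return (1.0 - wire_fraction) + wire_fraction / tiers


# A wire-dominated path gains more from two tiers than a gate-dominated one.
wire_dominated = normalized_delay(0.5, 2)
gate_dominated = normalized_delay(0.2, 2)
```

Under these assumptions the path with the larger interconnect share ends up with the lower normalized delay, which is the trend stated by Guideline 3.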

A tradeoff, however, exists between the added timing slack in the 3-D ICs and the increase in the delay due to voltage reduction. Considering (5.7), the rate of change in delay depends linearly upon the interconnect length ($\partial d_{path} / \partial l \propto l$) and superlinearly upon the operating voltage ($\partial d_{path} / \partial V_{dd} \propto -1 / V_{dd}^{3/2}$). This behavior indicates that the gates comprising the paths of the circuit primarily determine the capability to scale voltage as compared to wirelength reduction. Consequently, the effect of decreasing the voltage on the delay of the gates should also be considered to determine the net advantage of voltage scaling. This requirement has driven the development of a complete backend design flow described in Chapter 4 to holistically and accurately determine the timing slack in a 3-D circuit.

The sensitivity of the delay of a gate to voltage reduction is

$$s_{gate_i} = \left| \frac{\Delta d_{gate_i}}{\Delta V_{dd}} \right|, \qquad (5.11)$$

where $\Delta d_{gate_i}$ is the delay increase of gate $i$ for a voltage reduction of $\Delta V_{dd}$. Gates with lower sensitivity utilize a given amount of voltage reduction more efficiently, in terms of less added delay, as compared to gates with higher sensitivity. The delay of diverse gates with different driving strengths exhibits different sensitivity to voltage, as depicted in Fig. 5.3. This delay sensitivity to voltage (5.11) is described by a first-order model using the first partial derivative of (5.7) with respect to $V_{dd}$ for a given gate $i$,

$$s_{gate_i} = \left| \frac{\partial d_{gate_i}}{\partial V_{dd}} \right| = \frac{g_i}{C_i} \cdot C_{load_i} \cdot \left| \frac{\partial g_u}{\partial V_{dd}} \right|, \qquad (5.12)$$

$$C_{load_i} = C_{i+1} + l_i \cdot C_{w_i}, \qquad (5.13)$$

where $\partial g_u / \partial V_{dd}$ is technology dependent and $C_{load_i}$ is the load capacitance that gate $i$ drives. The sensitivity of the delay of a gate to the changing voltage depends upon the factor $g_i / C_i$, as the load capacitance is not a characteristic of the gate. The delay of stronger gates (higher $C_i$) with low logical effort ($g_i$) is less sensitive to voltage changes as compared to weaker gates with higher logical effort.
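A short sketch of (5.12) and (5.13) follows. The derivative of $g_u$ is obtained here by differentiating (5.5) analytically, which is an assumption of this example (the text only states that the term is technology dependent); the capacitance values are hypothetical and expressed in units of a minimum inverter input capacitance.

```python
def dgu_dvdd(vdd, a_t=2.413, vth0=0.46, k=6.32e-4, temp_c=25.0):
    """Analytic derivative of g_u in (5.5) with respect to Vdd.

    With c = vth0 - k*T, g_u = Vdd / (A * (Vdd - c)**1.5), which gives
    d g_u / d Vdd = -(0.5*Vdd + c) / (A * (Vdd - c)**2.5).
    """
    c = vth0 - k * temp_c
    overdrive = vdd - c
    if overdrive <= 0:
        raise ValueError("model assumes strong inversion (positive overdrive)")
    return -(0.5 * vdd + c) / (a_t * overdrive ** 2.5)


def gate_sensitivity(g, c_in, c_next, wire_len, c_wire, vdd):
    """Delay sensitivity of one gate, following (5.12) and (5.13)."""
    c_load = c_next + wire_len * c_wire               # (5.13)
    return (g / c_in) * c_load * abs(dgu_dvdd(vdd))   # (5.12)


# Same load, same logical effort, different drive strength (illustrative).
s_inv_1x = gate_sensitivity(1.0, 1.0, 1.0, 0.0, 0.0, vdd=1.0)
s_inv_4x = gate_sensitivity(1.0, 4.0, 1.0, 0.0, 0.0, vdd=1.0)
s_nand_1x = gate_sensitivity(4.0 / 3.0, 1.0, 1.0, 0.0, 0.0, vdd=1.0)
```

For the same load, the stronger inverter comes out less sensitive than the minimum-size one, and the NAND gate with its higher logical effort comes out more sensitive, in line with the trend described above.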

[Figure 5.3 plots. Panel (a): delay, τ, versus Vdd [V] (0.7 to 1). Panel (b): sensitivity, s_gate_i, versus Vdd [V] (0.7 to 1). Curves: Nand-1x, Nor-1x, Xor-1x, Inv-1x, Inv-2x, Inv-4x, Inv-8x.]

Figure 5.3: (a) Delay and (b) sensitivity of logic gates to voltage reduction while driving a minimum size inverter.

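As a path is composed of both gates and wires, the sensitivity of a path is described by summing the sensitivities of the individual gates, each of which includes the load capacitance driven by that gate,

$$s_{path} = \left| \frac{\partial d_{path}}{\partial V_{dd}} \right| = \sum_{i=1}^{N} \frac{g_i}{C_i} \cdot C_{load_i} \cdot \left| \frac{\partial g_u}{\partial V_{dd}} \right|. \qquad (5.14)$$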

This equation indicates that the potentially reduced interconnect capacitance due to vertical integration decreases the delay sensitivity of the paths since $C_{load}$ decreases. Hence, the delay of a 3-D circuit can be less sensitive to voltage reduction as compared to the 2-D counterpart. For two critical paths with different sensitivities to voltage (taken from the benchmark circuits in Section 5.6), the projected voltage reduction is depicted in Fig. 5.4. Both paths are assumed to exhibit the same decrease in wirelength.
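The effect of a shorter wire on (5.14) can be checked numerically with a small sketch. The gate tuples, the wire capacitance per unit length, and the value used for $|\partial g_u / \partial V_{dd}|$ are illustrative assumptions; the wire segments of the 3-D path are shortened according to (5.10) for two tiers.

```python
import math


def path_sensitivity(gates, abs_dgu_dvdd):
    """Path sensitivity of (5.14).

    gates is an iterable of (g, c_in, c_next, wire_len, c_wire) tuples and
    abs_dgu_dvdd is the technology-dependent |d g_u / d Vdd| term.
    """
    return abs_dgu_dvdd * sum(
        (g / c_in) * (c_next + wire_len * c_wire)
        for g, c_in, c_next, wire_len, c_wire in gates
    )


ABS_DGU_DVDD = 1.7  # assumed value near Vdd = 1 V, in 1/V

# Hypothetical two-gate path; capacitances in minimum-inverter units.
path_2d = [(1.0, 1.0, 1.0, 200.0, 0.01), (4.0 / 3.0, 2.0, 1.0, 50.0, 0.01)]
path_3d = [(g, c, cn, l / math.sqrt(2), cw) for g, c, cn, l, cw in path_2d]

s_2d = path_sensitivity(path_2d, ABS_DGU_DVDD)
s_3d = path_sensitivity(path_3d, ABS_DGU_DVDD)
```

Because only the wire term of each load shrinks, the 3-D path is less sensitive than the 2-D path, but the gate input capacitances set a floor on the achievable sensitivity.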

[Figure 5.4 plot. x-axis: Vdd [V] (0.7 to 1); y-axis: normalized delay D3-D/D2-D:Vdd=1 (0.8 to 1.8); reference line: D3-D = D2-D:Vdd=1; curves: N=7, low sensitivity and N=18, high sensitivity.]

Figure 5.4: Change in delay of paths with different path sensitivity where voltage is gradually reduced.

For the path with high sensitivity to voltage, the additional slack from the third dimension is quickly absorbed by the increase in delay of the many gates ($N = 18$), resulting in only a 7% voltage decrease. Alternatively, the path with low sensitivity to voltage, which comprises fewer ($N = 7$) and less sensitive gates, absorbs the additional slack at a slower rate with the change in voltage. This situation leads to almost double the reduction in voltage (13%) without affecting the original delay of the 2-D path ($D_{3\text{-}D} = D_{2\text{-}D:V_{dd}=1}$). Guideline 4: for 3-D circuits with paths that exhibit low sensitivity to voltage (according to (5.14)) and long wires, voltage reduction is higher.
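The projection behind Fig. 5.4 can be approximated with a bisection on the model of (5.7). The sketch below is a simplification under stated assumptions: the path is summarized by an effort term (the sum of $g_i (h_i + h_{w_i})$), which scales with $g_u$, and a parasitic term that does not scale with voltage; $g_u$ follows (5.6). The function names and the numbers are hypothetical.

```python
def g_u(vdd, a_t=2.413, vth_eff=0.4442):
    """Fitting factor of (5.6) for the 45 nm node at 25 degC."""
    return vdd / (a_t * (vdd - vth_eff) ** 1.5)


def projected_vdd(effort_3d, parasitic_3d, delay_2d,
                  vdd_2d=1.0, vdd_min=0.6, tol=1e-4):
    """Lowest Vdd at which the modeled 3-D path still meets delay_2d.

    Returns None when the 3-D path is already slower at vdd_2d.
    """
    def delay(v):
        return g_u(v) * effort_3d + parasitic_3d

    if delay(vdd_2d) > delay_2d:
        return None              # no added slack, no voltage reduction
    if delay(vdd_min) <= delay_2d:
        return vdd_min           # slack is not exhausted within the range
    lo, hi = vdd_min, vdd_2d     # delay(lo) > delay_2d >= delay(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay(mid) <= delay_2d:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical path: wirelength reduction lowers the effort term from 10 to 9.
vdd_iso = projected_vdd(effort_3d=9.0, parasitic_3d=5.0, delay_2d=15.0)
```

The bisection relies on the modeled delay decreasing monotonically with voltage above the effective threshold, which holds for (5.6).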

5.5 Design Flow Extension for Voltage Reduction in 3-D ICs

If the application of the model presented in Section 5.4 indicates an increased slack (i.e., $\Delta d_{2Dto3D} > 0$) and an acceptable voltage reduction from the third dimension, a design flow and a methodology are required to precisely determine the allowed decrease in voltage and the related power savings. In addition, the effect of different TSV technologies and bonding styles on timing should be considered. An EDA-compatible flow and a methodology for applying and quantifying the savings in power from voltage reduction in 3-D stacks are described in this section.

The design flow described in Chapter 4 ([13]) is utilized to evaluate the performance of the 3-D IC. The main input to this flow is the Verilog/VHDL description of the design. However, as iso-performance operation is a requisite, the same timing constraints and operating voltage as in the two-dimensional circuit are used to produce the synthesized netlist. This situation is depicted in Fig. 5.5. The synthesized netlist is input to the 3-D floorplanner, generating a floorplan for each tier based on the selected TSV technology and bonding style. The circuit partition among the tiers and the position of the TSVs are determined by 3-D Craft [107]. Moreover, a commercial tool, such as Encounter [104], is utilized to place and route the cells in each tier. After these steps, the netlists from each tier are merged while performing a design equivalence check through the Formality tool [116]. In addition, the standard-parasitic-exchange-format (SPEF) files of each tier are merged into a global SPEF adding the parasitic impedance of TSVs, as described in Section 4.2. In the last step, a timing analysis tool, such as PrimeTime [117], is utilized to determine the new and increased slack.

The methodology for evaluating voltage reduction in 3-D circuits is depicted in Fig. 5.6. Initially, the timing of the 3-D circuit is analyzed by performing STA at the 3-D stack for the same operating voltage as in the 2-D circuit ($V_{dd_{3\text{-}D}} = V_{dd_{2\text{-}D}}$). During this step, the exact gained (or lost) slack from the usage of the third dimension is obtained. In the case where the delay of the 3-D circuit is increased, voltage reduction is prohibited. However, following the guidelines and using the timing model in Section 5.4, this situation can be avoided. If no


[Figure 5.5 flow diagram. Blocks and files: RTL.v; Synthesis (inputs: 2-D.sdc, VDD2-D); Syn.v; 3-D Partitioner/Floorplanner (inputs: bonding style and TSV size); each tier: 3-D_Syn.v, 3-D.def; Place & Route; 3-D_PnR.v; Formality; 3-D_Frm.v; Merge SPEF; STA + Power.]

Figure 5.5: Stages of a 3-D design flow where commercial EDA tools with standard file formats are utilized. At iso-performance operation as compared to the 2-D circuit, the same timing constraints and operating voltage are used as inputs.

additional slack is predicted by the model, smaller TSVs or a different bonding style need to be considered during partitioning to minimize the overhead of the TSVs on the speed of the circuit.

Alternatively, when three-dimensional integration enhances the performance of the circuit, the increased slack is utilized for voltage reduction in the 3-D stack. To perform timing analysis across multiple voltage levels, interpolation between different process, voltage, and temperature (PVT) libraries is required [117]. Therefore, for a range of operating voltages and by starting from the same operating voltage as in the 2-D circuit, voltage is gradually reduced ($V_{dd_{3\text{-}D}} < V_{dd_{2\text{-}D}}$) and STA is performed. During this voltage sweep, the same process and temperature variations are assumed. The minimum operating voltage for the 3-D circuit is reached when the performance is equivalent to the original 2-D circuit ($D_{3\text{-}D} = D_{2\text{-}D}$). Afterwards, power analysis is performed to quantify the power improvements of the 3-D stack as compared to the 2-D circuit. Average power analysis, assuming toggling rates for nets and cells, is performed for battery life


[Figure 5.6 flowchart. Inputs: 3-D_Frm.v, 3-D.spef, libraries, Vdd3-D = Vdd2-D. Steps: STA; decision D3-D < D2-D (Yes/No, with a return to the 3-D partitioner); interpolate libraries; reduce Vdd3-D; STA; decision D3-D = D2-D (Yes/No); power analysis (peak power, average power).]

Figure 5.6: Methodology to evaluate the voltage reduction in 3-D circuits.

considerations. Moreover, cycle-accurate events produced from testbenches during back-annotated simulation of the circuits are used to evaluate the peak power of the 3-D stack.
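The voltage sweep of Fig. 5.6 can be summarized as a small loop. In this sketch, `sta_delay` is a hypothetical callback standing in for an STA run with interpolated libraries at a given voltage; the step size, the voltage floor, and the toy delay function are assumptions made only for illustration.

```python
def sweep_vdd(sta_delay, vdd_2d, delay_2d, step=0.01, vdd_floor=0.7):
    """Lowest swept Vdd at which the 3-D delay does not exceed delay_2d.

    Returns None when the 3-D circuit is slower than the 2-D circuit at
    vdd_2d, i.e. when the design should go back to the 3-D partitioner.
    """
    if sta_delay(vdd_2d) > delay_2d:
        return None
    best = vdd_2d
    n = 1
    while True:
        candidate = round(vdd_2d - n * step, 6)
        if candidate < vdd_floor or sta_delay(candidate) > delay_2d:
            return best
        best = candidate
        n += 1


def toy_sta(vdd):
    """Toy stand-in for STA: critical delay in ps, growing as Vdd drops."""
    return 885.0 / vdd


vdd_min = sweep_vdd(toy_sta, vdd_2d=1.0, delay_2d=1000.0)
```

With the toy delay function, the sweep stops at the last voltage step that still meets the 2-D delay; power analysis would then be run at that voltage.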

5.6 Results

The voltage scaling methodology is applied to several benchmark circuits implemented in three dimensions in Section 5.6.1. The power savings from this approach are quantified in Section 5.6.2. PrimeTime [117] is used for timing and both average and transient power analysis.

5.6.1 Applicability of Voltage Scaling to 3-D ICs

Several benchmark circuits are utilized to evaluate any potential gain in power due to the change in voltage enabled by the reduced wirelength. The benchmark circuits are listed in Table 5.2. These circuits are evaluated at a 45 nm and a 65 nm technology from TSMC [139]. As advanced technologies are utilized, via-first TSVs with a diameter of 1 µm, length of 10 µm, resistance of 334 mΩ, and capacitance of 3 fF are assumed for the vertical interconnects [48], [49]. Moreover, two tiers bonded face-to-back are assumed.

Table 5.2: Benchmark circuits.

| Circuits | Reference |
|---|---|
| B04, B19, B20 | [120] |
| AVA | [121] |
| LDPC, AES, DES3, FFT | [122] |

The characteristics of the benchmark circuits in both two and three dimensions are reported in Table 5.3. According to these results, vertical integration improves the wirelength for several circuits at the 45 nm technology. Due to the very small size of the B04 circuit, any insertion of TSVs considerably increases the area, resulting in increased wirelength. The large number of TSVs for the LDPC circuit, in addition to the small process node, leads to increased wirelength, although the speed of this circuit is dominated by the delay of the wires. However, this situation changes when utilizing an older process node (65 nm).

Table 5.3: Area, wirelength, and number of TSVs for the benchmark circuits designed both in two and three dimensions.

| Circuit | Process node | # of cells [K] | 2-D area [µm²] | 2-D WL [mm] | 3-D area [µm²] | 3-D WL [mm] | # of TSVs |
|---|---|---|---|---|---|---|---|
| B04 | 45 G | 0.31 | 1,038 | 3.4 | 1,315 | 4.3 | 123 |
| B19 | 45 G | 66.1 | 155,304 | 919 | 166,502 | 795 | 745 |
| B20 | 45 G | 12.1 | 22,508 | 130 | 23,552 | 105 | 545 |
| AVA | 45 G | 12.3 | 38,575 | 257 | 40,528 | 215 | 651 |
| LDPC | 45 G | 40.0 | 107,605 | 1,974 | 113,258 | 2,325 | 3,820 |
| LDPC | 65 LP | 67.0 | 469,593 | 6,328 | 479,770 | 4,645 | 4,384 |
| DES3 | 45 G | 50.7 | 102,749 | 666 | 106,923 | 615 | 925 |
| AES | 45 G | 117.3 | 232,545 | 2,330 | 237,998 | 2,211 | 934 |
| FFT | 45 G | 242.7 | 713,982 | 4,656 | 718,365 | 4,067 | 995 |


In order to reduce the operating voltage, a circuit must exhibit an increase in speed in three dimensions as compared to the planar implementation of the circuit at the same operating voltage. The early analysis of the critical paths of a planar circuit, as discussed in Section 5.4, determines whether voltage reduction is possible for the target 3-D technology. Therefore, representative paths, the slack of which is up to 20% of the clock period, are examined for the benchmark circuits, as described in Section 5.4. In addition, the 3-D EDA flow described in Section 5.5¹ (see Fig. 5.5) is utilized to accurately quantify the additional timing slack of the circuit in three dimensions. The resulting slack for the benchmark circuits in the two-tier stack, obtained from the application of the EDA flow and from the timing model of Section 5.4, respectively, is listed in Table 5.4. Note that performance is reported in this table only for the benchmark circuits which exhibit wirelength reduction.

Table 5.4: Supported clock period of the benchmark circuits for the same operating voltage as in the 2-D design.

| Circuit | Process node | Vdd [V] | 2-D Tclk [ps] | 3-D Tclk [ps] | Slack [ps] | Diff. [%] | Proposed model |
|---|---|---|---|---|---|---|---|
| B20 | 45 G | 1 | 1,125 | 1,171 | -46 | -4.1 | (-) Fail |
| B19 | 45 G | 1 | 1,273 | 1,217 | 56 | 4.4 | (3.6%) Pass |
| AVA | 45 G | 1 | 1,522 | 1,304 | 218 | 14.3 | (11%) Pass |
| LDPC | 65 LP | 1.2 | 4,278 | 3,468 | 810 | 18.9 | (22%) Pass |
| DES3 | 45 G | 1 | 1,068 | 1,043 | 25 | 2.4 | (1.8%) Pass |
| AES | 45 G | 1 | 946 | 1,152 | -206 | -21.7 | (-) Fail |
| FFT | 45 G | 1 | 1,421 | 1,205 | 216 | 15.2 | (14.9%) Pass |
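The slack columns of Table 5.4 appear to follow directly from the two clock periods; the relation used below (slack as the difference of the clock periods, and the percentage taken relative to the 2-D clock period) is inferred from the table rather than stated in the text, and the helper name is hypothetical.

```python
def slack_gain(t_clk_2d_ps, t_clk_3d_ps):
    """Return (slack in ps, slack as a percentage of the 2-D clock period)."""
    slack = t_clk_2d_ps - t_clk_3d_ps
    return slack, 100.0 * slack / t_clk_2d_ps


b19_slack, b19_diff = slack_gain(1273, 1217)  # B19 row of Table 5.4
fft_slack, fft_diff = slack_gain(1421, 1205)  # FFT row of Table 5.4
```

For these two rows the helper reproduces the tabulated slack and, after rounding to one decimal place, the tabulated percentage.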

Based on the proposed model, B20 and AES fail to show any speed improvement in three dimensions. This behavior is due to the fact that the speed of these circuits is primarily determined by the delay of the gates (gate-dominated circuits), where only local wire segments are utilized to interconnect the gates within the critical paths. Considering guideline 1 and (5.10), no added timing slack is predicted by the proposed model and reducing the voltage is not an option for these circuits. Indeed, the application of the flow to B20 and AES yields a negative slack due to the overhead of the TSVs, which negatively affects the delay.

¹ A more elaborate description of this flow is presented in Chapter 4.


Moreover, for circuits such as B19 and DES3, where a small portion of the delay is due to some short intermediate wires within the critical paths (see guideline 3), the proposed model predicts a positive slack of 3.6% and 1.8%, respectively. A marginal improvement is also determined by the flow described in Section 5.5. As three-dimensional integration only slightly reduces the delay of these circuits, voltage reduction is limited.

Alternatively, wire-dominated circuits, such as AVA, LDPC, and FFT, exhibit great performance improvements of 14.3%, 18.9%, and 15.2%, respectively, as compared to the 2-D design at the same operating voltage. This situation is also predicted by the timing model and guideline 2. Note that for the LDPC circuit, the longest inter-tier path exhibits a 31% delay reduction. However, the clock period is constrained by paths within a tier (intra-tier paths) and, as a result, the delay decreases only by 18.9%. This example demonstrates that considering only the inter-tier paths is not sufficient to determine the actual increase in slack and thereby the potential voltage reduction in 3-D ICs.

The increased slack from the introduction of the third dimension is exploited to reduce the voltage by utilizing the methodology in Section 5.5 (see Fig. 5.6). The minimum operating voltage for the benchmark circuits at iso-performance operation with the 2-D circuits is listed in Table 5.5. The operating voltage reduction for the circuits B19 and DES3 is rather negligible, as the increased slack from the vertical integration is limited. Alternatively, a considerable voltage decrease is obtained for the rest of the circuits. The greatest voltage reduction is obtained for the LDPC circuit, where long global wires are utilized in the critical paths of the planar version of the circuit.

Table 5.5: Reduction of the operating voltage for the benchmark circuits due to the added timing slack produced by the 3-D stacking.

| Circuits | 2-D Vdd [V] | 3-D Vdd [V] | Reduction [%] |
|---|---|---|---|
| B19 | 1.0 | 0.98 | 2 |
| AVA | 1.0 | 0.92 | 8 |
| LDPC | 1.2 | 1.03 | 14 |
| DES3 | 1.0 | 0.97 | 3 |
| FFT | 1.0 | 0.88 | 12 |

Interestingly, for almost the same timing slack (∼217 ps) for AVA and FFT, the operating voltage for the FFT circuit is decreased by 12%, whereas for AVA this decrease is only 8%. Considering guideline 4, the FFT circuit exhibits greater voltage reduction as the delay of the critical paths is less sensitive to voltage reduction than in the AVA circuit. This behavior is predicted from the proposed model in Fig. 5.4, where the critical path of FFT ($N = 7$), with low sensitivity to voltage, absorbs the additional slack at a slower rate with the change in voltage than the critical path of AVA ($N = 18$). This situation allows for a greater voltage reduction for the FFT in order to reach the delay of the 2-D circuit.

Considering the delay of the critical paths and the architecture of the FFT, the following remarks can be made. About 50% of the path delay is due to gate delay and 50% is due to wire delay. With ideal partitioning and ideal 3-D integration (gates connected through TSVs are perfectly vertically aligned), a 25% improvement would be expected, as mainly the long wires to the first or second butterflies would be shortened, whereas the wires to the subsequent butterflies would remain the same. However, TSVs do not replace wires completely: some routing distance remains from the driver to the TSV and from the TSV to the next gate (load). Due to this extra routing distance, the improvement in delay is around 15%. This extra routing occurred because the driver had a fanout greater than one and was therefore placed farther from the TSV in order to be closer to the other gates of the net in the same tier (yielding a better balanced net tree). The same occurred for the receiver gate, which was placed between the TSV and the next connected gate.

5.6.2 Quantifying Power Gains

In this subsection, the power consumption of the benchmark circuits where voltage reduction is applicable to vertical integration is investigated. The methodology illustrated in Fig. 5.6 is utilized to measure the power of the circuits under different scenarios. The simulation scenarios are listed in Table 5.6. The baseline scenario (S1) is the 2-D circuit for a specific frequency and operating voltage. Scenario S2 is selected to quantify the power savings from 3-D integration only due to the decreased interconnect capacitance. Finally, the proposed approach for reducing the operating voltage while keeping the same operating frequency as in the 2-D implementation is considered in scenario S3. The resulting operating voltage for each circuit is listed in Table 5.5.


Table 5.6: Simulated scenarios for evaluating power consumption.

| Scenarios | Integration | Frequency | Operating Voltage |
|---|---|---|---|
| S1 | 2-D | F2-D | V2-D |
| S2 | 3-D | F3-D = F2-D | V3-D = V2-D |
| S3 | 3-D | F3-D = F2-D | V3-D < V2-D |

For the average power analysis, the toggle rate of all cells and nets within each circuit is set to 20%. The total average power consumed by the circuits in all of the scenarios is illustrated in Fig. 5.7. In addition, the breakdown of the total power into the different power components is listed in Table 5.7. The decrease of interconnect capacitance in 3-D ICs results, on average, in 6.6% power reduction for the benchmark circuits. The power savings of this approach are limited, as the power dissipated by the cells is not significantly reduced. Alternatively, the proposed approach of reducing the voltage while meeting the 2-D timing constraints leads on average to 22.3% power reduction, as this approach exploits the quadratic relation of voltage to the dynamic power of both cells and nets.

[Figure 5.7 bar chart. x-axis: circuits LDPC, AVA, FFT, DES3, B19; y-axis: power normalized to P2-D [%] (0 to 120); bars: scenarios S1, S2, S3, with S1 at 100% for every circuit.]

Figure 5.7: Total average power consumption at the same speed $D_{2\text{-}D} = D_{3\text{-}D}$.


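Table 5.7: Breakdown of average power to its components.

| Circuit | Scenario | Nets [mW] | Cells [mW] | Leakage [µW] |
|---|---|---|---|---|
| LDPC | S1 | 20.49 | 8.76 | 0.024 |
| LDPC | S2 | 16.26 | 8.69 | 0.024 |
| LDPC | S3 | 11.88 | 6.31 | 0.015 |
| AVA | S1 | 0.51 | 0.47 | 1.07 |
| AVA | S2 | 0.48 | 0.46 | 1.07 |
| AVA | S3 | 0.40 | 0.36 | 0.67 |
| FFT | S1 | 10.93 | 16.15 | 25.3 |
| FFT | S2 | 8.18 | 16.12 | 25.3 |
| FFT | S3 | 6.16 | 11.84 | 13.1 |
| DES3 | S1 | 1.58 | 2.12 | 3.5 |
| DES3 | S2 | 1.55 | 2.11 | 3.5 |
| DES3 | S3 | 1.44 | 1.95 | 2.9 |
| B19 | S1 | 2.22 | 2.11 | 4.85 |
| B19 | S2 | 2.02 | 2.09 | 4.85 |
| B19 | S3 | 1.91 | 1.95 | 4.23 |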
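As a rough cross-check of the quadratic relation between voltage and dynamic power, the S2 entries of Table 5.7 can be scaled by the square of the voltage ratio from Table 5.5 and compared with the S3 entries. This sketch ignores any change in switching activity, short-circuit power, or signal slew, so only approximate agreement should be expected; the helper name is hypothetical.

```python
def scale_dynamic_power(power_mw, vdd_from, vdd_to):
    """Scale a dynamic power figure with the square of the supply voltage."""
    return power_mw * (vdd_to / vdd_from) ** 2


# LDPC at 65 nm: S2 runs at 1.2 V, S3 at 1.03 V (Table 5.5).
ldpc_nets_s3_est = scale_dynamic_power(16.26, 1.2, 1.03)   # reported: 11.88 mW
ldpc_cells_s3_est = scale_dynamic_power(8.69, 1.2, 1.03)   # reported: 6.31 mW
```

The estimates land within about 2% of the reported S3 values for this circuit, which is consistent with the dynamic power of both nets and cells following the quadratic dependence.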

Furthermore, application-specific testbenches from [121] and [122] are utilized to quantify the total (cycle-accurate) and peak power consumption. These testbenches simulate real-task events for the benchmark circuits, such as decoding messages for LDPC and AVA, encrypting and decrypting messages for DES3, and computing the fast Fourier transform of a signal for the FFT circuit. In Fig. 5.8, the power trace for the LDPC circuit is plotted for the scenarios listed in Table 5.6. Three-dimensional integration reduces the power by 10% due to shorter wires, while the approach of reducing the voltage achieves a 34% decrease in power. In addition, with the proposed approach the peak power is reduced by 27% as compared to the 3-D version where the voltage is not changed ($V_{3\text{-}D} = V_{2\text{-}D}$).

The power savings of application-specific testbenches for the investigated 3-D circuits, as compared to the 2-D counterparts, are depicted in Fig. 5.9. In addition, the breakdown of the total power consumption and the peak power for executing tasks on the circuits are listed in Table 5.8. In scenario S2, AVA and DES3 exhibit a higher total and peak power as compared to two dimensions. This behavior is due to the small percentage of wires that switch during these tasks and the fact that these nets contain TSVs. This example demonstrates the limitations of 3-D integration to provide power reduction if only the reduction in the wire capacitance is considered, as has been


[Figure 5.8 plot. x-axis: 0 to 4 (label not recoverable); y-axis: power [mW] (7 to 16); traces: 2-D Vdd = 1.20 V, 3-D Vdd = 1.20 V, 3-D Vdd = 1.03 V.]

Figure 5.8: Power trace of the LDPC circuit for different scenarios.

done by several previous works. For the FFT circuit, reducing the voltage results in a 28.5% and 29.8% decrease in total and peak power, respectively, as compared to the 3-D case where $V_{3\text{-}D} = V_{2\text{-}D}$ (scenario S2). This situation demonstrates the effectiveness of the proposed approach to significantly decrease power without compromising performance. Furthermore, the specific characteristics of the circuit are captured, accurately predicting whether voltage reduction at iso-performance is an option for this circuit in three dimensions.

5.7 Conclusions

In this chapter, voltage scaling in TSV-based three-dimensional logic circuits is investigated. This work is based on my published journal paper [14]. The objective is to exploit the additional slack on critical paths from the introduction of the third dimension to enhance power savings by reducing the supply voltage. An enhanced timing model for circuit paths based on logical effort is presented to address the tradeoff between voltage and interconnect reduction. Guidelines are offered to identify early in the design process if 2-D circuits can benefit from voltage reduction with vertical integration. A methodology for applying and evaluating voltage reduction in 3-D ICs at the system level is presented. The traditional


[Figure 5.9 bar chart. x-axis: circuits LDPC, AVA, FFT, DES3; y-axis: power savings over P2-D [%] (-10 to 40); bars: scenarios S2 and S3.]

Figure 5.9: Power savings of application-specific testbenches for the investigated 3-D circuits (scenarios S2 and S3, see Table 5.6) as compared to the 2-D implementation of the circuits (scenario S1).

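Table 5.8: Breakdown of total (cycle-accurate) power to its components and peak power for application-specific testbenches.

| Circuit | Scenario | Nets [mW] | Cells [mW] | Leakage [µW] | Peak [mW] |
|---|---|---|---|---|---|
| LDPC | S1 | 8.49 | 4.55 | 0.024 | 15.11 |
| LDPC | S2 | 7.20 | 4.53 | 0.024 | 13.53 |
| LDPC | S3 | 5.26 | 3.30 | 0.015 | 9.87 |
| AVA | S1 | 0.16 | 0.98 | 1.14 | 1.61 |
| AVA | S2 | 0.23 | 0.98 | 1.14 | 1.68 |
| AVA | S3 | 0.19 | 0.79 | 0.71 | 1.32 |
| FFT | S1 | 3.29 | 10.17 | 25.2 | 18.19 |
| FFT | S2 | 2.64 | 10.13 | 25.2 | 17.82 |
| FFT | S3 | 2.05 | 7.60 | 13.0 | 12.51 |
| DES3 | S1 | 0.28 | 1.22 | 3.5 | 2.07 |
| DES3 | S2 | 0.39 | 1.24 | 3.5 | 2.20 |
| DES3 | S3 | 0.36 | 1.13 | 2.9 | 1.99 |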


notion where the power is reduced due to the decrease in the wire capacitance of a 3-D IC leads to low and often inadequate power savings for several benchmark circuits (6.6% on average). Alternatively, the proposed approach results in power reduction of 22.3% on average. Moreover, a decrease of 27% in the peak power is observed for a specific case study as compared to the 3-D case where the supply voltage is not scaled and the same speed as in two dimensions is maintained.

Chapter 6

An Interface Circuit for Systems with Multiple Voltage Domains

Voltage scaling is a highly effective technique for reducing power and matching the required speed in a circuit integrated in either two or three dimensions, as discussed in the previous chapter. Moreover, considering that vertically integrated tiers already come with the physical infrastructure to power dies with individual voltages, the technique of voltage scaling can be included to alter the voltage of each tier depending on the power and performance objectives of the system. However, several challenges arise in the design process for circuits that support voltage scaling at finer granularities along with power gating, notably at the boundaries of the voltage domains. This situation is due to the additional circuitry required at the interfaces of the blocks which operate at different voltage levels. These circuits impose a significant overhead in delay and can prohibit the use of multi-voltage scaling (MVS) at specific critical paths. These paths often cross the boundaries of blocks that can otherwise operate at a different supply voltage, eliminating the gains in power by the use of multiple voltage domains (MVD). In this chapter, my published by-pass circuit [15] is presented, which can alleviate these timing issues and simultaneously support multi-voltage scaling under specific operating conditions.

The chapter is organized as follows. The characteristics and design challenges for the required additional circuits in systems with multiple voltage domains are discussed in Section 6.1. In Section 6.2, the proposed by-pass circuit is described and a discussion about which circuits can be by-passed is provided. In Section 6.3, the simulation setup and results for all the traditional metrics, such as


performance, power, and area, of the proposed circuit are presented. Conclusions are offered in Section 6.4.

6.1 Additional Circuitry in Systems with Multiple Voltage Domains

Systems with multiple voltage domains typically utilize both multi-voltage scaling and power gating at fine granularities to reduce power [89]. The primary difficulty in utilizing these techniques within a system is the necessity of additional circuitry at the interfaces between blocks which operate at different voltage supplies. These circuits are listed below:

• Level shifters: These circuits are used to scale up/down the voltage level of signals propagating between blocks with different power supplies and are often contained within standard cell libraries. A level-down shifter is typically an inverter or a buffer cell with properly characterized timing information at different operating voltages. In contrast, level-up shifters require special circuitry.

• Isolation cells: These circuits are used to isolate the output signals of powered down blocks from propagating to active blocks. Gates, such as AND or OR, alongside an enable signal are used to clamp the output signal to a valid logic value, either zero or one. These circuits are also included in commercial standard cell libraries.

• Retention registers: These circuits are used to retain the state of output signals from powered down blocks. Standard D-type flip-flops, characterized across different voltage levels, can be utilized to save the state of the output signals.
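As a purely behavioural illustration of the clamping described for isolation cells, the following Python sketch models an AND-type and an OR-type cell. The function names and the polarity of the enable signal (`iso_en = 1` meaning that the driving block is powered down) are assumptions made for this example; they are not taken from a specific cell library.

```python
def isolate_and(data: int, iso_en: int) -> int:
    """AND-type isolation: clamp the output to 0 while iso_en is asserted."""
    # Assumption: iso_en = 1 means the driving block is powered down.
    return data & (1 - iso_en)


def isolate_or(data: int, iso_en: int) -> int:
    """OR-type isolation: clamp the output to 1 while iso_en is asserted."""
    return data | iso_en


if __name__ == "__main__":
    # Print the truth table: data, iso_en, AND-type output, OR-type output.
    for data in (0, 1):
        for iso_en in (0, 1):
            print(data, iso_en, isolate_and(data, iso_en), isolate_or(data, iso_en))
```

With the enable de-asserted both cells are transparent; with it asserted the output is held at a valid logic value regardless of the (possibly floating) input.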

The main disadvantage of utilizing these additional circuits between blocks is that the additional delay of these cells can hinder the timing closure of the circuit. In addition, this problem applies to both 2-D and 3-D integrated circuits [82], [92], [131]. These interface circuits, particularly level-up shifters, add significant delay to the paths where they are employed. The most commonly used voltage conversion cell is the feedback based level-up shifter (FLS) [82], [141]. This circuit is depicted in Fig. 6.1.

[Figure 6.1 schematic; labels: input IN, output OUT, supplies VDDL and VDDH, internal nodes A and B, NMOS transistors MN1 and MN2, PMOS transistors MP1 and MP2]

Figure 6.1: Circuit schematic of a feedback based level-up shifter [141].

This conventional level-up shifter comprises two PMOS transistors (MP1 and MP2), which act as a swing-restoring load, and two NMOS transistors (MN1 and MN2), which act as pull-down transistors. Considering an input signal (IN) with logic value “0”, this circuit operates as follows. The MN2 transistor is in cut-off while MN1 is turned on, connecting node A to ground. This situation turns on MP2, and node B is thereby pulled up to logic value “1”. For an input signal with logic value “1”, the operation of the circuit is reversed. This circuit suffers from large delay due to the contention between the pull-up (PMOS) and pull-down (NMOS) transistors during the conversion of the signal.

Significant research effort has been devoted to improving the performance of the basic level-up shifter circuit [82], [142], [143], [144]. The FLS circuit has been demonstrated at a 0.35 µm process node in [143] to support “by-passing” functionality by employing pass transistors. Furthermore, multi-threshold cells are employed to improve the power and performance of level-up shifters in [142]. However, in deep submicrometer technologies, the supply voltage headroom is smaller than the threshold voltage; therefore, pass transistors drive weak signals that cannot support voltage conversion. In addition, in a multi-threshold CMOS technology, the available threshold voltages are limited to a few discrete levels, thus decreasing the effectiveness of the design proposed in [142].

Furthermore, the additional delay of these cells can hinder the timing closure for a system. A typical example of this situation can be observed in cached CPUs, where a core can usually operate at a lower voltage than the level 1 (L1) cache¹ to yield further power reduction [82], [145]. However, the timing critical paths often include interconnections between the core and the cache. To enable different power supplies between the cache and the core, voltage interface circuits should be added, which entail a considerable penalty in delay. Consequently, to avoid a performance loss, the power supply remains the same for both the L1 cache and the core, and the voltage is scaled less aggressively. This situation leads, in turn, to lower power savings. Moreover, this can lead to thermal issues for CPUs, because limiting the voltage reduction of power intensive cores allows hotspots to form at these parts of the die.

¹ The voltage required for stable operation in memory elements is typically higher than that of standard logic cells [82].

To mitigate this delay issue, an advanced interface circuit for maintaining performance in systems with multiple supply voltages is presented in the following section. The novelty of this design is based on the principle that in a multi-voltage scaling environment, different blocks can have the same voltage in specific operating conditions; thus, the voltage conversion circuitry can be by-passed. This circuit interfaces different voltage domains and is suitable for by-passing several cells, such as level shifters, isolation cells, and retention registers employed in MVS and power gating.

6.2 By-Pass Circuit Design

In this section, the design of the by-pass circuit is described. The proposed interface circuit is based on the notion that in a multi-voltage scaling environment, different blocks have the same voltage in specific (e.g., high performance) operating conditions. Hence, the level conversion circuitry can be circumvented. The proposed by-pass circuit comprises three transmission gates of the same size and an NMOS transistor around the interface cells, as illustrated in Fig. 6.2. The first two transmission gates (TG1 and TG2) operate as a demultiplexer which has a single data input and, using the control signal (Sel), routes the data to one of the two paths. The transmission gate TG3 ensures that no current flows through the by-passed circuit. Furthermore, the NMOS transistor (MN1) pulls the input of the interface circuit strongly low, thereby avoiding undesirable switching and leakage due to a weak ground at Node 1, which can result when TG1 is off.

[Figure 6.2 schematic; labels: Data, Sel, De-Mux (TG1, TG2), TG3, MN1, Node 1, Cload, Block 1 (VDD1), Block 2 (VDD2), level conversion and/or state retention circuits; annotated widths W = 360 nm, W = 375 nm, W = 715 nm]

Figure 6.2: Proposed circuit at the block interface to by-pass level conversion and/or isolation/retention cells used to support multiple voltage domains.

The circuit operates as follows. If the voltage domains operate at different voltages, Sel is set high and the level conversion circuitry is employed to amplify the signal. If the voltage domains are at the same voltage, Sel is set low and the by-pass path is utilized. In addition, MN1 is enabled to ground the output of TG1, and TG3 is turned off to prevent current from flowing backwards. The operation modes of the proposed circuit are listed in Table 6.1.

Table 6.1: Operation modes of the proposed by-pass circuit.

Operation Mode     Select Signal (Sel) [Logic Value]
Level Shifting     1
By-Pass            0
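The two operation modes can be summarized by a small behavioural model. The Python sketch below is an illustration only: the class and function names are hypothetical, and the assignment of TG2 to the by-pass path is an assumption inferred from the description of TG1 (which feeds Node 1) and TG3 (which sits at the output of the by-passed cells).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchState:
    tg1_on: bool  # gate feeding the interface cells (Node 1)
    tg2_on: bool  # assumed to be the gate on the by-pass path
    tg3_on: bool  # gate at the output of the interface cells
    mn1_on: bool  # NMOS transistor that grounds Node 1


def bypass_switch_state(sel: int) -> SwitchState:
    """Return the switch states for Sel = 1 (level shifting) or Sel = 0 (by-pass)."""
    if sel not in (0, 1):
        raise ValueError("sel must be 0 or 1")
    level_shifting = sel == 1
    return SwitchState(
        tg1_on=level_shifting,
        tg2_on=not level_shifting,
        tg3_on=level_shifting,
        mn1_on=not level_shifting,
    )


def route(sel: int) -> str:
    """Name the path taken by the data signal for a given Sel value."""
    return "level conversion" if bypass_switch_state(sel).tg1_on else "by-pass"


if __name__ == "__main__":
    for sel in (1, 0):
        print(sel, route(sel), bypass_switch_state(sel))
```

In this model exactly one of the two paths conducts for each value of Sel, and MN1 is on only when TG1 is off, which mirrors the role of MN1 in keeping Node 1 at a strong ground.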

The by-passing circuit is oblivious to the type of the interface employed, thus being applicable to all types of interface circuits, such as isolation cells and retention registers in addition to voltage shifters. Furthermore, latches that synchronize the interface between blocks in two different voltage domains can be by-passed in high performance mode, where the two blocks operate at the same frequency. In the case where the interface is complex and contains several components connected in series [82], the benefits from by-passing this interface are higher. Alternatively, for signals with multiple fan-out, the demultiplexer can be extended by flanking each interface circuit with two transmission gates and maintaining the low latency by-pass path for the iso-voltage operating conditions.

In a multi-voltage scaling system, where a few fixed voltage levels are supported for different operating conditions, the power management unit can be programmed to generate the select signals [82]. In the same manner, select signals can be generated in a dynamic voltage and frequency scaling (DVFS) environment. Furthermore, in an adaptive voltage scaling (AVS) scheme, where a control loop is used to adjust the voltage of different blocks, the circuit proposed in [88] can be utilized for generating the control signal(s).

In terms of signal integrity (SI), the proposed circuit does not introduce any additional issues. This is due to the fact that when one path is enabled, the other one is disabled. Therefore, crosstalk between these paths does not appear. In addition, as transmission gates are small circuits, they do not contribute any significant noise to the additional circuitry required between domains with different voltages. Moreover, metastability issues in general arise when setup and hold violations occur, as any flip-flop can enter a state where its output is unpredictable. These issues are not associated with a specific circuit but rather with a path (collection of gates) connected to a flip-flop. Metastability issues can be found and fixed by utilizing a standard backend design flow (similar to the one described in Chapter 4), where gates are modeled as cells in design libraries and the timing information of paths (setup/hold violations) is extracted using EDA tools. The proposed circuit can be used and tested with these tools as it comprises circuits (level-up/down shifters, isolation cells, transmission gate cells, and an NMOS cell) that can be found in standard design libraries. However, if any metastability issues arise, they will not be due to the proposed circuit alone but due to all the cells on the path that create the setup/hold violation at the destination register.

6.3 Results

The simulation setup and the effectiveness of the proposed circuit in terms of propagation delay, power, and area overhead are presented in this section. The by-pass interface circuit is simulated with HSPICE® [99] at a 32 nm process node [96], and pre-designed circuits for level conversion and isolation cells are obtained from the Synopsys® 32 nm generic library [146]. The nominal operating voltage for 32 nm CMOS technology is 1 V [96]. Therefore, a typical ±0.2 V swing from the nominal supply voltage is considered for low power and high performance conditions in a multi-voltage environment [147].

The proposed circuit is simulated in a variety of operating scenarios listed in Table 6.2. Blocks 1 and 2 from Fig. 6.2 can be considered as the core and the L1 cache, respectively. Paths which traverse these blocks from the core to the L1 cache utilize level-up shifters. Alternatively, paths from the L1 cache to the core use level-down shifters. The first three scenarios (A, B, C) represent the situation where the core operates at a reduced voltage as compared to the L1 cache to maximize power savings. Alternatively, scenarios D and E represent nominal and high performance modes, respectively, where both blocks have the same voltage.
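The mapping from the scenarios of Table 6.2 to the select signal and to the type of shifter on each path can be sketched as follows. This is a minimal illustration, not the control logic of a power management unit: the function names and the comparison tolerance `tol` are hypothetical, and only the supply voltages are taken from Table 6.2.

```python
# Supply voltages [V] of Block 1 (core) and Block 2 (L1 cache), from Table 6.2.
SCENARIOS = {
    "A": (0.8, 1.0),
    "B": (0.8, 1.2),
    "C": (1.0, 1.2),
    "D": (1.0, 1.0),
    "E": (1.2, 1.2),
}


def select_signal(vdd1: float, vdd2: float, tol: float = 1e-9) -> int:
    """Return Sel = 0 (by-pass) for equal supplies, else Sel = 1 (level shifting)."""
    # `tol` is a hypothetical comparison tolerance, not a value from the thesis.
    return 0 if abs(vdd1 - vdd2) <= tol else 1


def shifter_direction(vdd_src: float, vdd_dst: float) -> str:
    """Classify the shifter needed by a signal travelling from vdd_src to vdd_dst."""
    if select_signal(vdd_src, vdd_dst) == 0:
        return "none (by-pass)"
    return "level-up" if vdd_src < vdd_dst else "level-down"


if __name__ == "__main__":
    for name, (v_core, v_l1) in SCENARIOS.items():
        print(name, select_signal(v_core, v_l1),
              shifter_direction(v_core, v_l1),   # core to L1 cache
              shifter_direction(v_l1, v_core))   # L1 cache to core
```

Scenarios A, B, and C yield Sel = 1 with level-up shifters from the core to the cache and level-down shifters in the opposite direction, whereas scenarios D and E yield Sel = 0.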

Table 6.2: Simulated scenarios for the proposed circuit.

                                                  Supply Voltage [V]
Mode             Objective             Notation   Block 1   Block 2
Level shifting   Power                 A          0.8       1.0
Level shifting   Power                 B          0.8       1.2
Level shifting   Power                 C          1.0       1.2
By-pass          Nominal performance   D          1.0       1.0
By-pass          High performance      E          1.2       1.2

The 32 nm node was selected for the simulations mainly because of the availability of SPICE schematic/layout models of the gates (level-up/down shifters, isolation cells, transmission gate cells, and an NMOS cell). These gates (especially level-up shifters) are difficult to design from scratch. In addition, other technology libraries available to universities through Europractice (such as 45 nm and 65 nm from TSMC) did not include SPICE schematics for the gates. For this reason, analyzing the impact of the feature size on the proposed circuit, based on simulations, was not possible. However, intuitively, decreasing the feature size will be beneficial for the proposed circuit. This is because the delay of level-up shifters is mainly due to contention between nodes A and B (see Fig. 6.1), whereas the delay of transmission gates (TGs) depends on the size of the transistors. Therefore, reducing the feature size will make the by-pass path (which contains only TGs) faster, whereas this decrease will have a small impact on the delay of level-up shifters.

6.3.1 Performance Analysis

The delay of the proposed by-pass circuit is investigated in this subsection. The feedback based level-up shifter² (FLS) [141] and the traditional level-down shifter³ (LSDN) [146] (i.e., a buffer) are utilized in our by-pass design for performance characterization. Emphasis is placed on the feedback based level-up shifter, as it is broadly used and adds large delay to the paths [82], [146].

² The cell LSUP from [146] is utilized in our simulations.
³ The cell LSDN from [146] is utilized in our simulations.

The performance traits of the proposed by-pass circuit (PC), where FLS is utilized in an MVS system, are illustrated in Fig. 6.3. At the same operating voltages, the propagation delay is decreased, on average (scenarios D and E), by 86%, where level-up shifters are by-passed. In the case where the blocks are supplied by the highest voltage assumed for the employed process node, which indicates that the highest speed mode is enabled (scenario E), the delay decreases by 89%. Therefore, potential timing issues due to the considerable delay of the level-up shifters can be effectively alleviated. In contrast, if the interconnected blocks operate at different voltages (scenarios A, B, C), there is an overhead in delay of about 10%. However, this overhead is negligible for the delay of the entire path as these modes represent the power saving and consequently low speed modes of an MVS system. Thus, timing closure is less of an issue at these low speed modes.

[Figure 6.3 plot; x-axis: simulated scenarios A to E; y-axis: delay [ps], 0 to 700; series: PC and FLS; scenarios grouped into level conversion path and by-pass path]

Figure 6.3: Delay of the proposed by-pass circuit compared to traditional level-up shifters.

Furthermore, if isolation cells⁴ are also connected in series with the FLS, the performance improvement increases further to 92%, on average (scenarios D, E), while the delay overhead at low speed modes (scenarios A, B, C) remains low, on average, at 9.5%. This situation is depicted in Fig. 6.4. In addition, the performance traits of the proposed by-pass circuit (PC), where LSDN and isolation cells are utilized in an MVS system, are listed in Table 6.3. For scenarios A, B, C, the delay overhead from the proposed circuit slightly increases to 15%, on average, as level-down shifters are faster than the FLS. However, the performance gains are noticeable (84% speedup on average) for the cases where the by-pass path is enabled (scenarios D, E).

⁴ The cell ISOLOR from [146] is utilized in our simulations.

[Figure 6.4 plot; y-axis: delay [ps], 0 to 700; series: PC and FLS + ISO; by-pass path region marked]

Figure 6.4: Delay of the proposed circuit compared to level-up shifters connected in series with isolation cells.

Table 6.3: Delay [ps] of the proposed by-pass circuit compared to traditional level-down shifters and isolation cells.

            LSDN                  LSDN + Isolation cell
Scenarios   w/o PC    with PC     w/o PC    with PC
A           170       215         235       250
B           130       154         164       183
C           96        110         135       159
D           97        20          161       21
E           74        13          116       13

Moreover, paths which traverse critical blocks in two industrial circuits (Ind 1 and Ind 2) are simulated at a high performance mode (see Table 6.1) to capture their latency for a variety of interface cells. The results are listed in Table 6.4. In the case where level shifters (FLS) are utilized, the latency of these paths is increased by 4.8% and 9.7% for Ind 1 and Ind 2, respectively, as compared to the case where no interface cells are employed. Furthermore, the delay increases further if isolation cells are also connected in series with the shifters (FLS+ISO), by 7.7% for paths in Ind 1 and 13.8% for Ind 2.

Table 6.4: Maximum latency for the investigated paths at high performance mode.

                    Latency [ps]        Delay overhead [%]
Interface circuit   Ind 1     Ind 2     Ind 1     Ind 2
No interface        1154      700       -         -
By-pass (PC)        1162      715       0.7       2.1
LSDN                1195      754       3.5       7.7
LSDN+ISO            1215      781       5.3       11.5
FLS                 1210      768       4.8       9.7
FLS+ISO             1243      797       7.7       13.8
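The delay overhead columns of Table 6.4 follow from the latency columns as the relative increase over the path without interface cells. The Python sketch below recomputes them; the dictionary and function names are hypothetical, and only the latency values are taken from the table.

```python
# Maximum path latency [ps] from Table 6.4, as (Ind 1, Ind 2).
LATENCY_PS = {
    "No interface": (1154, 700),
    "By-pass (PC)": (1162, 715),
    "LSDN": (1195, 754),
    "LSDN+ISO": (1215, 781),
    "FLS": (1210, 768),
    "FLS+ISO": (1243, 797),
}


def overhead_percent(latency: float, baseline: float) -> float:
    """Relative latency increase over the baseline, in percent."""
    return 100.0 * (latency - baseline) / baseline


def overhead_table(latencies=LATENCY_PS, baseline_key="No interface"):
    """Map each interface circuit to its (Ind 1, Ind 2) delay overhead in percent."""
    base = latencies[baseline_key]
    return {
        name: tuple(overhead_percent(l, b) for l, b in zip(values, base))
        for name, values in latencies.items()
        if name != baseline_key
    }


if __name__ == "__main__":
    for name, (ind1, ind2) in overhead_table().items():
        print(f"{name:14s} {ind1:5.2f} {ind2:5.2f}")
```

The recomputed values agree with the tabulated overheads to within 0.1 percentage point; the small residual differences are consistent with the table entries having been truncated rather than rounded.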

However, the additional delay of the PC (By-pass) is negligible as the latency of these paths is increased by 0.7% for Ind 1 and 2.1% for Ind 2. This behavior demonstrates that timing bottlenecks at the critical interfaces of the blocks are effectively alleviated in high performance conditions in multi-Vdd systems by employing the proposed by-pass circuit. Note also that latches are utilized in the first industrial circuit path. Hence, in the case where the proposed circuit by-passes the latches in a high performance mode, a 5% decrease is observed in the latency of this path. Additionally, the support of disparate voltage supplies can be used conversely, where an intrinsically slower voltage domain can be overdriven to match the performance of the faster voltage domain.

6.3.2 Power Analysis

In this subsection, the power consumed by the proposed circuit is investigated. The proposed circuit dissipates up to 52% less power than FLS, as depicted in Fig. 6.5, by employing the by-pass path where the same supply voltage is applied to both interfaced circuits (scenarios D, E). In contrast, an overhead (6.8% on average) exists when the level-up shifters are employed (scenarios A, B, C) due to the added transmission gates. This power overhead drops, on average (scenarios A, B, C), to 5% where more cells are by-passed, such as the FLS in series with isolation cells (see Table 6.5). In this situation, the power improvements are greater (up to 58.2%), where the same voltage is applied to both blocks (scenarios D, E). Moreover, in scenarios A, B, C, where level-down shifters are utilized, the power overhead from the proposed circuit slightly increases to 9.1%, on average. This increase is due to the fact that level-down shifters consume less power than FLS; thus, the impact of the proposed circuit is larger. However, the power improvements are noticeable (46% on average) for the cases where the level-down shifters are by-passed (scenarios D, E).


[Figure 6.5 plot; y-axis ticks 0 to 70; series: PC and FLS; by-pass path region marked]

Figure 6.5: Power dissipation of the proposed by-pass circuit compared to traditional level-up shifters.

Table 6.5: Power dissipation [µW] of the proposed by-pass circuit compared to different combinations of by-passed cells.

            FLS + Isolation cell   LSDN                  LSDN + Isolation cell
Scenarios   w/o PC    with PC      w/o PC    with PC     w/o PC    with PC
A           59.0      62.0         18.2      21.5        24.3      26.8
B           66.0      69.0         24.9      27.0        34.7      37.6
C           47.0      50.0         27.5      30.7        36.1      38.9
D           34.8      14.5         26.1      14.2        17.9      14.2
E           50.5      21.1         38.5      20.7        26.4      20.7

In Fig. 6.3, the delay of the interface is depicted, which is high for scenarios A, B, and C. Alternatively, in Fig. 6.5 the power of the interface circuit is depicted rather than the power of the whole system. Power savings in scenarios A, B, and C are enabled by reducing the voltage inside the voltage domain (thousands to millions of gates) rather than by saving power in the additional interface circuit. In addition, as level-up shifters are connected to both Vdd networks (see Fig. 6.1), a power estimate for the whole system (to which scenarios A, B, and C actually relate) cannot be drawn from these results. Therefore, scenarios A, B, and C represent low-power modes for the whole system, in which, however, the power of the interface is increased as level-up shifters are connected to both Vdd networks.

Moreover, the utilization of the proposed circuit in a design can provide significant power savings at the system level. MVS interface circuits give rise to timing issues at the interface between the core and L1 cache in high performance modes, thereby prohibiting the application of this technique to these blocks. Hence, to avoid the interface circuitry in these modes, both core and L1 caches are supplied by the same voltage, which is constrained by the cache operating voltage. In addition, to avoid a high error probability during read/write operations in caches, the voltage reduction of caches can be limited to within 5% of the nominal voltage [145], [148]. However, as the proposed by-pass circuit effectively mitigates the timing bottlenecks at specific operating modes (e.g., high performance), different voltages can be assigned to the core and L1 caches to produce greater power savings. To demonstrate these additional power savings, where MVS is employed alongside the proposed by-pass circuit, a first-order analysis of the following case study is considered based on [82], [145], [147], [148]-[150].

A typical breakdown for the power consumed by the core (Pcore) and the L1 cache (PL1) is 86% and 14%, respectively, of the total CPU power (PCPU) [149]. Modern mobile systems reduce power by turning off (power-gating) the CPUs in idle mode. However, power is mainly reduced by dynamic frequency scaling when the CPU is turned on (low, nominal, and high performance modes). Considering the fundamental expressions of dynamic power and latency ($P = CV^{2}f$ and $\tau = CV/I$), the power savings by employing MVS and the PC, where the CPU is not idle, can be projected as follows.

Assuming the high performance mode constitutes 12% of the total operating time of a modern CPU [150], a slight overhead (0.7%) of the by-pass circuit exists as compared to no interface circuit. To maintain performance, considering a (first-order) linear dependence of voltage on latency, a negligible 1% increase in PCPU is estimated. Moreover, the CPU, which spends 28% of the total operating time in nominal performance mode [150], can benefit from the proposed circuit. Considering that the nominal frequency is constrained to avoid an excessive increase in power, where MVS is not used, a 10% reduction in Vcore [147] without performance loss results in a 19% decrease in Pcore where MVS alongside the PC is employed.

This behavior leads to a 17% decrease in PCPU. In the low performance mode, which constitutes 60% of the total operating time of a modern CPU [150], without MVS cells, a typical 5% [148] reduction from the nominal voltage is applied to the CPU (Vcore = VL1 = 0.95 Vdd) and the frequency is reduced by 50% to save power [150]. In contrast, by utilizing the proposed circuit and different power supplies for core and cache, the operating voltage of the core can be reduced by up to 30% (Vcore = 0.70 Vdd and VL1 = 0.95 Vdd) [147] and the frequency is again reduced by 50%. In this situation, Pcore is reduced from 86% to 42%, resulting in a 46% decrease in PCPU as compared to the power without MVS. If the frequency is reduced less aggressively (30%), PCPU is reduced by 22%.
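The quadratic dependence of dynamic power on the supply voltage accounts for two of the intermediate figures quoted in this analysis. The sketch below is a hedged illustration under the assumption of constant switched capacitance; the function and variable names are hypothetical. It reproduces only the 19% reduction in Pcore and the reduction of the core share from 86% to 42%; the resulting PCPU percentages depend on baseline and frequency assumptions of the cited analysis that this sketch does not attempt to reproduce.

```python
def dynamic_power_scale(v_scale: float, f_scale: float = 1.0) -> float:
    """Relative dynamic power under P = C * V^2 * f with constant capacitance."""
    return v_scale ** 2 * f_scale


# Share of the total CPU power drawn by the core at nominal conditions [149].
CORE_SHARE = 0.86

# Nominal mode: Vcore lowered by 10% at unchanged frequency.
pcore_saving_nominal = 1.0 - dynamic_power_scale(0.90)

# Low performance mode: Vcore = 0.70 Vdd, voltage effect only.
core_share_low = CORE_SHARE * dynamic_power_scale(0.70)

if __name__ == "__main__":
    print(f"Pcore saving at 0.90 Vdd: {pcore_saving_nominal:.0%}")
    print(f"Core share at 0.70 Vdd:  {core_share_low:.0%}")
```

Passing `f_scale` allows the same helper to include a frequency reduction, for example `dynamic_power_scale(0.70, 0.5)` for a 30% lower voltage combined with a halved frequency.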

Even though thermal analysis at the system level is outside the scope of this work, some fundamental insight is offered here. The system-level power savings reported in the previous paragraphs, which result from the utilization of the proposed circuit, can help mitigate thermal issues of CPUs. This is due to the fact that, by utilizing the by-pass circuit, the operating voltage can be reduced separately for cores and L1 caches. This situation leads to reduced power and thereby decreased temperature for the core. Therefore, potential hotspots due to power intensive cores can be eliminated. This is also very useful for 3-D circuits, where heat issues arise. Splitting the CPU into two tiers by placing the core in one tier and the L1 caches in the other tier can potentially create heat problems, as the L1 cache will cover the core. Alternatively, by using the proposed circuit, the voltage of the core can be reduced independently of the L1 cache without any timing penalties, thereby alleviating these heat issues.

6.3.3 Area Analysis

In this subsection, the area overhead of the proposed circuit is investigated. The proposed circuit poses an area overhead of 55%, on average, as compared to the traditional level-up/down shifters (see Table 6.6). In the case where more cells, such as isolation cells alongside level shifters, are by-passed, the area overhead of the proposed circuit drops to 41%, on average.


Table 6.6: Area [µm²] of the proposed by-pass circuit compared to different combinations of by-passed cells.

Additional Cells   w/o PC   with PC   Difference [%]
FLS                8.64     13.1      51
LSDN               7.11     11.5      62
FLS + ISO          11.4     15.8      38
LSDN + ISO         9.9      14.3      45

Although this overhead is due to the added transmission gates of the proposed circuit, the total increase in the area of the circuit blocks using this circuit is negligible as the blocks typically include hundreds of thousands of cells. In addition, the proposed circuit can be employed selectively at interfaces where different blocks have the same voltage in specific operating conditions. The overhead in area for circuits where the MVS technique is employed is listed in Table 6.7. For a small circuit, such as Ind 2, the area increases up to 3.5% when our proposed circuit is utilized as compared to the circuit where traditional MVS cells are employed. On the other hand, for a bigger circuit (Ind 1), the proposed circuit poses a 1.9% overhead in area relative to the circuit which utilizes traditional interface cells.

Table 6.7: Overhead in area of additional cells used for the MVS technique on industrial circuits. Results are normalized to the total area of the design without these additional cells.

                   with MVS
Design   w/o MVS   w/o PC   with PC
Ind 1    1         1.05     1.07
Ind 2    1         1.12     1.18

6.4 Conclusions

In this chapter, my published work [15] on a by-pass circuit for multi-voltage scaling systems is presented. The key idea is that in a multi-voltage scaling environment different blocks have the same voltage in specific operating conditions. Consequently, the interface circuits can be detoured to avoid performance and power losses where high speed operation is required. The proposed by-passing circuit is oblivious to the type of the interface employed, thus being applicable to all types of interface circuits, such as isolation cells and retention registers in addition to voltage shifters. This property is very useful for 3-D circuits: as tiers in 3-D circuits can operate at different voltages, the proposed circuit can be utilized to by-pass a physical interface between tiers.

This circuit is compared with the traditional level-up and level-down shifters, where speed is enhanced by up to 89% and power consumption is decreased by up to 52% where the interfaced blocks operate at the same supply voltage. In the case where the proposed circuit is utilized in critical paths of industrial circuits, the increase in latency is negligible (0.7% and 2.1%) as compared to the latency of the paths where no level conversion circuit is employed. Therefore, the traditional timing bottlenecks at the block interfaces are alleviated in high performance conditions for MVS systems. Moreover, the proposed circuit allows MVS to be applied to smaller blocks (e.g., core and L1 caches) of a system. Consequently, the supply voltage is reduced more aggressively, leading to an up to 46% decrease in the dynamic power of a modern CPU for mobile systems. Finally, this by-passing circuit poses an overhead in area of 55%, on average, as compared to the level conversion circuitry from MVS. However, this circuit can be employed selectively at specific interfaces of a system, thus resulting in a negligible overhead in area at the system level.

Chapter 7

Conclusions and Future Work

Vertical integration technologies, such as interposers and TSV based three-dimensional integration, have emerged to support denser and more efficient, in terms of performance and power, integrated circuits. Different design aspects of these technologies are investigated in my thesis, while extending the state-of-the-art. The conclusions related to my research are presented in this last chapter. A summary of the outcomes and my contributions to this evolving research area is described in Section 7.1. Some future research topics are offered in Section 7.2.

7.1 Summary of the thesis

In this thesis, emphasis is placed on the effects of vertical integration technologies on the circuit design process. New methodologies and tools are proposed to determine and improve traditional objectives in circuit design, such as performance, area, and, particularly, power for vertically integrated circuits. The main contributions and outcomes of my research in this area are discussed in the following paragraphs.

Interconnect Design for Interposers

Interposer technology, based on either silicon or glass substrates, offers high density, high performance interconnects for integrated systems, resulting in smaller form factors and improved system performance as compared to traditional packages. The most wire-dominant part of an interposer is the redistribution layer (RDL), which is utilized to interconnect the hosted dice. Therefore, an important task in the design process for these systems is to evaluate the effect of different interposer materials on wires in the RDL for different design objectives.

My published work [12], discussed in Chapter 3, sheds light on the different design tradeoffs for interconnects in the RDL due to the different material properties of silicon and glass interposers. Design guidelines that satisfy power, delay, area, and crosstalk constraints are determined for interconnects on glass and silicon interposers at a 65 nm process node. Interconnects on glass interposers are shown to be a superior alternative to silicon interposers in terms of power and latency. In contrast, interconnects on glass are more prone to crosstalk effects than on silicon. This situation requires a different treatment for sizing the interconnects as there is a tradeoff between area and noise for the glass interposers. Furthermore, the minimum pitch does not result in minimum power, delay, and crosstalk. Finally, increasing the wire width on a silicon interposer leads to higher power consumption than on glass for the same width. Consequently, glass interposers are a better solution for low power systems under the same latency constraints.

Design Flow for TSV Based 3-D ICs

For the past few years, several methods and techniques that address specific steps of the backend design process have been developed for three-dimensional circuits with TSVs. Integrating all of these tools in a design flow, however, is not a straightforward task. Consequently, there is no complete design flow to support the design of TSV based 3-D ICs. Furthermore, considering the diverse TSV and bonding technologies, the lack of a complete design flow limits the design exploration of 3-D circuits.

To address these issues, a novel backend design automation flow is presented in Chapter 4, based on my published work [13], which enables design space exploration for TSV based 3-D ICs. The design experience of using this flow is similar to a 2-D flow, as commercial 2-D EDA tools and a public academic 3-D tool are utilized in all of the stages. New steps are added to support the introduction of the third dimension and the broad gamut of TSV technologies and bonding styles. A crucial step of the flow is the merging of the SPEF files of each tier into a global SPEF file in order to include the impedance of the vertical interconnects. This feature of the flow enables the performance of the 3-D circuit to be evaluated by seamlessly performing STA with mature EDA tools instead of considering the longest inter-tier delay and wirelength prediction models. Furthermore, this is the first design flow for 3-D ICs with TSVs which is compatible with multi-mode power analysis while considering the electrical characteristics of vertical interconnects. Application of the flow to different benchmark circuits shows that even with no optimization effort, a two tier 3-D stack produced by the flow achieves up to 14.6% average power reduction, 18.7% performance improvement, and 49% footprint reduction as compared to the 2-D design for a specific circuit.

Voltage Scaling in 3-D ICs

The traditional notion of decreasing power in 3-D ICs is based on the reduction of the interconnect capacitance due to the wirelength decrease. However, these savings in power are not adequate, in particular, if TSVs exhibit non-negligible parasitic capacitance. Therefore, based on this observation, my work in Chapter 5 from [14] follows a different yet efficient way to decrease power by combining the innate traits of 3-D integration with standard low power methods for integrated circuits, such as voltage scaling. The objective of my approach is to exploit the additional slack on critical paths originating from the introduction of the third dimension to enhance power savings by reducing the supply voltage.

However, depending on the characteristics of the paths within a circuit, the power savings from applying this approach to 3-D ICs can greatly vary. Consequently, the critical paths of a design should be carefully considered to evaluate where voltage reduction does not degrade the target performance of the system. Therefore, an enhanced timing model for circuit paths based on logical effort is presented to address the tradeoff between voltage and interconnect length reduction. In addition, guidelines are offered to identify early in the design process if 2-D circuits can benefit from voltage reduction with vertical integration. A methodology for applying and evaluating voltage reduction in 3-D ICs by utilizing EDA tools is also presented. The traditional approach, in which power is reduced due to the decrease in the wire capacitance of a 3-D IC, leads to moderate power savings for several benchmark circuits (6.6% on average). Alternatively, the proposed approach results in power reduction of 22.3% on average. A decrease of 27% in the peak power is observed for a specific case study as compared to the 3-D case where the supply voltage is not scaled and the same speed as in two dimensions is maintained.
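
The idea of trading timing slack for a lower supply voltage can be illustrated with a minimal sketch. It is not the enhanced logical effort model of Chapter 5. It assumes the generic alpha-power delay law (delay proportional to V/(V - Vth)^alpha) and a quadratic dependence of dynamic power on voltage, and every name and numeric value (threshold voltage, alpha, nominal supply, slack fractions) is a hypothetical choice made for illustration only.

```python
# Illustrative sketch only: the alpha-power delay law and all numeric
# values below are assumptions, not models or figures from the thesis.

def relative_delay(vdd, vth=0.3, alpha=1.3):
    """Gate delay, up to a constant factor, under the alpha-power law."""
    return vdd / (vdd - vth) ** alpha


def lowest_vdd_for_slack(slack_fraction, vdd_nom=1.0, vth=0.3,
                         alpha=1.3, step=0.001):
    """Lowest supply voltage whose delay still fits the original clock period.

    slack_fraction is the share of the clock period freed on the critical
    path, for example by the shorter wires of a 3-D implementation.
    """
    if not 0.0 <= slack_fraction < 1.0:
        raise ValueError("slack_fraction must be in [0, 1)")
    # The path delay may grow by this factor before timing is violated.
    allowed_growth = 1.0 / (1.0 - slack_fraction)
    d_nom = relative_delay(vdd_nom, vth, alpha)
    vdd = vdd_nom
    # Step the supply down while the next lower voltage still meets timing.
    while (vdd - step > vth and
           relative_delay(vdd - step, vth, alpha) / d_nom <= allowed_growth):
        vdd -= step
    return vdd


def dynamic_power_ratio(vdd, vdd_nom=1.0):
    """Dynamic power relative to nominal at fixed frequency and capacitance."""
    return (vdd / vdd_nom) ** 2


if __name__ == "__main__":
    for slack in (0.0, 0.05, 0.10, 0.15):
        v = lowest_vdd_for_slack(slack)
        print(f"slack={slack:.2f}  vdd={v:.3f}  "
              f"power ratio={dynamic_power_ratio(v):.3f}")
```

Under these assumptions, a path with no slack keeps the nominal supply, while larger slack permits a lower voltage and hence a quadratically lower dynamic power. This is why the distribution of slack over the critical paths determines how much a given circuit benefits.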


Interfacing Domains with Different Voltages

The main limitation of employing voltage scaling and power gating techniques at finer circuit granularities is the penalty in delay due to the additional cells at the boundaries of voltage domains. As a result, the potential of these techniques in terms of power savings is not fully exploited due to the high latency of these interfacing cells. In addition, this situation is common to both 2-D and 3-D circuits and can potentially restrict the significant power gains from voltage scaling. To alleviate this timing penalty, an advanced circuit interface for by-passing these cells is presented in Chapter 6, based on my published work [15].

The key idea is that in a multi-voltage scaling environment, different blocks have the same voltage in specific operating conditions. Consequently, the interface circuits can be detoured to avoid performance and power losses where high speed operation is required. The proposed by-passing circuit is oblivious to the type of the interface employed, thus being applicable to all types of interface circuits. Results demonstrate that speed is enhanced by up to 89% and power consumption is decreased by up to 52% where the interfaced blocks operate at the same supply voltage and the by-pass path is enabled as compared to traditional level shifters. In the case where the proposed circuit is utilized in critical paths of industrial circuits, the latency increase is negligible (0.7% and 2.1%) as compared to the latency of the paths where no level conversion circuit is employed. This behavior demonstrates that traditional timing bottlenecks at the block interfaces are alleviated in high performance conditions for MVS systems by employing the proposed circuit. Therefore, voltage scaling can be employed at finer circuit granularities providing significant power savings with a negligible overhead in area at the system level.
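
The by-pass decision described above can be sketched behaviourally. This is only an illustration of the selection rule, not the circuit of Chapter 6. The function name, the tolerance, and the delay values in the usage lines are hypothetical.

```python
# Behavioural sketch only: names and delay values are hypothetical and are
# not taken from the thesis.

def interface_delay(vdd_src, vdd_dst, shifter_delay, bypass_delay, tol=1e-9):
    """Return (path, delay) for a signal crossing a voltage-domain boundary.

    When both domains currently operate at the same supply voltage, level
    conversion is unnecessary, so the by-pass path is selected. Otherwise
    the signal goes through the level shifter.
    """
    if abs(vdd_src - vdd_dst) <= tol:
        return "bypass", bypass_delay
    return "level_shifter", shifter_delay


if __name__ == "__main__":
    # Same supply on both sides: the interface cell is by-passed.
    print(interface_delay(0.9, 0.9, shifter_delay=100.0, bypass_delay=10.0))
    # Different supplies: level conversion is still required.
    print(interface_delay(0.7, 0.9, shifter_delay=100.0, bypass_delay=10.0))
```

The sketch makes explicit that the selection depends only on whether the two supplies match, which is consistent with the statement that the by-pass is oblivious to the type of interface circuit employed.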

7.2 Future Research Topics

There are several interesting directions to follow in designing vertically integrated circuits. A few of these directions are discussed in the following paragraphs.

Interconnect Issues in Systems with Interposers and 3-D Circuits

State-of-the-art and future systems are envisioned to combine 3-D circuits on interposer technologies. The interconnect structures in these systems can be complex and pose many design challenges as timing paths traverse vertically multiple dice and connect to other dice through the interposer. These paths utilize different process nodes (metal layers) alongside diverse TSV technologies. Therefore, investigating the timing closure of these paths is an important task. Moreover, considering the different interposer materials, research effort should be placed on the design of the circuitry that drives these wires. I/O drivers should provide enough current to drive these interconnect structures while considering power, performance, and area constraints. Finally, the power supply noise due to the resistive and inductive behavior of the entire interconnection system should be carefully analyzed.

Optimization Steps in Design Flows for 3-D ICs

Providing a backend design flow was an important first step to enable the design exploration for TSV based three-dimensional circuits. This flow can be extended to support several optimization steps for 3-D ICs. Timing-driven synthesis and P&R are challenging tasks for circuits in three dimensions. This situation is mainly due to the difficulty of propagating timing constraints between tiers by utilizing 2-D EDA tools. Therefore, new tools and methodologies should be implemented to support this feature. In addition, this feature can enable the utilization of advanced low power techniques, such as multi-voltage threshold and multi-length cells, to improve the performance and/or the power of the 3-D circuit, as these techniques require timing-driven synthesis and P&R. Moreover, different types of clock distribution networks (CDNs) can be used in 3-D ICs, based on the number of TSVs employed to propagate the clock between adjacent tiers, such as single-TSV, multi-TSV, and fault-tolerant clock TSV. Therefore, a clock tree synthesis step that supports these types of CDNs should be implemented in order to investigate and evaluate the power, performance, and area of CDNs in three-dimensional circuits.

Temperature Aware Voltage Scaling in 3-D ICs

The necessary means to identify voltage scaling opportunities and apply this technique to 3-D ICs with TSVs are provided in this thesis. Results show that this approach can lead to significant power gains as compared to the traditional notion of reducing power due to the wirelength reduction. This power reduction can lead to decreased power densities and thus alleviate thermal issues for 3-D ICs. Therefore, this work can be extended to include either a thermal objective or thermal constraints to ensure that the temperature is carefully managed within the 3-D stack. Moreover, the timing model proposed in this work can be extended to support temperature variations. This extended model can be used to explore paths in 3-D circuits while considering voltage, wirelength, and temperature variations.

Bibliography

[1] S. Borkar, “Design Perspectives on 22 nm CMOS and beyond,” Proceedings of the ACM/IEEE Design Automation Conference, pp. 93-94, July 2009.

[2] V. F. Pavlidis, I. Savidis, and E. G. Friedman, Three-Dimensional Integrated Circuit Design, 2nd Edition, Morgan Kaufmann Publishers, 2017.

[3] G. E. Moore, “Cramming More Components Onto Integrated Circuits,” Electronics, Vol. 38, No. 8, pp. 114, April 1965.

[4] A. W. Topol et al., “Three-dimensional Integrated Circuits,” IBM Journal of Research and Development, Vol. 50, No. 4/5, pp. 491-506, July-September 2006.

[5] R. Ho, K. W. Mai and M. A. Horowitz, “The Future of Wires,” Proceedings of the IEEE, Vol. 89, No. 4, pp. 490-504, April 2001.

[6] K. Banerjee et al., “3-D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-On-Chip Integration,” Proceedings of the IEEE, Vol. 89, No. 5, pp. 602-633, May 2001.

[7] N. Magen et al., “Interconnect-Power Dissipation in a Microprocessor,” Proceedings of the ACM International Workshop on System Level Interconnect Prediction, pp. 7-13, February 2004.

[8] Y. Tsukada, S. Tsuchida, and Y. Mashimoto, “Surface Laminar Circuit Packaging,” Proceedings of the Electronic Components and Technology Conference, pp. 22-27, May 1992.

[9] B. Banijamali et al., “Ceramics vs. Low-CTE Organic Packaging of TSV Silicon Interposers,” Proceedings of the Electronic Components and Technology Conference, pp. 573-576, June 2011.

[10] G. Kumar et al., “Ultra-High I/O Density Glass/Silicon Interposers for High Bandwidth Smart Mobile Applications,” Proceedings of the Electronic Components and Technology Conference, pp. 217-223, June 2011.

[11] P. Garrou, M. Koyanagi, and P. Ramm, Handbook of 3D Integration; 3D Process Technology, Vol. 3, William Andrew Publishing, 2014.

[12] H. Kalargaris and V. F. Pavlidis, “Interconnect Design Tradeoffs for Silicon and Glass Interposers,” Proceedings of the IEEE International New Circuits and Systems Conference, pp. 77-80, June 2014.

[13] H. Kalargaris, Y.-C. Chen, and V. F. Pavlidis, “STA Compatible Backend Design Flow for TSV-based 3-D ICs,” Proceedings of the IEEE International Symposium on Quality Electronic Design, pp. 186-191, March 2017.

[14] H. Kalargaris and V. F. Pavlidis, “Voltage Scaling for 3-D ICs: When, how, and how much?” Microelectronics Journal, Vol. 69, pp. 35-44, November 2017.

[15] H. Kalargaris, J. Goodacre, and V. F. Pavlidis, “Advanced Circuit Interface for Systems with Multiple Voltage Domains,” Proceedings of the IEEE International Ph.D. Research in Microelectronics and Electronics Conference, June 2016.

[16] International Technology Roadmap for Semiconductors ITRS, 2010 Edition.

[17] M. Swaminathan and K. J. Han, Design and Modeling for 3D ICs and Interposers, World Scientific Publishers, 2013.

[18] Xilinx Virtex-7 FPGA Family, http://www.xilinx.com/, 2013.

[19] AMD Radeon Rx 300 Fiji and Vega GPU series, http://www.amd.com/, 2015.

[20] NVIDIA Pascal GP100 and Volta GPU, http://www.nvidia.com/, 2016.

[21] M. J. Wang et al., “TSV Technology for 2.5D IC Solution,” Proceedings of the IEEE Electronic Components and Technology Conference, pp. 284-288, June 2012.

[22] J. H. Lau, “The Future of Interposers for Semiconductor IC Packaging,” Chip Scale Review Magazine, pp. 32-36, February 2014.

[23] W. Kwon et al., “Enabling a Manufacturable 3D Technologies and Ecosystem Using 28nm FPGA with Stack Silicon Interconnect Technology,” Proceedings of the Symposium on Microelectronics, pp. 217-222, October 2013.

[24] Q. Cui et al., “Design and Optimization of Redistribution Layer (RDL) on TSV Interposer for High Frequency Applications,” Proceedings of the International Conference on Electronic Packaging Technology and High Density Packaging, pp. 1-5, August 2011.

[25] M. A. Karim, P. D. Franzon, and A. Kumar, “Power Comparison of 2D, 3D and 2.5D Interconnect Solutions and Power Optimization of Interposer Interconnects,” Proceedings of the IEEE Electronic Components and Technology Conference, pp. 860-866, May 2013.

[26] V. Sukumaran et al., “Through-Package-Via Formation and Metallization of Glass Interposers,” Proceedings of the IEEE Electronic Components and Technology Conference, pp. 557-563, June 2010.

[27] R. R. Tummala, “2.5D Interposers; Organics vs. Silicon vs. Glass,” Chip Scale Review Magazine, Vol. 17, No. 4, pp. 18-19, July-August 2013.

[28] S. Cho et al., “Impact of Copper Through-Package Vias on Thermal Performance of Glass Interposers,” IEEE Transactions on Components, Packaging and Manufacturing Technology, Vol. 5, No. 8, pp. 1075-1084, August 2015.

[29] M. Lee et al., “Noise Coupling of Through-Via in Silicon and Glass Interposer,” Proceedings of the IEEE Electronic Components and Technology Conference, pp. 1806-1810, May 2013.

[30] B. McClean, “Semiconductor Market and Packaging Trends,” Chip Scale Review Magazine, pp. 5-6, February 2014.

[31] C.-K. Koh and P. H. Madden, “Manhattan or non-Manhattan?: A Study of Alternative VLSI Routing Architectures,” Proceedings of the Great Lakes Symposium on VLSI, pp. 47-52, March 2000.

[32] H. Chen et al., “Estimation of Wirelength Reduction for λ-Geometry vs. Manhattan Placement and Routing,” Proceedings of the ACM International Workshop on System Level Interconnect Prediction, pp. 71-76, April 2003.

[33] J. W. Joyner et al., “A Three-Dimensional Stochastic Wire-Length Distribution for Variable Separation of Strata,” Proceedings of the IEEE International Interconnect Technology Conference, pp. 126-128, June 2000.

[34] J. F. Gibbons and K. F. Lee, “One-Gate-Wide CMOS Inverter on Laser-Recrystallized Polysilicon,” IEEE Transactions on Electron Device Letters, Vol. 1, No. 6, pp. 117-118, June 1980.

[35] S. Akiyama et al., “Multilayer CMOS Device Fabricated on Laser Recrystallized Silicon Islands,” Proceedings of the IEEE International Electron Devices Meeting, pp. 352-355, December 1983.

[36] M. W. Geis et al., “Crystalline Silicon on Insulators by Graphoepitaxy,” Proceedings of the IEEE International Electron Devices Meeting, pp. 210-212, December 1979.

[37] B. Yu et al., “FinFet Scaling to 10 nm Gate Length,” Proceedings of the IEEE International Electron Devices Meeting, pp. 251-254, December 2002.

[38] X. Wu et al., “A Three-Dimensional Stacked Fin-CMOS Technology for High-Density ULSI Circuits,” IEEE Transactions on Electron Devices, Vol. 52, No. 9, pp. 1998-2003, September 2005.

[39] X. Wu et al., “Stacked 3-D Fin-CMOS Technology,” IEEE Electron Device Letters, Vol. 26, No. 6, pp. 416-418, June 2005.

[40] E. Culurciello and A. G. Andreou, “Capacitive Inter-Chip Data and Power Transfer for 3-D VLSI,” IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 53, No. 12, pp. 1348-1352, December 2006.

[41] N. Miura et al., “A 1 TB/s 1 pJ/b 6.4 mm²/TB/s QDR Inductive-Coupling Interface Between 65-nm CMOS Logic and Emulated 100-nm DRAM,” IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, Vol. 2, No. 2, pp. 249-256, June 2012.

[42] N. Miura et al., “Crosstalk Countermeasures for High-Density Inductive-Coupling Channel Array,” IEEE Transactions of Solid-State Circuits, Vol. 42, No. 2, pp. 410-421, February 2007.

[43] M. Tilli et al., Handbook of Silicon Based MEMS Materials and Technologies (Second Edition), William Andrew Publishing, 2015.

[44] X. Jing et al., “Via Last TSV Process for Wafer Level Packaging,” Proceedings of the IEEE International Conference on Electronic Packaging Technology, pp. 1216-1218, August 2016.

[45] G. Beanato et al., “Impact of Data Serialization over TSVs on Routing Congestion in 3D-Stacked Multi-Core Processors,” Microelectronics Journal, Vol. 51, pp. 38-45, March 2016.

[46] S. Wang, “Barriers Against Copper Diffusion into Silicon and Drift Through Silicon Dioxide,” MRS Bulletin, Vol. 19, No. 8, pp. 30-40, August 1994.

[47] J. M. Chan et al., “Reliability Evaluation of Copper (Cu) Through-Silicon Via (TSV) Barrier and Dielectric Liner by Electrical Characterization,” Proceedings of the IEEE Electronics Packaging Technology Conference, pp. 478-482, November 2016.

[48] G. Katti et al., “Electrical Modeling and Characterization of Through Silicon via for Three-Dimensional ICs,” IEEE Transactions on Electron Devices, Vol. 57, No. 1, pp. 256-262, January 2010.

[49] International Technology Roadmap for Semiconductors ITRS, 2011.

[50] D. Henry et al., “Low Electrical Resistance Silicon Through Vias: Technology and Characterization,” Proceedings of the IEEE International Electronic Components and Technology Conference, pp. 1360-1365, June 2006.

[51] R. S. Patti, “Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs,” Proceedings of the IEEE, Vol. 94, No. 6, pp. 1214-1224, June 2006.

[52] H.-M. Tong, Y.-S. Lai, and C. P. Wong, Advanced Flip Chip Packaging, Springer US, 2013.

[53] C. S. Tan, R. J. Gutmann, and L. R. Reif, Wafer Level 3-D ICs Process Technology, Springer US, 2008.

[54] B. Vaidyanathan et al., “Architecting Microprocessor Components in 3-D Design Space,” Proceedings of the IEEE International Conference on VLSI Design, pp. 103-108, January 2007.

[55] B. Black et al., “Die Stacking (3D) Microarchitecture,” Proceedings of the IEEE/ACM International Symposium on Microarchitecture, pp. 469-479, December 2006.

[56] B. Black, D. W. Nelson, C. Webb, and N. Samra, “3D Processing Technology and its Impact on iA32 Microprocessors,” Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 316-318, October 2004.

[57] Y. Xie, G. H. Loh, B. Black, and K. Bernstein, “Design Space Exploration for 3D Architectures,” ACM Journal on Emerging Technologies in Computing Systems, Vol. 2, No. 2, pp. 65-103, April 2006.

[58] K. Puttaswamy and G. H. Loh, “3D-Integrated SRAM Components for High-Performance Microprocessors,” IEEE Transactions on Computers, Vol. 58, No. 10, pp. 1369-1381, October 2009.

[59] Y.-F. Tsai et al., “Design Space Exploration for 3-D Cache,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 16, No. 4, pp. 444-455, April 2008.

[60] G. H. Loh, Y. Xie, and B. Black, “Processor Design in 3D Die-Stacking Technologies,” IEEE Micro, Vol. 27, No. 3, pp. 31-48, May 2007.

[61] H. W. Dong et al., “An Optimized 3D-Stacked Memory Architecture by Exploiting Excessive, High-Density TSV Bandwidth,” Proceedings of the IEEE International Symposium on High Performance Computer Architecture, pp. 1-12, January 2010.

[62] S. Palacharla, N. P. Jouppi, and J. E. Smith, “Complexity-Effective Superscalar Processors,” Proceedings of the IEEE International Conference on Computer Architecture, pp. 206-218, June 1997.

[63] R. P. Brent and H. T. Kung, “A Regular Layout for Parallel Adders,” IEEE Transactions on Computers, Vol. C-31, No. 3, pp. 260-264, March 1982.

[64] P. M. Kogge and H. S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Transactions on Computers, Vol. C-22, No. 8, pp. 786-793, August 1973.

[65] S. S. Mukherjee et al., “The Alpha 21364 Network Architecture,” IEEE Micro, Vol. 22, No. 2, pp. 26-35, January/February 2002.

[66] K. Zhang et al., “A SRAM Design on 65 nm CMOS Technology with Integrated Leakage Reduction Scheme,” Proceedings of the IEEE International Symposium on VLSI Circuits, pp. 294-295, June 2004.

[67] C. H. Sequin, “Managing VLSI Complexity: An Outlook,” Proceedings of the IEEE, Vol. 71, No. 1, pp. 149-166, January 1983.

[68] M. Pathak et al., “Through-Silicon-Via Management During 3D Physical Design: When to Add and How Many?,” Proceedings of the International Conference on Computer-Aided Design, pp. 387-394, November 2010.

[69] H. Reiter and S. Bansal, “EDA Tools for 3D IC Design: Simple Migration of 2D Tools or New Paradigm?,” Chip Design Magazine, pp. 30-33, March 2011.

[70] H. Lee and K. Chakrabarty, “Test Challenges for 3D Integrated Circuits,” IEEE Transactions on Design & Test of Computers, Vol. 26, No. 5, pp. 26-35, September 2009.

[71] R. Patti and Tezzaron Semiconductor, “Impact of Wafer-Level 3D Stacking on the Yield of ICs,” Future Fab International, Vol. 23, September 2007.

[72] N. S. Kim et al., “Leakage Current: Moore’s Law Meets Static Power,” Computer, Vol. 36, No. 12, pp. 68-75, December 2003.

[73] Kleiner Perkins Caufield & Byers Venture Capital, Internet Trends, 2015.

[74] D. H. Kim, S. Mukhopadhyay, and S. K. Lim, “TSV-Aware Interconnect Distribution Models for Prediction of Delay and Power Consumption of 3-D Stacked ICs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 33, No. 9, pp. 1384-1395, September 2014.

[75] S.-H. Ok et al., “The Impact of 3D Stacking and Technology Scaling on the Power and Area of Stereo Matching Processors,” Sensors Journal, Vol. 17, No. 2, article 426, February 2017.

[76] B. Davari, R. H. Dennard and G. G. Shahidi, “CMOS Scaling for High Performance and Low Power - the Next Ten Years,” Proceedings of the IEEE, Vol. 83, No. 4, pp. 595-606, April 1995.

[77] K. Brock and SYNOPSYS®, “Optimizing CPUs, GPUs and DSPs for High Performance and Low Power at 28 nm,” http://www.synopsys.com/, 2013.

[78] A. Chandrakasan, W. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuits, Wiley-IEEE Press, 2001.

[79] R. Wilson and D. Lammers, “Grove Calls Leakage Chip Designers’ Top Problem,” EE Times, December 2013.

[80] W. Chedid, Y. Chansu, and B. Lee, “Power Analysis and Optimization Techniques for Energy Efficient Computer Systems,” Advances in Computers, Vol. 63, pp. 129-164, April 2005.

[81] T. Mudge, “Power: a First-Class Architectural Design Constraint,” Computer, Vol. 34, No. 4, pp. 52-58, April 2001.

[82] M. Keating et al., Low Power Methodology Manual: For System-On-Chip Design, Springer, 2007.

[83] O. Coudert, “Gate Sizing for Constrained Delay/Power/Area Optimization,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 5, No. 4, pp. 465-472, December 1997.

[84] W. Qing, M. Pedram, and W. Xunwei, “Clock-Gating and its Application to Low Power Design of Sequential Circuits,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. 47, No. 3, pp. 415-420, March 2000.

[85] J. Hailin, M. Marek-Sadowska, and S. R. Nassif, “Benefits and Costs of Power-Gating Technique,” Proceedings of the International Conference on Computer Design, pp. 559-566, October 2005.

[86] A. Das et al., “Evaluating Voltage Islands in CMPs under Process Variations,” Proceedings of the International Conference on Computer Design, pp. 129-136, October 2007.

[87] M. Anis, S. Areibi, and M. Elmasry, “Design and Optimization of Multithreshold CMOS (MTCMOS) Circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 22, No. 10, pp. 1324-1342, October 2003.

[88] T. Kuroda et al., “Variable Supply-Voltage Scheme for Low-Power High-Speed CMOS Digital Design,” IEEE Transactions on Solid-State Circuits, Vol. 33, No. 3, pp. 454-462, March 1998.

[89] D. E. Lackey et al., “Managing Power and Performance for System-On-Chip Designs using Voltage Islands,” Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp. 195-202, November 2002.

[90] B. Amelifard and M. Pedram, “Optimal Design of the Power-Delivery Network for Multiple Voltage-Island System-on-Chips,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 28, No. 6, pp. 888-900, June 2009.

[91] W. L. Hung et al., “Temperature-Aware Voltage Islands Architecting in System-on-Chip Design,” Proceedings of the IEEE International Conference on Computer Design, pp. 689-694, October 2005.

[92] N. Xu et al., “Thermal-Aware Post Layout Voltage-Island Generation for 3D ICs,” Journal of Computer Science and Technology, Vol. 28, No. 4, pp. 671-681, July 2013.

[93] R. R. Tummala et al., “Trend from ICs to 3D ICs to 3D systems,” Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 439-444, September 2009.

[94] ISSI’s 2 GB DDR2 SDRAM, http://www.issi.com/.

[95] Dual-core ADSP-BF606 Blackfin processor, http://www.analog.com/.

[96] Predictive Technology Model (PTM) website, http://ptm.asu.edu/.

[97] J. D. Mackenzie and V. Pavate, “RF and/or RF Identification Tag/Device Having an Integrated Interposer, and Methods for Making and Using the Same,” Patent application number: 20120176226, Publication date: 2012-07-12.

[98] S. Wong, G. Lee, and D. Ma, “Modeling of Interconnect Capacitance, Delay, and Crosstalk in VLSI,” IEEE Transactions on Semiconductor Manufacturing, Vol. 13, No. 1, pp. 108-111, February 2000.

[99] Synopsys HSPICE® Version G-2012.06-SP2.

[100] K. Agarwal, D. Sylvester, and D. Blaauw, “Modeling and Analysis of Crosstalk Noise in Coupled RLC Interconnects,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 5, pp. 892-901, May 2006.

[101] M. Elgamel and M. A. Bayoumi, “Interconnect Noise Analysis and Optimization in Deep Submicron Technology,” IEEE Circuits and Systems Magazine, Vol. 3, No. 4, pp. 6-17, Fourth Quarter 2003.

[102] W. R. Davis et al., “Demystifying 3D ICs: The Pros and Cons of Going Vertical,” IEEE Magazine on Design & Test of Computers, Vol. 22, No. 6, pp. 498-510, November 2005.

[103] D. Doman, Engineering the CMOS Library: Enhancing Digital Design Kits for Competitive Silicon, Wiley, 2012.

[104] Cadence® Encounter® Version v13.13-s017 1.

[105] W.-L. Hung et al., “Interconnect and Thermal-Aware Floorplanning for 3D Microprocessors,” Proceedings of the IEEE International Symposium on Quality Electronic Design, pp. 6-11, March 2006.

[106] M.-K. Hsu, V. Balabanov, and Y.-W. Chang, “TSV-aware Analytical Placement for 3-D IC Designs Based on A Novel Weighted-Average Wirelength Model,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 32, No. 4, pp. 497-509, April 2013.

[107] J. Cong and G. Luo, “A Multilevel Analytical Placement for 3D ICs,” Proceedings of the IEEE Asia and South Pacific Design Automation Conference, pp. 361-366, January 2009.

[108] D. Khalil et al., “Analytical Model for the Propagation Delay of Through Silicon Vias,” Proceedings of the IEEE International Symposium on Quality Electronic Design, pp. 553-556, March 2008.

[109] V. F. Pavlidis, I. Savidis, and E. G. Friedman, “Clock Distribution Networks for 3-D Integrated Circuits,” Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 651-654, September 2008.

[110] D. Fick et al., “Centip3De: A 3930 DMIPS/W Configurable Near-Threshold 3D Stacked System with 64 ARM Cortex-M3 Cores,” Proceedings of the IEEE International Solid-State Circuits Conference, pp. 190-192, February 2012.

[111] D. H. Kim et al., “3D-MAPS: 3D Massively Parallel Processor with Stacked Memory,” Proceedings of the IEEE International Solid-State Circuits Conference, pp. 188-190, February 2012.

[112] S. Priyadarshi et al., “Pathfinder 3D: A Flow for System-Level Design Space Exploration,” Proceedings of the IEEE International 3D Systems Integration Conference, pp. 1-8, February 2011.

[113] S. Panth et al., “High-Density Integration of Functional Modules Using Monolithic 3D-IC Technology,” Proceedings of the IEEE Asia and South Pacific Design Automation Conference, pp. 681-686, January 2013.

[114] Synopsys® Design Compiler® Version H-2013.03-SP5-2.

[115] X. Zhao and S. K. Lim, “Power and Slew-Aware Clock Network Design for Through-Silicon-Via (TSV) Based 3D ICs,” Proceedings of the IEEE Asia and South Pacific Design Automation Conference, pp. 175-180, January 2010.

[116] Synopsys® Formality® Version H-2013.03-SP5.

[117] Synopsys® PrimeTime® Version H-2013.06-SP3-3.

[118] S. Ladenheim et al., “IC Thermal Analyzer for Versatile 3-D Structures Using Multigrid Preconditioned Krylov Methods,” Proceedings of the IEEE International Conference on Computer Aided Design (ICCAD), November 2016.

[119] S. Ladenheim, Y.-C. Chen, H. Kalargaris, M. Mihajlovic, and V. F. Pavlidis, “Computationally Efficient Standard-Cell FEM-based Thermal Analysis,” Proceedings of the ACM/IEEE International Conference on Computer-Aided Design (ICCAD), (accepted), November 2017.

[120] S. Davidson, “ITC’99 Benchmark Circuits-Preliminary Results,” Proceedings of the IEEE International Test Conference, p. 1125, September 1999.

[121] S. Swaminathan et al., “A Dynamically Reconfigurable Adaptive Viterbi Decoder,” Proceedings of ACM International Symposium on FPGAs, pp. 227-236, February 2002.

[122] Opencores, http://www.opencores.org/.

[123] D. H. Kim and S. K. Lim, “Impact of Through-Silicon-Via Scaling on the Wirelength Distribution of Current and Future 3D ICs,” Proceedings of the IEEE International Interconnect Technology Conference, pp. 1-3, May 2011.

[124] D. H. Kim and S. K. Lim, “Through-Silicon-Via-Aware Delay and Power Prediction Model for Buffered Interconnects in 3D ICs,” Proceedings of ACM/IEEE International Workshop on System Level Interconnect Prediction, pp. 25-32, June 2010.

[125] D. H. Kim et al., “Design and Analysis of 3D-MAPS (3D Massively Parallel Processor with Stacked Memory),” IEEE Transactions on Computers, Vol. 64, No. 1, pp. 112-125, January 2015.

[126] S.-A. Yu, P.-Y. Huang, and Y.-M. Lee, “A Multiple Supply Voltage Based Power Reduction Method in 3-D ICs Considering Process Variations and Thermal Effects,” Proceedings of the IEEE Asia and South Pacific Design Automation Conference, pp. 55-60, January 2009.

[127] H. Xu, V. F. Pavlidis, and G. De Micheli, “Analytical Heat Transfer Model for Thermal Through-Silicon Vias,” Proceedings of the Conference on Design, Automation, and Test in Europe, pp. 395-400, March 2011.

[128] A. Todri-Sanial et al., “Globally Constrained Locally Optimized 3-D Power Delivery Networks,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 22, No. 10, pp. 2131-2144, October 2014.

[129] H. Xu et al., “Timing Uncertainty in 3-D Clock Trees Due to Process Variations and Power Supply Noise,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 21, No. 12, pp. 2226-2239, December 2013.

[130] C. Zhu et al., “Three-Dimensional Chip-Multiprocessor Run-Time Thermal Management,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 27, No. 8, pp. 1479-1492, August 2008.

[131] S.-H. Whi and Y.-M. Lee, “Supply Voltage Assignment for Power Reduction in 3D ICs Considering Thermal Effect and Level Shifter Budget,” Proceedings of the IEEE International Symposium on VLSI Design, Automation, and Test, pp. 1-4, April 2011.

[132] Y. Zhan et al., “Module Assignment for Pin-Limited Designs under the Stacked-Vdd Paradigm,” Proceedings of the ACM/IEEE International Conference on Computer-Aided Design, pp. 656-659, November 2007.

[133] N. Kapadia and S. Pasricha, “A Co-Synthesis Methodology for Power Delivery and Data Interconnection Networks in 3D ICs,” Proceedings of the IEEE International Symposium on Quality Electronic Design, pp. 73-79, March 2013.

[134] N. Kapadia and S. Pasricha, “A Power Delivery Network Aware Framework for Synthesis of 3D Networks-on-Chip with Multiple Voltage Islands,” Proceedings of the IEEE International Conference on VLSI Design, pp. 262-267, January 2012.

[135] K. Chae and S. Mukhopadhyay, “Tier-Adaptive-Voltage-Scaling (TAVS): A Methodology for Post-Silicon Tuning of 3D ICs,” Proceedings of the IEEE Asia and South Pacific Design Automation Conference, pp. 277-282, January 2012.

[136] I. Sutherland, R. F. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann Publishers, 1999.

[137] A. Morgenshtein et al., “Unified Logical Effort: A Method for Delay Evaluation and Minimization in Logic Paths With RC Interconnect,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 5, pp. 689-696, May 2010.

[138] M.-H. Chang et al., “Logical Effort Models with Voltage and Temperature Extensions in Super-/Near-/Sub-Threshold Regions,” Proceedings of the IEEE International VLSI Design, Automation, and Test Symposium, pp. 1-4, April 2011.

[139] Taiwan Semiconductor Manufacturing Company, www.tsmc.com.

[140] M. Bamal et al., “Performance Comparison of Interconnect Technology and Architecture Options for Deep Submicron Technology Nodes,” Proceedings of the International Interconnect Technology Conference, pp. 202-204, May 2006.

[141] K. Usami et al., “Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor,” IEEE Transactions on Solid-State Circuits, Vol. 33, No. 3, pp. 463-472, March 1998.

[142] S. A. Tawfik and V. Kursun, “Low Power and High Speed Multi Threshold Voltage Interface Circuits,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 17, No. 5, pp. 638-645, May 2009.

[143] C. Q. Tran, H. Kawaguchi, and T. Sakurai, “Low-Power High-Speed Level Shifter Design for Block-Level Dynamic Voltage Scaling Environment,” Proceedings of the International Conference on Integrated Circuit Design and Technology, pp. 229-232, May 2005.

[144] Z. Peiyi et al., “Low-Power Clocked-Pseudo-NMOS Flip-Flop for Level Conversion in Dual Supply Systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 17, No. 9, pp. 1196-1202, September 2009.

[145] A. R. Alameldeen et al., “Improving Memory Reliability, Power and Performance Using Mixed-Cell Designs,” Intel Technology Journal, Vol. 17, No. 1, May 2013.

[146] Synopsys® 32/28 nm Generic Library, http://www.synopsys.com/.

[147] D. Jacquet et al., “2.6 GHz Ultra-Wide Voltage Range Energy Efficient Dual A9 in 28 nm UTBB FD-SOI,” Proceedings of the Symposium on VLSI Technology, pp. 44-45, June 2013.

[148] H. S. Yang et al., “Scaling of 32 nm Low Power SRAM with High-k Metal Gate,” Proceedings of the International Electron Devices Meeting, pp. 1-4, December 2008.

[149] K. Hirata and J. Goodacre, “ARM MPCore; The Streamlined and Scalable ARM11 Processor Core,” Proceedings of the Asia and South Pacific Conference on Design Automation, pp. 747-748, January 2007.

[150] J. Goodacre, “The Homogeneity of Architecture in a Heterogeneous World,” Invited talk at the International Conference on Embedded Computer Systems, July 2012.
