
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 34, NO. 9, SEPTEMBER 2015 1455

3-D Stacked DRAM Refresh Management With Guaranteed Data Reliability

Jaeil Lim, Hyunyul Lim, and Sungho Kang, Member, IEEE

Abstract—The 3-D integrated dynamic random-access memory (DRAM) structure with a processor is being widely studied because of advantages such as large bandwidth and reduced data communication power. In these structures, the massive heat generation of the processor results in a high operating temperature and a high refresh rate of the DRAM. Thus, in the 3-D DRAM over processor architecture, temperature-aware refresh management is necessary. However, temperature determination is difficult because, in the 3-D DRAM, the temperature changes dynamically and the temperature variation within a DRAM die is complicated. In this paper, a thermal guard-band setup method for 3-D stacked DRAM is proposed. It considers the latency of the temperature data and the position difference between the temperature sensor and the DRAM cell. With this method, the data reliability of adaptive refresh control that depends on an on-chip temperature sensor is guaranteed. In addition, an efficient temperature sensor placement and refresh control method is analyzed. The expected refresh power reduction is examined through simulation.

Index Terms—3-D integration, data reliability, DRAM refresh.

I. INTRODUCTION

THREE-DIMENSIONAL (3-D) integration of dynamic random-access memory (DRAM) and processors is a promising solution for enhancing processor performance. The through-silicon via (TSV) provides large bandwidth and short wire length between the processor and the DRAM. The 3-D DRAM stack over processor architecture shortens the time the processor sits idle while reading memory [1]. In addition, the 3-D technique enables the combination of different process technologies, such as high-speed process CPU dies and high-capacity process DRAM dies. However, the 3-D DRAM over processor architecture also has unintended adverse effects. The most serious one is a high circuit temperature [2]. The heat generated by the processor and the DRAM adds up, but the heat dissipation capability is no greater than before.

The high temperature of the DRAM influences the off current of the DRAM cell transistor. Therefore, DRAM data loss in the DRAM over processor architecture is accelerated and more frequent refresh operations are needed. The increased refresh power consumption of DRAM is a shortcoming of the DRAM over processor architecture. The increased number of refresh operations affects the DRAM performance and degrades the throughput. Furthermore, incorrect refresh control cannot guarantee the DRAM data reliability.

Manuscript received July 23, 2014; revised November 14, 2014 and January 30, 2015; accepted February 16, 2015. Date of publication March 16, 2015; date of current version August 18, 2015. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government, Ministry of Science, ICT and Future Planning (MSIP) (No. 2012R1A2A1A03006255). This paper was recommended by Associate Editor S. Kim.

The authors are with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, Korea (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCAD.2015.2413411

The considerable standby power consumed by the refresh operation is a significant weak point of DRAM. Additionally, the penalty of the DRAM refresh operation will grow in future large-capacity devices [4]. Several refresh power reduction techniques exist. DRAM cell access operations have the same effect as the refresh operation. By exploiting this effect, a method that skips the refresh operation for recently accessed DRAM cells has been proposed [3]. DRAM cells have a large variation in retention time. Thus, a refresh period control method that accounts for the different retention times of individual cells has been proposed [4]. A refresh period extension method that guarantees reliability with an error correction code (ECC) has been proposed [5]. A refresh control scheme that uses retention time detection and ECC has been proposed [6]. However, these approaches do not account for temperature change.

A temperature-compensated adaptive self-refresh control method that controls the refresh period based on a temperature sensor has been proposed [7]. In the 3-D DRAM stack over processor architecture, temperature-dependent refresh control is necessary during normal computing time because of the high temperatures. A refresh control method for 3-D DRAM operating time with an adaptive ECC has been proposed [8]. For the 3-D DRAM over processor architecture, a temperature variation aware bank-wise refresh control method has been proposed [9]. However, these thermal-aware approaches do not consider the data reliability issue of the 3-D DRAM. For adaptive refresh that depends on a temperature sensor, a thermal guard-band is necessary; it compensates for the temperature change during the sensor read latency, the temperature difference between the sensor and the DRAM cell, and the temperature sensor error.

In the conventional 2-D DRAM, the temperature of the DRAM changes slowly, roughly on a scale of seconds. At a processor hotspot, however, the temperature changes very quickly, roughly on a scale of milliseconds. In the 3-D DRAM stack over processor architecture, the DRAM cells are influenced by the temperature of the processor. Thus, the 3-D DRAM has the temperature characteristics of the processor. The temperature of the 3-D DRAM changes quickly and is also high. Additionally, the temperature variation of the DRAM die is as complicated as that of the processor die. This means that the maximum temperature of the DRAM cells is hard to find. When the maximum temperature prediction is incorrect, the DRAM data reliability is violated, because an insufficient refresh rate is applied on the basis of the incorrect, lower temperature. On the other hand, inefficient maximum temperature estimation leads to an excessive thermal guard-band, which incurs refresh power overhead. Therefore, the conventional guard-band setup method of 2-D integration is ineffective in the 3-D DRAM over processor architecture. In this paper, an efficient 3-D DRAM thermal guard-band setup method is proposed. An effective refresh control system is also established.

0278-0070 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. BACKGROUND

In this section, the general features of the DRAM refresh operations are described, and the data reliability problems in the 3-D DRAM stack over processor architecture are explained.

A. DRAM Refresh Features

A DRAM cell is composed of a cell capacitor and a cell transistor. To retain the data, the cell capacitor holds a high or low voltage and the cell transistor should block the current. However, the cell transistor is imperfect and an off current flows through it. As time passes, the stored voltage on the cell capacitor gradually equalizes. Thus, the voltage of the cell capacitor must be reamplified before it becomes indistinguishable. This operation is the DRAM refresh operation. The refresh operation consumes standby power, and it is a general weak point of DRAM devices.

The longest time for which a DRAM cell maintains data without a refresh operation is called the retention time. The retention time is inversely proportional to the off current of the cell transistor [10]. The off current is temperature-dependent and increases geometrically at high temperatures. Thus, the retention time decreases and the required refresh rate increases at high temperatures. To maintain the DRAM data, the refresh operation should be applied within the retention time for all the DRAM cells. On the Micron 2 GB low power synchronous DRAM (LPSDRAM) [11], the retention time is 32 ms when the temperature is under 85 °C and 8 ms when it is under 105 °C. The refresh power is proportional to the refresh operation rate. In the 3-D DRAM stack over processor architecture, the refresh power becomes a primary concern because of the high temperature.

In a DRAM die, the DRAM address system is organized into a bank address, row address, and column address. Within a bank, a number of sense amplifiers refresh one row at a time. As Fig. 1 shows, the refresh target row is selected with the row address. The refresh counter increments the refresh target row address whenever a refresh operation occurs. Generally, synchronous DRAM (SDRAM) has no row address selection for the refresh operation. Multiple banks can be refreshed at a time. The all-bank refresh is applied to the selected row of all of the banks, whereas the per-bank refresh selects a target bank [11]. The memory controller manages the refresh timing and target bank, not the target row address.

Fig. 1. DRAM address system and refresh counter.

Fig. 2. Distributed and burst refresh timing.

General DRAM has several types of refresh operations [12]. The self-refresh is performed autonomously when the DRAM is in sleep mode. The hidden-refresh conceals the refresh waiting time behind the read operation. As shown in Fig. 2, the burst-refresh performs the refresh operation on all of the DRAM cells with no interval between refresh operations. The burst-refresh is used when the DRAM mode changes. During normal computing time, on the other hand, the distributed-refresh is applied. The distributed-refresh has a regular interval between the refresh operations. In this interval, the DRAM can execute other commands, whereas the burst-refresh holds the DRAM until all of the cells are refreshed. To guarantee the retention time of all of the cells, the memory controller keeps the period of the refresh operation below “retention time/number of rows.” When the per-bank refresh is applied, the refresh interval must be managed separately for each bank.
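As a rough illustration of the “retention time/number of rows” bound, the sketch below computes the largest allowed interval between distributed refresh commands. The 32 ms and 8 ms retention times are the values quoted above; the row count of 8192 and the function name are assumptions made only for this example.

```python
def max_refresh_interval_us(retention_ms: float, rows: int) -> float:
    """Upper bound on the interval between distributed refresh commands (microseconds)."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return retention_ms * 1000.0 / rows


ROWS = 8192  # assumed number of rows per refresh block, not taken from the paper

interval_cool_us = max_refresh_interval_us(32.0, ROWS)  # retention time below 85 degC
interval_hot_us = max_refresh_interval_us(8.0, ROWS)    # retention time below 105 degC

print(f"below 85 degC:  one refresh every {interval_cool_us:.3f} us or less")
print(f"below 105 degC: one refresh every {interval_hot_us:.3f} us or less")
```

Under these assumptions the hot case needs four times as many refresh commands per unit time, which is why the refresh power scales with the refresh rate.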

In this paper, the block of DRAM cells to which one refresh period is applied is called the refresh block. In general SDRAM, a refresh block is either a bank or the entire cell area of a DRAM die, because a refresh target row is not selectable. Within a refresh block, the refresh period is applied uniformly. Therefore, the refresh period must be determined by the cell that has the worst retention time in the refresh block. Asynchronous DRAM has a row-address-strobe-only refresh mode that can select a refresh target row address. Thus, an asynchronous DRAM refresh block can be more finely composed. However, asynchronous DRAM is not commonly used today, and thus only SDRAM is considered in this paper.

Fig. 3. Temperature variation of the 3-D DRAM stack over processor architecture in HOTSPOT [14] simulation (one processor die and four 2 GB DRAM dies).

B. 3-D DRAM Data Reliability Issue

In the 3-D DRAM stack over processor architecture, the retention time of the DRAM shortens. As a result, more refresh operations are necessary, and more standby power is consumed. If the refresh rate is set for the maximum power consumption of the processor, excessive refresh operations are executed, because the processor generally does not work at maximum performance and consumes less power. Therefore, the temperature of the 3-D DRAM is lower than the maximum most of the time. To reduce the refresh power, the excessive refresh operations should be removed and a proper refresh rate must be applied. The proper refresh rate can be identified from the temperature sensor, and adaptive refresh is then applied [7], [8]. When the temperature of the DRAM cell is identified improperly, the DRAM loses data. Thus, a sufficient thermal guard-band is needed to cover rapid temperature changes and temperature sensor inaccuracy.

For the adaptive refresh control, an accurate DRAM cell temperature must be identified. The temperature sensor and the DRAM cells are separated by some distance, because the DRAM cell array is widely distributed over the DRAM die but the number of temperature sensors is limited. In addition, the temperature sensor cannot be placed in the cell array, only in the peripheral circuits. Thus, the sensor and the cell array also differ in temperature. In the 3-D DRAM stack over processor architecture, this temperature difference grows because the processor on the bottom die generates massive heat and a complicated temperature variation. As shown in Fig. 3, the temperature distribution of the processor die directly influences the temperature distribution of the DRAM die. Moreover, the hottest spot of the processor changes dynamically depending on the workload. In some specific applications, the temperature of small blocks such as a branch predictor can exceed the temperature of an arithmetic-logic unit (ALU) block [13]. Therefore, to guarantee the DRAM data reliability, the temperature upper bound of the refresh target cells must be identified as the sum of the temperature sensor reading and the thermal guard-band that considers the difference in position between the temperature sensor and the target cell. The identified upper bound is also applied to the thermal guard-band.

In the thermal guard-band setup method, the temperature sensor data latency must be considered. By the time the refresh controller receives the sensor data, the DRAM cell temperature can differ from the reported value because the temperature of the DRAM cell changes during the temperature sensor data latency. The sensor data latency is the time from the temperature sensor sampling until the sensor data is used in the refresh controller. In the 3-D DRAM stack over processor architecture, the dynamic power of the processor is massive, and as a consequence, the temperature gradient of the DRAM cell is much greater than in the conventional 2-D DRAM chip. An excessive guard-band incurs power overhead on adaptive refresh. Therefore, an efficient refresh control system and an efficient guard-band setup method are required.

In this paper, a thermal guard-band setup method is proposed that guarantees the data reliability of adaptive refresh in the 3-D DRAM stack over processor architecture. The thermal guard-band setup method considers both the position difference on a chip and the temperature sensor data read latency. Additionally, an effective refresh control system is determined, which involves the temperature sensor sampling rate, controller design, and temperature sensor built-in position.

The rest of this paper is organized as follows. Section III describes the thermal guard-band setup method. Section IV describes an efficient refresh control system design method. Section V presents experimental results from the simulation. Finally, Section VI concludes this paper.

III. THERMAL GUARD-BAND SETUP

As mentioned in Section II, the position difference on a chip and the temperature sensor data latency must be considered in the guard-band setup process. If these two factors are calculated separately and added up, some heating elements can be counted twice, and the thermal guard-band will be overestimated. To avoid this problem, a thermal guard-band setup method is proposed that simultaneously considers the position difference and the temperature sensor data latency.

A. Thermal Model

The variation and change in the temperature in a chip is modeled by a thermal resistance capacitance (RC) model [14]. The thermal RC model calculates the temperature by creating an RC circuit equivalent to the heat generation blocks. Resistors and capacitors are linear elements, and thus the thermal RC model is also a linear model. For the 3-D stack architecture, the thermal RC model divides each layer into grids. In this paper, each layer is divided into 64 × 64 grids. The temperature of a grid is modeled as follows:

$$
f(t) = \sum_{i=1}^{m} f_i(t) = \sum_{i=1}^{m} h_i(t) * p_i(t) \tag{1}
$$

where $t$ is the current time, $f(t)$ is the temperature at time $t$, $m$ is the number of heat generating blocks, $f_i(t)$ is the temperature impact that is caused by the $i$th block, $h_i(t)$ is an impulse response that is caused by the $i$th block, $p_i(t)$ is the power consumption of the $i$th block, and $*$ denotes convolution. Because the thermal RC model is linear, the temperature can be modeled as the sum of the temperature impacts of the individual blocks. The temperature impact of block $i$ can be modeled as follows:

$$
f_i(t) = h_i(t) * p_i(t) = \int_{0}^{t} h_i(\tau)\, p_i(t - \tau)\, d\tau. \tag{2}
$$

The impulse response can be obtained by observing the temperature simulation with a unit impulse power input. In the impulse response calculation, the convection temperature is the zero point. The proposed thermal guard-band setup method calculates the thermal guard-band based on the impulse responses of the grid. Detailed information about the thermal RC model and the impulse response has been previously presented [15].
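The convolution in (1) and (2) can be approximated numerically once the impulse responses are sampled. The following is a minimal sketch that assumes uniformly sampled impulse responses and power traces and uses a simple rectangle rule; the function names and the sample values are illustrative and are not taken from the paper.

```python
def block_impact(h, p, dt):
    """Approximate f_i(t) = integral of h_i(tau) * p_i(t - tau) d tau at the newest sample.

    h: impulse-response samples h_i(0), h_i(dt), h_i(2*dt), ...
    p: power samples p_i(0), p_i(dt), ..., p_i(t); the last entry is "now".
    """
    newest = len(p) - 1
    steps = min(len(h), len(p))
    return sum(h[k] * p[newest - k] for k in range(steps)) * dt


def grid_temperature(impulses, powers, dt):
    """Sum the impacts of all blocks as in (1); the result is relative to the zero point."""
    return sum(block_impact(h, p, dt) for h, p in zip(impulses, powers))


# Two hypothetical heat generating blocks with made-up samples.
impulses = [[1.0, 0.5, 0.25], [0.2, 0.1]]
powers = [[2.0, 2.0, 2.0], [0.0, 0.0, 10.0]]
rise = grid_temperature(impulses, powers, dt=0.5)
print(f"temperature rise above the zero point: {rise:.2f}")
```

Because the model is linear, each block is handled independently and the results are simply added, which is the property the guard-band derivation relies on.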

B. Problem Formulation

A method for detecting the maximum temperature difference between two positions on a chip has been proposed [15]. In the thermal guard-band setup method, the temperature sensor data latency is added to the problem. The problem is formulated as follows:

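$$
\begin{aligned}
f_a(t + d) - f_b(t) &= \sum_{i=1}^{m} \bigl( f_{i\_a}(t + d) - f_{i\_b}(t) \bigr) \\
&= \sum_{i=1}^{m} \left( \int_{0}^{t+d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau - \int_{0}^{t} h_{i\_b}(\tau)\, p_i(t - \tau)\, d\tau \right)
\end{aligned} \tag{3}
$$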

where $f_a$ is the temperature of the refresh target cell position, $f_b$ is the temperature of the sensor position, $d$ is the temperature sensor data latency, and $h_{i\_a}$ is the impulse response of position $a$ caused by block $i$. In (3), $f_b(t)$ represents the temperature of the temperature sensor at time $t$ and $f_a(t + d)$ represents the temperature of the DRAM refresh cell at time $t + d$. The maximum value of (3) represents the maximum error between the reported data of the temperature sensor and the temperature of the DRAM cell after the sensor data latency has elapsed. The temperature difference due to the positional difference is captured by the different impulse responses, $h_{i\_a}$ and $h_{i\_b}$. Thus, in the thermal guard-band setup method, the maximum value of (3) must be identified. The power consumption is an unknown value and has bounds as follows:

$$
p_i^{\min} \le p_i(t) \le p_i^{\max}. \tag{4}
$$

In this paper, the power consumption boundary of the processor assumes that the minimum value, $p_i^{\min}$, is the static power and the maximum value, $p_i^{\max}$, is the sum of the static power and the dynamic power.

C. Guard-Band Setup

The thermal RC model is a linear model. Therefore, the maximum value of (3) is equal to the sum of the maximum values of the impact from each block. The impact from a block, $i$, is equal to the following:

$$
f_{i\_a}(t + d) - f_{i\_b}(t) = \int_{0}^{t+d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau - \int_{0}^{t} h_{i\_b}(\tau)\, p_i(t - \tau)\, d\tau. \tag{5}
$$

The first integral section of (5) can be modified as follows:

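$$
\begin{aligned}
\int_{0}^{t+d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau
&= \int_{d}^{t+d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau + \int_{0}^{d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau \\
&= \int_{0}^{t} h_{i\_a}(\tau + d)\, p_i(t - \tau)\, d\tau + \int_{0}^{d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau.
\end{aligned} \tag{6}
$$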
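The second equality in (6) is a change of variable in the integral over the interval from $d$ to $t + d$. As a short check, substituting $\tau = \sigma + d$ (the symbol $\sigma$ is a dummy variable introduced here only for illustration) shifts the limits to $0$ and $t$:

```latex
\int_{d}^{t+d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau
  = \int_{0}^{t} h_{i\_a}(\sigma + d)\, p_i(t - \sigma)\, d\sigma
```

Renaming $\sigma$ back to $\tau$ gives the first term on the last line of (6), which has the same limits and the same power argument as the sensor-side integral in (5).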

Using (6), (5) can be modified as follows:

$$
\int_{0}^{t} \bigl( h_{i\_a}(\tau + d) - h_{i\_b}(\tau) \bigr)\, p_i(t - \tau)\, d\tau + \int_{0}^{d} h_{i\_a}(\tau)\, p_i(t + d - \tau)\, d\tau. \tag{7}
$$

Impulse responses $h_{i\_a}(t)$ and $h_{i\_b}(t)$ are known functions, and the power consumption, $p_i(t)$, is an uncertain function. In (7), the first integral section covers the values of $p_i(t)$ from 0 to $t$, and the second section covers them from $t$ to $t + d$. These two sections integrate different intervals of $p_i(t)$, so each can be maximized independently. $p_i(t)$ has the boundary of (4). Thus, the maximum value of (7) is equal to the following:

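$$
p_i^{\min} \int_{H^-} \bigl( h_{i\_a}(\tau + d) - h_{i\_b}(\tau) \bigr)\, d\tau
+ p_i^{\max} \int_{H^+} \bigl( h_{i\_a}(\tau + d) - h_{i\_b}(\tau) \bigr)\, d\tau
+ p_i^{\max} \int_{0}^{d} h_{i\_a}(\tau)\, d\tau \tag{8}
$$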

where $H^+$ is the interval where $h_{i\_a}(\tau + d) - h_{i\_b}(\tau)$ takes a positive value, and $H^-$ is the interval where it takes a negative value. Negative values of the function $h_{i\_a}(\tau)$ are negligibly small, so $p_i^{\max}$ is applied to the whole of the last integral. As a result, the maximum value of (3) is equal to the following:

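$$
\sum_{i=1}^{m} \left(
p_i^{\min} \int_{H^-} \bigl( h_{i\_a}(\tau + d) - h_{i\_b}(\tau) \bigr)\, d\tau
+ p_i^{\max} \int_{H^+} \bigl( h_{i\_a}(\tau + d) - h_{i\_b}(\tau) \bigr)\, d\tau
+ p_i^{\max} \int_{0}^{d} h_{i\_a}(\tau)\, d\tau
\right). \tag{9}
$$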

Equation (9) indicates the maximum temperature difference between the temperature sensor location grid and the refresh target cell location grid. To employ this thermal guard-band setup method, (9) must be calculated for all of the grids in the refresh block (per-die or per-bank) and the highest guard-band must be selected, because the refresh period is applied uniformly within a refresh block. When multiple temperature sensors are applied, the sensor for which the sum of the sensed temperature and the guard-band is smallest must be selected. With this method, the adaptive refresh control can guarantee the data reliability while accounting for the temperature variation and the sensor data latency.
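The procedure described above can be sketched numerically as follows. This is only an illustration under stated assumptions: the impulse responses are uniformly sampled and truncated once they have decayed to zero, the integrals are replaced by a rectangle rule, the latency $d$ is a whole number of samples, and all names and sample values are hypothetical rather than taken from the paper.

```python
def _at(seq, k):
    """Sample k of a truncated impulse response; zero once it has decayed."""
    return seq[k] if 0 <= k < len(seq) else 0.0


def block_term(h_a, h_b, p_min, p_max, d_steps, dt):
    """Rectangle-rule estimate of one block's term in (9)."""
    # Latency window [0, d): worst case is maximum power for the whole window.
    total = p_max * sum(h_a[:d_steps]) * dt
    # Shared history: positive differences (H+) are weighted by p_max,
    # negative differences (H-) by p_min.
    for k in range(max(len(h_a) - d_steps, len(h_b))):
        diff = _at(h_a, k + d_steps) - _at(h_b, k)
        total += (p_max if diff > 0 else p_min) * diff * dt
    return total


def guard_band(blocks, d_steps, dt):
    """Sum of the block terms for one (cell grid, sensor) pair."""
    return sum(block_term(h_a, h_b, p_min, p_max, d_steps, dt)
               for h_a, h_b, p_min, p_max in blocks)


def refresh_block_guard_band(per_grid_blocks, d_steps, dt):
    """Largest guard-band over all grids of a refresh block."""
    return max(guard_band(blocks, d_steps, dt) for blocks in per_grid_blocks)


def temperature_upper_bound(sensed, guard_bands):
    """With several sensors, use the smallest (reading + guard-band)."""
    return min(s + g for s, g in zip(sensed, guard_bands))


# One heat generating block seen from two cell grids; all values are made up.
grid_1 = [([4.0, 2.0, 1.0, 0.0], [1.0, 3.0, 1.0, 0.0], 1.0, 3.0)]
grid_2 = [([2.0, 1.0, 0.0], [1.0, 3.0, 1.0, 0.0], 1.0, 3.0)]
band = refresh_block_guard_band([grid_1, grid_2], d_steps=1, dt=1.0)
bound = temperature_upper_bound([80.0, 78.0], [5.0, 9.0])
print(band, bound)
```

Taking the maximum over grids matches the requirement that one refresh period serves the whole refresh block, while taking the minimum over sensors picks the tightest bound that is still safe.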

IV. DESIGN PROCESS

To improve the efficiency of adaptive refresh in the 3-D DRAM stack over processor architecture, the refresh control system must be organized to suit the temperature-changing characteristics of the 3-D DRAM. In this section, an efficient temperature sensor built-in method and efficient refresh controller design methods are described.

A. Refresh Controller Design

The refresh controller controls the refresh period of the DRAM during the processor runtime. To apply adaptive refresh, the refresh controller must control the refresh period based on the sensed temperature data. However, constantly repeating the guard-band calculation of Section III at runtime is wasteful. Instead, the thermal guard-band can be precomputed at design time and the resulting thresholds embedded in the memory controller. At runtime, the refresh controller compares the sensed data with the precomputed thresholds and selects a proper refresh period.

Fig. 4. Refresh controller for adaptive refresh control.

The refresh controller for adaptive refresh control can be organized as in Fig. 4. To control the refresh period in 0.5 ms steps, a fine-grained refresh signal with an interval of 0.5 ms/number of rows is necessary. The fine-grained refresh signal is counted by the counter and skipped until the count reaches the proper refresh interval. When the counter reaches the proper refresh interval, the refresh signal is permitted and the counter resets. The proper refresh interval is determined from the refresh interval table by comparing the sensed temperature data with the precomputed thresholds. One such refresh controller is necessary for each refresh block.
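The counter-and-table behavior described above can be modeled in software as a minimal sketch. The class name, the single 70 °C threshold (which assumes a 15 °C guard-band below the 85 °C limit), and the tick counts are hypothetical values chosen for illustration; they are not the thresholds used in the paper.

```python
import bisect


class RefreshController:
    """Counter-based refresh gate for one refresh block (illustrative sketch)."""

    def __init__(self, thresholds_c, ticks_per_refresh):
        # thresholds_c: ascending sensed-temperature limits, assumed to be
        # precomputed at design time with the guard-band already subtracted.
        # ticks_per_refresh: one entry per temperature band, counted in
        # fine-grained refresh signals.
        if len(ticks_per_refresh) != len(thresholds_c) + 1:
            raise ValueError("need one tick count per temperature band")
        self.thresholds_c = list(thresholds_c)
        self.ticks_per_refresh = list(ticks_per_refresh)
        self.counter = 0

    def tick(self, sensed_c):
        """Handle one fine-grained refresh signal; return True if a refresh is issued."""
        band = bisect.bisect_right(self.thresholds_c, sensed_c)
        self.counter += 1
        if self.counter >= self.ticks_per_refresh[band]:
            self.counter = 0
            return True
        return False


# Hypothetical table: 64 ticks (64 x 0.5 ms = 32 ms period) below 70 degC sensed,
# 16 ticks (16 x 0.5 ms = 8 ms period) at or above it.
controller = RefreshController([70.0], [64, 16])
cool = [controller.tick(60.0) for _ in range(64)]
hot = [controller.tick(80.0) for _ in range(32)]
print(cool.count(True), hot.count(True))
```

Comparing with `>=` means that a sudden move to a hotter band triggers a refresh on the next tick if the counter has already passed the shorter limit, which errs on the side of data reliability.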

When multiple temperature sensors are applied, the refresh interval table also compares the multiple sensed temperature data. Therefore, the more temperature sensors are applied, the larger the hardware overhead. Employing multiple temperature sensors in adaptive refresh reduces the thermal guard-band through more accurate temperature prediction. However, the increased power consumption of the employed sensors and the refresh controller overhead must not exceed the refresh power savings. Thus, the temperature sensors must be employed selectively, and the unemployed temperature sensors must be turned off. Selecting an efficient number and position of sensors is important for reducing the thermal guard-band.

B. Temperature Sensor Performance

As shown in Fig. 5, the temperature sensor data latency, d, combined with the temperature gradient of the DRAM, gives rise to the thermal guard-band. The sensor data latency is the time from the temperature sensor sampling to the use of the sensor data in the refresh controller. In the worst case, when the temperature sensor read operation is performed right before the next temperature sensor sampling, the temperature sensor data latency is the sum of the sampling interval, the sensor read latency, and the refresh controller latency.

Conventional 2-D integrated DRAM chips have a small temperature gradient and thus need only a small guard-band. However, in the 3-D DRAM stack over processor architecture, the processor causes a high temperature and the dynamic power of the processor changes depending on the application characteristics. Therefore, the temperature of the 3-D DRAM stacked over the processor changes dynamically and the 3-D DRAM has a massive temperature gradient. Thus, temperature sensor data latency regulation is necessary.

Fig. 5. Temperature sensor data latency and timing.

Fig. 6. Temperature sensor data latency dependent thermal guard-band.

Conventional 2-D integrated DRAM chips have a long sensor read latency, because communication with the processor must be performed through an off-chip data bus. However, in the 3-D DRAM stack over processor architecture, the processor and the DRAM communicate through TSVs. Thus, the wire length between the processor and the DRAM is shortened. The maximum latency on a 100 mm² four-stack chip is modeled as 24 ns [16]. The refresh controller latency is within a few clock cycles. Therefore, in the 3-D DRAM stack over processor architecture, the influence of the sensor read latency is insignificant and the sensor sampling interval accounts for most of the sensor data latency. The temperature sensor sampling interval is therefore important for temperature-aware adaptive refresh control, which means that the temperature sensor performance must be carefully decided at design time.
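The latency components described above can be added up in a short Python sketch. The 24 ns read latency is the modeled maximum quoted from [16]; the 5 ms sampling interval and the three-cycle controller latency at a 1 GHz clock are hypothetical values chosen only for illustration.

```python
def sensor_data_latency(sampling_interval_s, read_latency_s, controller_latency_s):
    """Worst-case age of a temperature sample when the refresh controller uses it."""
    return sampling_interval_s + read_latency_s + controller_latency_s


sampling_interval = 5e-3      # hypothetical sampling interval (5 ms)
read_latency = 24e-9          # modeled maximum read latency over TSVs, from the text
controller_latency = 3 / 1e9  # hypothetical: 3 cycles at an assumed 1 GHz clock

d = sensor_data_latency(sampling_interval, read_latency, controller_latency)
sampling_share = sampling_interval / d

print(f"sensor data latency d = {d * 1e3:.6f} ms")
print(f"share due to sampling interval = {sampling_share:.4%}")
```

Under these assumed numbers the sampling interval is several orders of magnitude larger than the other two terms, which is why the sensor sampling rate is the design parameter that matters.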

Fig. 7. Temperature sensor built-in positions and floor plan. Top left: DRAM die. Top right: single-core processor die. Bottom: eight-core processor die.

Fig. 6 shows the temperature sensor data latency dependent thermal guard-band on the DRAM die. The graph shows the average thermal guard-band of all DRAM cells. A temperature sensor at the center of the DRAM die is employed and the guard-band setup method in Section III is used. A single core processor is modeled on the bottom die. The thermal guard-band increases remarkably below 20 ms, whereas it converges above 40 ms. This phenomenon is a result of the large dynamic range of the processor. When the power consumption of the processor increases all at once, the temperature of the hotspot also increases rapidly. The guard-band setup method must prepare for this unexpected heat generation. Therefore, the thermal guard-band setup method allocates a large guard-band in the early part of the sensor data latency.
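The shape of this curve (a steep rise at short latencies followed by convergence) can be reproduced qualitatively with a first-order thermal model. The sketch below is only an illustration under that assumption; it is not the guard-band setup method of Section III, and the power step, thermal resistance, and time constant are hypothetical values.

```python
import math


def latency_guard_band(d_ms, delta_p_w, r_th_c_per_w, tau_ms):
    """Worst-case temperature rise (deg C) during latency d_ms after a power step.

    Assumes a single-pole RC thermal response, which is a simplification.
    """
    return delta_p_w * r_th_c_per_w * (1.0 - math.exp(-d_ms / tau_ms))


DELTA_P_W = 20.0  # hypothetical worst-case power step of the processor
R_TH = 0.5        # hypothetical thermal resistance to the DRAM cell (deg C per W)
TAU_MS = 10.0     # hypothetical thermal time constant

guard_bands = {
    d: latency_guard_band(d, DELTA_P_W, R_TH, TAU_MS) for d in (1, 5, 20, 40, 80)
}
for d, g in guard_bands.items():
    print(f"latency {d:3d} ms -> guard-band {g:.2f} deg C")
```

In this toy model the guard-band is bounded by the steady-state rise, so reducing the latency pays off mainly in the early, steep part of the curve.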

To reduce the thermal guard-band, the sensor data latency must be regulated. A large part of the sensor data latency is the sampling interval. Thus, when a higher sampling rate is applied, the thermal guard-band becomes smaller. However, a temperature sensor that has a high sampling rate has greater power consumption and area overhead. Therefore, we must consider the tradeoff between a high sampling rate and its overhead. In particular, the power overhead of the high sampling rate sensor must not exceed the refresh power it saves.

C. Temperature Sensor Built-in Position

The temperature sensor built-in position affects the efficiency of the guard-band setup. Embedding numerous temperature sensors in the DRAM die and activating them selectively is good for refresh power reduction, but the temperature sensor area overhead is significant. Therefore, the DRAM designer must consider the temperature variation of the target architecture when designing the DRAM layout.

In this paper, the sensor built-in positions are assumed to be those shown in Fig. 7. On a DRAM die, sensors are built in at every position on the quartile coordinates. On the single-core processor die, there is a sensor at the center of the integer ALU and one at the center of the floating point ALU. The integer ALU and the floating point ALU are the blocks with the largest power consumption on the processor die. On the eight-core processor die, eight sensors are built in, one at the center of each of the eight execution units. As shown in Fig. 7, the execution unit is the block that generates the most heat on the processor die.

Fig. 8. Vertical sensor position dependent average temperature of employed guard-band variation.

Generally, the DRAM die layout is uniform across the 3-D stack. Thus, the temperature sensors are placed at vertically aligned positions on each layer. The vertically placed temperature sensors differ in their thermal guard-banding efficiency even though they are placed very close together. Fig. 8 shows the average value of the employed guard-band temperature variation on the DRAM dies when the vertically distributed temperature sensors are employed. In this simulation, a 5 ms sensor data latency is assumed. Layers 1–4 represent DRAM dies with vertical sensor positions, where layer 1 is the top DRAM die and layer 4 is directly above the processor die. The temperature sensors are positioned at the center of the DRAM die. The int_alu label represents the temperature sensor in the integer ALU of the bottom processor die. Layer 2 is the most efficient in the vertical variation, because it is in a sandwiched position and sufficiently far from the heat sink. A position far from the heat sink is an unfavorable position for heat spread. The temperature variation of layers 3 and 4 is too sharp and the temperature variation of layer 1 is too spread out for the temperature guard-band setup.

Fig. 9 shows the proposed thermal guard-band variations on DRAM layer 3 for various temperature sensor positions, together with the temperature simulation results. The temperature sensors are distributed horizontally as in Fig. 7, and the sensors are also on DRAM layer 3. The white dots in Fig. 9 indicate the temperature sensor positions. The temperature sensor data latency is assumed to be 5 ms. As shown in Fig. 7, there are three thermal hotspots on the eight-core processor: the execution unit bundle of cores 1–4, that of cores 5–8, and the crossbar. The top left side of Fig. 9 shows the temperature distribution of fpppp from the standard task graph set [17]. The other images show the temperature distribution with the guard-band included. As shown in Fig. 9, the thermal guard-band is assigned tightly when the temperature sensor is nearby. This is because the power consumption of a block that is far from the temperature sensor only slightly affects the temperature at the sensor position, and the worst-case temperature is assigned in the guard-band setup process. In addition, the temperature gap on the DRAM die becomes larger when the temperature sensor is far from the thermal hotspot of the bottom processor die.

The temperature gap on a DRAM die causes inefficiency in refresh power, because the refresh period is applied uniformly on a refresh block. Additionally, when the temperature sensor position is far from the thermal hotspot of the bottom die, the peak temperature gets higher. This is shown when the temperature sensor is in position B or H. In the case of sensor E, the position is relatively closer to the execution unit bundle, and the peak temperature is lower. When multiple sensors are applied, the thermal guard-band is tightly applied over a larger area, and the peak temperature and temperature gap decrease. When using three sensors, the guard-band included temperature distribution is similar to the temperature simulation result shown on the top left side of Fig. 9, which shows the results without the guard-band due to the sensor data latency. Therefore, it is efficient for the temperature sensors to be built in close to the center of the DRAM cells and the thermal hotspot.

In summary, in the vertical direction it is efficient to place the temperature sensors in a sandwiched layer that is also far from the heat sink. In the horizontal direction, a position that is close to the target DRAM cells and to the thermal hotspot of the bottom processor die is advantageous. When multiple temperature sensors are activated, the activated sensors should be distributed horizontally rather than vertically, because the stacked dies are very thin and vertically aligned sensors are therefore very close to one another.

D. Refresh Control System Design Process

For an efficient adaptive refresh control system design, the number and position of the activated temperature sensors can be determined through simulation, as shown in Fig. 10. The thermal guard-band of each sensor is identified separately. Following this, with regard to the target benchmark’s temperature variation, we must find the number and position of the activated temperature sensors that minimize the total power consumption. The total power consumption is the sum of the refresh power and the refresh controller overhead. The selection of the sensors to include in the activated group is an NP-hard problem. The activated group is the set of temperature sensors that are employed by the refresh controller and turned on. To solve this selection problem, we repeatedly select the temperature sensor that minimizes the total power and add it to the activated group. Through a simulation of the single core model with up to ten sensors, this greedy selection is compared with an optimal selection that tries every case. As a result, there is no difference for up to four activated sensors, and the other cases show roughly a 0.1% difference in refresh power consumption. Whenever a temperature sensor is added to the activated group, the controller overhead and the power consumption of the activated sensors increase. Meanwhile, the refresh power of the DRAM converges. Therefore, the number of sensors in the activated group can be identified when the rebound point of the total power is observed. This simulation process must be repeated for various sensor types and various built-in position sets of the temperature sensors. We can then select the most efficient combination of sensor type and sensor built-in positions.
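The greedy selection with a rebound-point stop can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the cost function `toy_total_power`, the sensor names, and all power values are hypothetical stand-ins for the simulated refresh power, controller overhead, and sensor power.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def greedy_select(
    candidates: Iterable[str],
    total_power: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Add the sensor that lowers total power the most until the total rebounds."""
    active: List[str] = []
    best = total_power(frozenset())
    remaining = set(candidates)
    while remaining:
        # Evaluate every remaining sensor as the next addition.
        sensor, power = min(
            ((s, total_power(frozenset(active + [s]))) for s in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if power >= best:  # rebound point: no addition reduces the total power
            break
        active.append(sensor)
        remaining.remove(sensor)
        best = power
    return active, best


# Hypothetical toy cost model (all values in mW, for illustration only).
BASE_REFRESH_MW = 100.0
SAVING_MW = {"A": 5.0, "B": 30.0, "C": 12.0, "E": 20.0}
OVERHEAD_PER_SENSOR_MW = 6.0


def toy_total_power(active: FrozenSet[str]) -> float:
    # Diminishing returns: the k-th best sensor contributes half as much as the (k-1)-th.
    savings = sorted((SAVING_MW[s] for s in active), reverse=True)
    refresh = BASE_REFRESH_MW - sum(v * 0.5 ** i for i, v in enumerate(savings))
    return refresh + OVERHEAD_PER_SENSOR_MW * len(active)


selected, power = greedy_select(SAVING_MW, toy_total_power)
print(selected, power)
```

With this toy model the refresh power saving flattens while the overhead grows linearly, so the loop stops as soon as adding any further sensor would raise the total.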

Fig. 9. Proposed thermal guard-band variation on DRAM layer 3 in various temperature sensor positions (fpppp of standard task graph [17] is used).

Fig. 10. Refresh control system design process.

It is advantageous to stack dies built in different process technologies for the DRAM and the processor. Therefore, the DRAM vendor and the processor vendor also differ. In the process presented in Fig. 10, the underlined part must be performed by the DRAM vendor to adjust the temperature sensor design in the DRAM die. To perform the underlined part, information about the other part is necessary. The processor vendor also requires information about the temperature sensors in the DRAM.

The temperature sensors in the processor die can also be utilized. However, in the simulation results, this is inefficient compared with utilizing the sensors in the DRAM die. Therefore, the two vendors must share information on the floor plan design, temperature sensor design, power consumption, and other necessary aspects.

The DRAM vendor cannot calibrate the temperature sensor design and built-in position for each target processor separately, because it is hard to design the DRAM layout individually. Therefore, the DRAM vendor must design the temperature sensors with general information from various target 3-D DRAM stack over processor architectures.

V. EXPERIMENTS

A. Experimental Setup

Fig. 11. 3-D DRAM stack over processor architecture.

In this paper, the 3-D DRAM stack over processor architecture is modeled as in Fig. 11. Four DRAM dies are stacked over the bottom processor die. There are two processor die architectures: the single core Alpha 21364 processor [18] and the eight-core Niagara1 processor [19]. We assume a clock speed of 4.5 GHz at 32-nm process technology for processor modeling. The area and power consumption of the processor are obtained from multicore power, area, and timing (McPAT) [20]. The power consumption of the DRAM die, including the refresh power, is calculated based on the Micron 2 GB mobile LPSDRAM [11] and 4 GB low power double data rate 2 [21]. The 2 GB DRAM is stacked over the single core model, and the 4 GB DRAM is stacked over the eight-core model. The DRAM power model is obtained from a Micron technical note [22]. The area of the DRAM is obtained from the cache access and cycle time model [23] because the area is not shown on the data sheet. We assume 32-nm process technology and an eight-bank architecture. We assume that the cell array of the DRAM die is evenly distributed. The influence of the DRAM on the temperature distribution is negligible because the heat generated in the DRAM is well spread. For the single core simulation, the x-bench benchmark [24] is used as an application on the processor. Three applications of x-bench are used: x-povray, x-anim, and x-lock. The processor performance is modeled on SimpleScalar [25]. For the eight-core simulation, the utilization percentage of each core is computed with a task scheduling simulation. In order to model the power dissipation of the cores, the McPAT [20] tool is used. Three real task graphs and 20 random task graphs from the standard task graph set [17] are used. The real task graphs are fpppp, robot control, and sparse matrix, which have 334, 88, and 96 tasks, respectively. Each of the 20 random graphs has 300 tasks. For task mapping, a round-robin mapping method that gives priority to an empty core is used. The sensor built-in positions are assumed to be as shown in Fig. 7. Therefore, the total number of temperature sensors is 38 for the single core processor model and 44 for the eight-core processor model.
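The sensor totals can be checked with a few lines of Python. The count of nine positions per DRAM die is an inference from the position labels A through I used for Fig. 7 and from the stated totals; the other numbers come directly from the text.

```python
DRAM_DIES = 4               # four DRAM dies in the stack (from the text)
POSITIONS_PER_DRAM_DIE = 9  # positions A through I in Fig. 7 (inferred)
SINGLE_CORE_CPU_SENSORS = 2 # integer ALU and floating point ALU
EIGHT_CORE_CPU_SENSORS = 8  # one per execution unit

dram_sensors = DRAM_DIES * POSITIONS_PER_DRAM_DIE
single_core_total = dram_sensors + SINGLE_CORE_CPU_SENSORS
eight_core_total = dram_sensors + EIGHT_CORE_CPU_SENSORS

print(single_core_total, eight_core_total)
```

Both totals agree with the 38 and 44 sensors reported for the two processor models.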

For the temperature simulation, the HotSpot tool [14] is used. The natural convection model is used and the ambient temperature is 45 °C. For the temperature impulse response extraction, a 1 s simulation with a 1 ms interval is analyzed. There are 15 and 37 power consumption blocks for the single and eight-core processors, respectively. The floor plans of the processors are organized roughly similar to the Alpha 21364 processor [18] and Niagara1 [19], as in Fig. 7. We assume that the power consumption of the DRAM die is uniformly distributed. The area difference between the DRAM and the processor die leaves empty space on the upper and lower sides of the bottom die.

TABLE I. TEMPERATURE SENSOR TYPES

The DRAM retention time is inversely proportional to the transistor off leakage current. The temperature-dependent off leakage current is obtained from a table in the McPAT tool. The DRAM data reliability is adjusted to a level equivalent to 8 ms at 105 °C. The refresh controller overhead is obtained with the NanGate open cell library [26]. The refresh controller in Fig. 4 is compiled with the library. The difference in technology is corrected to 32 nm by scaling with the square of the voltage. The frequency is assumed to be equivalent.

Four types of sensors are employed in this paper. Table I shows the four sensor types, which have various sampling intervals. Sensor type 1 has the shortest sampling interval, greatest power consumption, and worst accuracy. In contrast, sensor type 3 has the lowest power consumption. Sensor types 2 and 4 are the most accurate. The process technologies of the papers reporting these sensors differ, so their data are scaled to a 32-nm process technology. The scaling impact is obtained from [31].
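Because retention time is inversely proportional to off leakage current, a temperature-dependent refresh interval can be derived from the 105 °C reference point by a simple ratio. The Python sketch below assumes that the 8 ms figure is the reference refresh interval; the relative leakage table is hypothetical and only stands in for the McPAT table, whose values are not given here.

```python
REF_TEMP_C = 105       # reference temperature from the text
REF_INTERVAL_MS = 8.0  # reference interval from the text (assumed to be the refresh interval)

# Hypothetical off-leakage values normalized to the 105 deg C entry.
REL_LEAKAGE = {45: 0.125, 65: 0.25, 85: 0.5, 105: 1.0}


def refresh_interval_ms(temp_c: float) -> float:
    """Refresh interval for a guard-banded temperature.

    The temperature is rounded up to the next table entry, which is the
    conservative direction for data reliability.
    """
    for table_temp in sorted(REL_LEAKAGE):
        if temp_c <= table_temp:
            return REF_INTERVAL_MS * REL_LEAKAGE[REF_TEMP_C] / REL_LEAKAGE[table_temp]
    raise ValueError("temperature above the characterized range")


for t in (40, 70, 85, 105):
    print(f"{t} deg C -> refresh every {refresh_interval_ms(t)} ms")
```

The temperature passed in would be the sensed value plus the thermal guard-band, so a tighter guard-band directly lengthens the refresh interval.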

B. Experimental Results

To evaluate the efficiency of the proposed guard-band setup method and the refresh control system, the design process described in Section IV-D is applied to the above simulation environments. For refresh block composition, the per-bank refresh and all-bank refresh are both employed. In the per-bank refresh, a bank on a DRAM die is a refresh block. Thus, there exist 32 refresh blocks on a chip when the per-bank refresh is employed. In contrast, when the all-bank refresh is employed, a DRAM die is a refresh block. Thus, there are four refresh blocks on a chip. Therefore, the per-bank refresh requires 32 refresh controllers whereas the all-bank refresh requires four refresh controllers. However, the per-bank refresh reduces the refresh power more effectively, because the fine-grained refresh block adjusts the refresh period more flexibly. For example, in Fig. 7, banks 0 and 4–7 require a lower refresh rate than banks 2–4 due to the temperature variation. The all-bank refresh must employ the maximum refresh rate, whereas the per-bank refresh adjusts flexibly.
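The difference between the two refresh block compositions can be illustrated with a small Python sketch. The die and bank counts follow the text; the per-bank relative refresh rates are hypothetical and do not correspond to any particular bank numbering in Fig. 7.

```python
DIES = 4           # DRAM dies in the stack (from the text)
BANKS_PER_DIE = 8  # banks per DRAM die (from the text)


def refresh_block_count(per_bank: bool) -> int:
    """Number of refresh blocks, and hence refresh controllers, on the chip."""
    return DIES * BANKS_PER_DIE if per_bank else DIES


def die_refresh_work(bank_rates, per_bank: bool):
    """Relative refresh work on one die for the given required bank rates."""
    if per_bank:
        return sum(bank_rates)                # each bank uses its own rate
    return max(bank_rates) * len(bank_rates)  # every bank uses the hottest bank's rate


# Hypothetical relative refresh rates required by the eight banks of one die.
bank_rates = [1, 2, 4, 4, 2, 1, 1, 1]

per_bank_work = die_refresh_work(bank_rates, per_bank=True)
all_bank_work = die_refresh_work(bank_rates, per_bank=False)
print(refresh_block_count(True), refresh_block_count(False))
print(per_bank_work, all_bank_work)
```

In this toy case the per-bank scheme halves the refresh work, but it needs eight times as many controllers, which is the overhead tradeoff evaluated in the results.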

TABLE II. REFRESH POWER REDUCTION AND COST COMPARISON

TABLE III. TOTAL REFRESH OVERHEAD REDUCTION

Table II shows detailed data on the greedy selection of the activated sensors, such as the refresh power of each application and the overhead. The total refresh power is the sum of the average refresh power, the refresh controller overhead, and the activated sensor power. The eight-core model and sensor type 2 are used for the simulation. Fpppp, robot control, and sparse matrix are used as the real task graphs from the standard task graph set [17]. As presented in Section IV-D, the activated sensors are selected by greedy selection. A sensor position that minimizes the average power is selected for the activated group. As activated sensors are added, the average refresh power decreases. When an activated sensor is added, the refresh controller must compare the additional data, and the refresh controller overhead increases. A DRAM die has eight banks, so the per-bank refresh requires eight times the overhead. As a result, the per-bank refresh case activates one temperature sensor due to the refresh controller overhead, while the all-bank refresh case activates five temperature sensors. In this paper, the refresh controller compares the temperature sensor data within a clock cycle. If the temperature sensor data comparison is performed over multiple clock cycles and if a more coarse-grained refresh period domain is used, the refresh controller overhead can be reduced, and the efficient activated sensor group composition can change.

Table III shows the total refresh overhead results for the various simulation elements presented above. Both the single core and the eight-core processor models are evaluated with the four types of temperature sensors in Table I. The simulation process of Table II is applied, and the most efficient result is shown in Table III. The throughput loss due to the refresh operation of the DRAM is also presented in Table III. During the refresh operations, data on the DRAM cannot be accessed. Thus, the refresh operations occupy the DRAM and reduce the throughput. In the case of high-density DRAM, the refresh operation latency is increased [4], increasing the importance of throughput improvement.

When the number of activated sensors is 0, no sensor data is employed, and the guard-band setup method is not used. In this case, the refresh rate is decided based on the temperature simulation results at the maximum power dissipation, and the dynamic temperature change is not monitored. There is no existing guard-banding method that analytically considers both the locations of the temperature sensors and the timing latency, so the fixed refresh rate method is used as the baseline for this approach.

In the simulation results of Table III, in terms of the refresh power, sensor type 2 is the most efficient in the case of the eight-core model, and the per-bank refresh block is the most efficient on the single core model. In the case of the all-bank refresh block on the single core model, sensor type 1 is the most efficient. Considering the throughput loss, sensor type 1 is the most efficient for the single core model, whereas sensor type 2 is the most efficient for the eight-core model. Sensor types 3 and 4 are inefficient due to their relatively long sampling intervals. Sensor type 1 is efficient for the single core model due to its extremely short sampling interval. For the eight-core model, sensor type 2 is more efficient since it is more accurate than sensor type 1. According to the results shown in Table III, short sensor data latency is important for the single core model, while temperature sensor accuracy is more important for the eight-core model. One difference between the single core model and the eight-core model is the heat distribution. On the single core model, the heat generation is concentrated on the integer ALU block, as shown in Fig. 7. In contrast, on the eight-core model, the heat can be dissipated through the eight cores. Therefore, the single core has a larger heat mass, allowing for more rapid peak temperature change. This means that the sensor data latency induces a larger thermal guard-band on the single core model. It can be concluded that when the heat generation is concentrated, faster temperature sensors are necessary.

As a result, the proposed 3-D stacked DRAM refresh management method reduces the refresh operation by roughly 50% in the single core model, whereas the refresh power reduction is roughly 30% in the eight-core model. However, the comparison between the single core and eight-core models is inexact, because the workloads on the processors differ from each other. The standard task graph set [17] cannot be simulated on SimpleScalar [25]. Therefore, the heat generation of a core with a mapped task is assumed to be the peak power dissipation of the core. This is disadvantageous for the eight-core model, because there is a smaller difference between the peak temperature of the baseline and the temperature of the benchmark runs. Additionally, there are empty spaces on the processor die that compensate for the area difference between the processor die and the DRAM die. These empty spaces act like a heat sink. As shown in Fig. 7, the empty space on the eight-core model is larger than that of the single core model. This environmental difference induces a difference in heat spreading features.

Table IV shows the dependence of the refresh power reduction on the number of temperature sensors in a DRAM die. In Table IV, positions A through I represent the sensor positions shown in Fig. 7. The temperature sensors on the bottom processor die are also considered in the simulation, but since these are not efficient, they are not selected. There are two simulation features, as shown in Table IV. In the activated sensor part of Table IV, the activated sensor position is described as “DRAM layer number_sensor built-in position,” and numerous sensor positions are near thermal hotspots of the bottom die.

TABLE IV. TEMPERATURE SENSOR BUILT-IN POSITION COMPARISON

When more sensors are built in, less refresh power is consumed. However, there are some exceptional cases, such as when D, E, and F are activated. For the single core model, the thermal hotspot is near sensor positions B and C. For the eight-core model, the thermal hotspot is near sensor positions B and H. When the sensors are located at positions D, E, and F, there is no thermal hotspot nearby for either processor design. As a result, accurate temperature sensor positions are more important than the number of temperature sensors. However, as described in Section IV, calibration of the temperature sensor positions is difficult. Creation of various layouts for each DRAM die is also difficult. As a result, data collection through the design process simulations of Section IV for diverse architectures is advantageous for the DRAM vendor.

VI. CONCLUSION

In this paper, the thermal guard-band setup method for 3-D stacked DRAM refresh management is proposed. When the adaptive refresh is employed, this method guarantees the DRAM data reliability in the 3-D DRAM over processor architecture, which has a temperature variation that changes dynamically and is difficult to predict. The simulation results indicate that the proposed guard-band setup method and the adaptive refresh reduce the refresh overheads by up to 50%.

For the efficiency of the refresh control system, when designing temperature sensors for the DRAM, the DRAM designer must consider the processor architecture. This is because a temperature sensor built-in position near the thermal hotspot of the processor is advantageous, and the refresh power efficiency is dependent on the performance of the temperature sensors. In this paper, the design process of the 3-D DRAM refresh control system based on the temperature simulation is proposed. With the proposed simulation process, the DRAM designer can determine efficient temperature sensor built-in positions and performance. The simulation results show how the refresh power efficiency varies with the built-in sensors.

REFERENCES

[1] G. H. Loh, “3D-stacked memory architectures for multi-core processors,” in Proc. Int. Symp. Comput. Archit. (ISCA), Beijing, China, 2008, pp. 453–464.

[2] J. Meng, D. Rossell, and A. K. Coskun, “Exploring performance, power, and temperature characteristics of 3D systems with on-chip DRAM,” in Proc. Int. Green Comput. Conf. Workshops (IGCC), Orlando, FL, USA, 2011, pp. 1–6.

[3] M. Ghosh and H.-H. S. Lee, “Smart refresh: An enhanced memory controller design for reducing energy in conventional and 3D die-stacked DRAMs,” in Proc. Int. Symp. Microarchit. (MICRO), Chicago, IL, USA, 2007, pp. 134–145.

[4] J. Liu, B. Jaiyen, R. Veras, and O. Mutlu, “RAIDR: Retention-aware intelligent DRAM refresh,” in Proc. Int. Symp. Comput. Archit. (ISCA), Portland, OR, USA, 2012, pp. 1–12.

[5] C. Wilkerson et al., “Reducing cache power with low-cost, multi-bit error-correcting codes,” in Proc. Int. Symp. Comput. Archit. (ISCA), Saint-Malo, France, 2010, pp. 83–93.

[6] Y. Wang, Y. Han, and H. Li, “A low power DRAM refresh control scheme for 3D memory cube,” in Proc. IEEE COOL Chips XVII, Yokohama, Japan, 2014, pp. 1–3.

[7] Y. Kagenishi et al., “Low power self refresh mode DRAM with temperature detecting circuit,” in Proc. Symp. VLSI Circuits, Kyoto, Japan, 1993, pp. 43–44.

[8] W. Yun, K. Kang, and C.-M. Kyung, “Thermal-aware energy minimization of 3D-stacked L3 cache with error rate limitation,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Rio de Janeiro, Brazil, 2011, pp. 1672–1675.

[9] M. Sadri, M. Jung, C. Weis, N. Wehn, and L. Benini, “Energy optimization in 3D MPSoCs with wide-I/O DRAM using temperature variation aware bank-wise refresh,” in Proc. Design Autom. Test Europe Conf. Exhibit. (DATE), Dresden, Germany, 2014, pp. 1–4.

[10] W. Kong, P. C. Parries, G. Wang, and S. S. Iyer, “Analysis of retention time distribution of embedded DRAM - A new method to characterize across-chip threshold voltage variation,” in Proc. Int. Test Conf. (ITC), Santa Clara, CA, USA, 2008, pp. 1–7.

[11] 2 GB: x16, x32 Mobile LPDDR2 SDRAM S4 Features, Micron Technol. Inc., Boise, ID, USA, 2010.

[12] Various Methods of DRAM Refresh, Micron Technol. Inc., Boise, ID, USA, 1999.

[13] J. S. Lee, K. Skadron, and S. W. Chung, “Predictive temperature-aware DVFS,” IEEE Trans. Comput., vol. 59, no. 1, pp. 127–133, Jan. 2010.

[14] K. Skadron et al., “Temperature-aware microarchitecture,” in Proc. Int. Symp. Comput. Archit. (ISCA), San Diego, CA, USA, 2003, pp. 2–13.

[15] S. Sharifi and T. S. Rosing, “Accurate direct and indirect on-chip temperature sensing for efficient dynamic thermal management,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 10, pp. 1586–1599, Oct. 2010.

[16] D. H. Kim, S. Mukhopadhyay, and S. K. Lim, “TSV-aware interconnect length and power prediction for 3D stacked ICs,” in Proc. Int. Interconnect Technol. Conf. (IITC), Sapporo, Japan, 2009, pp. 26–28.

[17] T. Tobita and H. Kasahara, “A standard task graph set for fair evaluation of multiprocessor scheduling algorithms,” J. Schedul., vol. 5, no. 5, pp. 379–394, Sep. 2002.

[18] A. Jain et al., “A 1.2 GHz Alpha microprocessor with 44.8 GB/s chip pin bandwidth,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), San Francisco, CA, USA, 2001, pp. 240–241.

[19] U. Nawathe et al., “Implementation of an 8-core, 64-thread, power-efficient SPARC server on a chip,” IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 6–20, Jan. 2008.

[20] S. Li et al., “McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures,” in Proc. Annu. Int. Symp. Microarchit. (MICRO-42), New York, NY, USA, 2009, pp. 469–480.

[21] 4 GB: x16, x32 Mobile LPDDR2 SDRAM S4 Features, Micron Technol. Inc., Boise, ID, USA, 2011.

[22] Calculating Memory System Power for DDR3, Micron Technol. Inc., Boise, ID, USA, 2007.

[23] S. Thoziyoor, J. H. Ahn, M. Monchiero, J. B. Brockman, and N. P. Jouppi, “A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies,” in Proc. Int. Symp. Comput. Archit. (ISCA), Beijing, China, 2008, pp. 51–62.

[24] B. B. Yao, M. T. Ozsu, and N. Khandelwal, “XBench benchmark and performance testing of XML DBMSs,” in Proc. Int. Conf. Data Eng. (ICDE), Boston, MA, USA, 2004, pp. 621–632.

[25] T. Austin, E. Larson, and D. Ernst, “SimpleScalar: An infrastructure for computer system modeling,” IEEE Comput., vol. 35, no. 2, pp. 59–67, Feb. 2002.

[26] NanGate. (Mar. 3, 2008). NanGate 45nm Open Cell Library. [Online]. Available: http://www.nangate.com/, accessed 2012.

[27] Y.-J. An, K. Ryu, D.-H. Jung, S.-H. Woo, and S.-O. Jung, “An energy efficient time-domain temperature sensor for low-power on-chip thermal management,” IEEE Sensors J., vol. 14, no. 1, pp. 104–110, Jan. 2014.

[28] K. Souri, Y. Chae, and K. A. A. Makinwa, “A CMOS temperature sensor with a voltage-calibrated inaccuracy of ±0.15 °C (3σ) from −55 °C to 125 °C,” IEEE J. Solid-State Circuits, vol. 48, no. 1, pp. 292–301, Jan. 2013.

[29] Y. Ren, C. Wang, and H. Hong, “An all CMOS temperature sensor for thermal monitoring of VLSI circuits,” in Proc. IEEE Circuits Syst. Int. Conf. Test. Diagn. (CAS-ICTD), Chengdu, China, 2009, pp. 1–5.

[30] K. Souri and K. A. A. Makinwa, “A 0.12 mm² 7.4 µW micropower temperature sensor with an inaccuracy of ±0.2 °C (3σ) from −30 °C to 125 °C,” IEEE J. Solid-State Circuits, vol. 46, no. 7, pp. 1693–1700, May 2011.

[31] L. L. Lewyn, T. Ytterdal, C. Wulff, and K. Martin, “Analog circuit design in nanoscale CMOS technologies,” Proc. IEEE, vol. 97, no. 10, pp. 1687–1714, Oct. 2009.

Jaeil Lim received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2010, where he is currently pursuing the combined Ph.D. degree with the Department of Electrical and Electronic Engineering.

His current research interests include DRAM refresh management, low-power design, reliability, and very large scale integration design.

Hyunyul Lim received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2013, where he is currently pursuing the combined Ph.D. degree with the Department of Electrical and Electronic Engineering.

His current research interests include low-power scan testing, delay scan test design, reliability, and very large scale integration design.

Sungho Kang (M’89) received the B.S. degree from Seoul National University, Seoul, Korea, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Texas at Austin, Austin, TX, USA, in 1992.

He was a Research Scientist with the Schlumberger Laboratory for Computer Science, Schlumberger, Inc., Austin, and a Senior Staff Engineer with the Semiconductor Systems Design Technology, Motorola, Inc., Austin. Since 1994, he has been a Professor with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul. His current research interests include very large scale integration/SoC design and testing, design for testability, and design for manufacturability.