Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold
Processors
Chris H. Kim
University of Minnesota, Minneapolis, MN
[email protected]/~chriskim/
Scaling Challenges
[Figure: Power (W) vs. Year (2000, 2010, 2020), annotated with the power wall, variability wall, and reliability wall]
Overcoming the Power Wall
• Proven solutions: Multi-core chips, dynamic voltage frequency scaling, clock gating, power gating, …
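[Figure: one multiplier (Y=AxB) at full voltage and frequency vs. two parallel multipliers (Y=AxB) at half voltage and frequency]
Single unit: Freq=1, Vdd=1, Throughput=1, Area=1, Power=1, Pwr Den=1
Two units: Freq=0.5, Vdd=0.5, Throughput=1, Area=2, Power=0.25, Pwr Den=0.125 (87%↓)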
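The normalized numbers on this slide are consistent with the usual dynamic power relation P ∝ C·Vdd²·f. The sketch below reproduces them under the assumption that switched capacitance scales with area; the function name `scaled_power` is illustrative and not from the slides.

```python
def scaled_power(freq, vdd, area):
    """Return (power, power_density) normalized to a freq=vdd=area=1 baseline.

    Assumption: dynamic power ~ C * Vdd^2 * f, with capacitance C
    proportional to area.
    """
    power = area * vdd ** 2 * freq
    return power, power / area


baseline = scaled_power(freq=1.0, vdd=1.0, area=1.0)
parallel = scaled_power(freq=0.5, vdd=0.5, area=2.0)

print("baseline (power, density):", baseline)
print("parallel (power, density):", parallel)
reduction = (1.0 - parallel[1] / baseline[1]) * 100.0
print(f"power density reduction: {reduction:.1f}%")
```

Throughput stays at 1 because two units each run at half the frequency.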
Overcoming the Variability Wall
[Figure: Intel Foxton Technology block diagram. Labels: power supply (VID, DAC), micro-controller (A/D, PCalc, PLimit, IIR), package/die (RPackage, VConnector, VDie)]
Intel Foxton Technology
• Proven solutions: Variation aware design, memory assist/repair, lithography techniques, adaptive systems
Overcoming the Reliability Wall
• Possible solutions: Guardbanding, sensing and compensation, wear-leveling, failure resistant systems, …
Outline
• Device Reliability Issues
• Reliability Monitors and Measurements
• Reliability Effects in NTV Processors
• Summary
Aging in CMOS Transistors
HCI, BTI, and TDDB in Digital Logic
• Transistors are exposed to different stress conditions during normal digital circuit operation
[Figure: transistor stress conditions; labels: inverted channel, ID]
Practical Solutions for Preventing Aging Related Failures
• BTI and HCI
– Gradual decline in performance
– Guard banding (static or dynamic), adjust Vmax
– CAD, firmware & architecture level support essential
• TDDB
– Single incident may lead to outright system failure
– Can happen anywhere inside a chip
– Improve fabrication procedure, adjust Vmax
• Bottom line: Precise measurement and understanding of circuit degradation is a key aspect of robust design
Transistor Lifetime Estimation
• Extrapolate stress results with respect to:
– Op. conditions based on acceleration models
– Larger chip areas (e.g., Poisson scaling for TDDB)
– Lower percentiles based on chosen distribution
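The three extrapolation steps can be sketched as follows. This is only an illustration: it assumes a Weibull failure distribution (a common choice for TDDB) and a power-law voltage acceleration model, neither of which the slide specifies, and every function name and number below is hypothetical.

```python
import math


def voltage_accelerate(t_stress, v_stress, v_use, n):
    """Translate a lifetime measured at v_stress to v_use.

    Assumption: power-law voltage acceleration, lifetime ~ V^-n.
    """
    return t_stress * (v_stress / v_use) ** n


def weibull_area_scale(t63_test, area_ratio, beta):
    """Scale the 63% lifetime to a chip that is area_ratio times larger
    (weakest-link / Poisson area scaling for a Weibull slope beta)."""
    return t63_test * area_ratio ** (-1.0 / beta)


def weibull_percentile(t63, fail_fraction, beta):
    """Time at which fail_fraction of the population has failed."""
    return t63 * (-math.log(1.0 - fail_fraction)) ** (1.0 / beta)


# Hypothetical inputs, chosen only to exercise the functions.
t63_stress = 1.0e3   # seconds at stress voltage, small test structure
t63_use = voltage_accelerate(t63_stress, v_stress=2.4, v_use=1.0, n=40)
t63_chip = weibull_area_scale(t63_use, area_ratio=1.0e6, beta=1.5)
t_100ppm = weibull_percentile(t63_chip, fail_fraction=1.0e-4, beta=1.5)
print(f"t63 at use voltage: {t63_use:.3e} s")
print(f"t63 for full chip:  {t63_chip:.3e} s")
print(f"100 ppm lifetime:   {t_100ppm:.3e} s")
```

Each step shortens or lengthens the projected lifetime multiplicatively, so errors in the acceleration exponent or the Weibull slope compound quickly.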
[Figure: lifetime extrapolation plot; annotation: real supply voltage]
Benefits of In-Situ Reliability Monitors over Device Probing
• Information from actual circuits (test circuit must be representative)
• High (timing) precision + short measurement interrupt
• No expensive equipment
• Short test time and reduced test area
• Measurements at use condition allow realistic lifetime projection
• Complements traditional probing methods
Usage Scenarios and Design Issues of In-situ Reliability Monitors
• Usage scenario 1: Process characterization and yield improvement
– Early technology characterization is often performed before many metallization layers are fabricated
– Library cells may not be available (flip-flops, scan)
– Device probing would still be a competitive solution for extracting analog parameters such as I–V or C–V
• Usage scenario 2: In-field monitoring and data collection
– Workload unknown
– Simple circuits are practical but they have limited capabilities
– Firmware and architecture support needed
• Usage scenario 3: Sensor for real time aging compensation
– Effectiveness versus overhead
– Measurements are from a proxy circuit
– Practical issues: type of sensor, temporal granularity, spatial granularity, communication with sensors, interface and protocol
• Personally not a big fan
Outline
• Device Reliability Issues
• Monitors and Measurements
• Effects in NTV Processors
• Summary
Circuit Based Reliability Monitors (or Silicon Odometers)
Odometer Projects
[Figure: die photos]
Year | Process | Odometer | Focused Reliability Issues
2007 | 130nm | Original Silicon Odometer | NBTI Induced Frequency Degradation
2008 | 65nm | All-In-One Odometer | Separately Monitoring NBTI, HCI and TDDB
2009 | 65nm | Statistical, Duty-Cycle, and RTN Odometer | Statistical Behavior of NBTI; RTN on Logic Circuit
2010 | 65nm | Interconnect Odometer | Impact of Interconnect on BTI and HCI Aging
2011 | 32nm SOI | PBTI and SRAM Odometer | Monitoring PBTI in HKMG Process; BTI Impact on SRAM Read/Write
2012 | 32nm SOI | SRAM and RTN Odometer | SRAM Timing Issues Due to BTI; RTN Impact on Ring Oscillator
Beat Frequency Silicon Odometer
• Beat frequency of two free running ROSCs measured by DFF and edge detector
• Benefits of beat frequency detection system
– Achieve ps resolution with μs measurement interrupt
– Insensitive to common mode noise such as temperature drifts
– Fully digital, scan based interface, easy to implement
$f_{beat} = f_{ref} - f_{stress}$, $N = f_{ref} / (f_{ref} - f_{stress})$
• Sample stressed ROSC output with reference ROSC
– 1% frequency difference before stress → N=100
– 2% frequency difference after stress → N=50
– Δf or ΔT sensing resolution is >0.01%
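The N=100 and N=50 figures, and the roughly 0.01% resolution, follow from simple arithmetic on the beat period. The sketch below assumes N is the number of reference ROSC periods in one beat period and that the frequency difference is expressed as a fraction of the reference frequency; the function names are illustrative.

```python
def beat_count(f_ref, f_stress):
    """Number of reference ROSC periods in one beat period (rounded)."""
    return round(f_ref / abs(f_ref - f_stress))


def count_resolution(n):
    """Fractional frequency change that moves the count from n to n + 1.

    The count is 1/delta for a fractional difference delta, so one count
    step near n corresponds to 1/n - 1/(n + 1), about 1/n^2.
    """
    return 1.0 / n - 1.0 / (n + 1)


n_fresh = beat_count(1.00, 0.99)      # 1% slower than the reference
n_stressed = beat_count(1.00, 0.98)   # 2% slower after stress
print("N before stress:", n_fresh)
print("N after stress: ", n_stressed)
print(f"resolution near N=100: {count_resolution(100) * 100:.4f}%")
```

Because the count varies as the inverse of the frequency difference, a small initial offset between the two oscillators gives a large count and therefore fine resolution.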
ROSC Based Aging Sensor Comparison
System | Single ROSC | 2 ROSC, simple | 2 ROSC, beat freq.
Block Diagram | [figure] | [figure] | [figure]
Function | Count Stress ROSC periods during externally controlled meas. time | Count Stress ROSC periods during N1 periods of Ref. ROSC | Count Ref. ROSC periods during one period of PC_OUT
Features | Simple; compact | Simple; immune to common mode variations | High resolution w/ short meas. time; immune to common mode variations
Issues | Voltage and temp. variations; meas. time vs. resolution tradeoff; requires absolute timing reference (e.g. oscilloscope) | Meas. time vs. resolution tradeoff | Requires extra circuits (e.g., phase comp., edge detector, etc.)
Meas. time for 0.01% max res. * | 30 μs | 30 μs | 0.3 μs
Meas. error wrt. common mode variations ** | +10.18% / -8.57% | +0.06% / -0.07% | +0.26% / -0.38%
* ROSC period = 3 ns
** simulated with +/- 4% ∆VCC
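The measurement times in the table can be reproduced with back-of-the-envelope arithmetic. This sketch assumes that direct counting needs 1/resolution periods, while beat-frequency counting needs only about sqrt(1/resolution) periods because its resolution goes as 1/N²; the function names are illustrative, and the 3 ns period comes from the table footnote.

```python
import math

ROSC_PERIOD_NS = 3.0  # from the table footnote


def counting_meas_time_us(resolution, period_ns=ROSC_PERIOD_NS):
    """Direct counting: one count out of N periods gives 1/N resolution."""
    periods = 1.0 / resolution
    return periods * period_ns / 1000.0


def beat_meas_time_us(resolution, period_ns=ROSC_PERIOD_NS):
    """Beat-frequency counting: resolution ~ 1/N^2, so N ~ sqrt(1/resolution)."""
    periods = math.sqrt(1.0 / resolution)
    return periods * period_ns / 1000.0


target = 1.0e-4  # 0.01% resolution
print(f"direct counting: {counting_meas_time_us(target):.1f} us")
print(f"beat frequency:  {beat_meas_time_us(target):.1f} us")
```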
Separately Monitoring NBTI and PBTI
• PBTI becoming an important concern in high-k metal-gate
• Conventional Ring Oscillator (ROSC) can only provide overall frequency degradation information due to combined NBTI and PBTI effects
• New RO structure separates NBTI and PBTI effects
[Figure: ring oscillator configurations for PBTI stress, NBTI stress, and N/PBTI stress]
J. Kim, et al., IBM, IRPS 2011
Separately Monitoring BTI and HCI
[Figure: block diagram. Stressed oscillators: BTI_ROSC (BTI stress only) and DRIVE_ROSC (BTI & HCI stress). Unstressed references: BTI_REF_ROSC and DRIVE_REF_ROSC. Beat Frequency Detection Circuits 1 and 2, each with a SCANOUT output]
• Backdriving action equalizes BTI in both BTI_ROSC and DRIVE_ROSC
• Negligible HCI in BTI_ROSC: only 3-5% of the switching current in the DRIVE_ROSC
• Fresh power gates are used for frequency measurements
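Since BTI is equalized in both oscillators and BTI_ROSC sees negligible HCI, the HCI contribution can be estimated by subtraction. The sketch below assumes the two contributions to frequency degradation simply add, which the slides do not state; the function names and the example frequencies are hypothetical.

```python
def freq_shift_pct(f_fresh, f_stressed):
    """Frequency degradation in percent relative to the fresh frequency."""
    return (f_fresh - f_stressed) / f_fresh * 100.0


def separate_bti_hci(bti_rosc_shift_pct, drive_rosc_shift_pct):
    """Split DRIVE_ROSC degradation into BTI and HCI parts.

    Assumptions: BTI_ROSC degrades by BTI only, DRIVE_ROSC by BTI + HCI,
    BTI is equal in both, and the two contributions are additive.
    """
    return {
        "BTI": bti_rosc_shift_pct,
        "HCI": drive_rosc_shift_pct - bti_rosc_shift_pct,
    }


# Hypothetical fresh/stressed frequencies in MHz, for illustration only.
bti_shift = freq_shift_pct(250.0, 247.5)
drive_shift = freq_shift_pct(250.0, 246.25)
print(separate_bti_hci(bti_shift, drive_shift))
```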
Temp. and Voltage Dependencies
• HCI slightly reduced with temperature
– Due to reduced drain current
• Both mechanisms degrade with stress voltage
– Point when HCI begins to dominate pushed out in time by >1 order of magnitude at 1.8V vs. 2.4V
[Figure, left: Frequency Shift (%) vs. Stress Time (s); 250MHz stress freq., 2.0V stress; HCI_DEG and BTI_DEG at 30°C and 120°C]
[Figure, right: Frequency Shift (%) vs. Stress Time (s); 26°C, 470MHz stress freq.; 2.4V stress and 1.8V stress]
Aging Issues in Interconnects
• Interconnect affects the voltage and current shapes
– Increased transition time (decreased slew rate)
– Increased current pulse width; decreased current peak value
• BTI and HCI have different sensitivities to bias conditions
Interconnect Aging Monitor
• Serpentine wires for a dense chip implementation
• Ground shielding on both sides for reducing noise
X. Wang, et al., IRPS 2012, TVLSI 2014
BTI and HCI Aging: With Interconnect
• BTI aging decreases with interconnect length
• HCI degradation peaks at L=500µm
BTI Aging vs. Interconnect Length
• BTI induced frequency degradation decreases with longer interconnect
• Longer transition time → shorter PMOS stress duration → less BTI aging
HCI Aging vs. Interconnect Length
• HCI aging exhibits a non-monotonic behavior with respect to interconnect length
– Current pulse width increases
– Current peak decreases
Statistical Behavior of Aging
• Finite number and random spatial distribution of discrete charges → NBTI & HCI variation
• Inversely proportional to A_GATE → worse with scaling
• Small number of aging measurements not sufficient to characterize aging
[Figures: spread in ∆Vt increases with scaling; CDF of ∆Vt at different stress times. S. Pae, et al., TDMR '08; S. Rauch, TDMR, Dec. '07]
Statistical Reliability Monitor
• Need stressed & reference ROSC frequencies to be close
• Difficult, costly to tune each stressed ROSC
• Use multiple ref. ROSCs with different frequencies
• Cover the frequency distribution of the stressed array
[Figure: block diagram. Stressed array with column peripherals, Ref ROSC 1 to 3, 3 Silicon Odometer beat frequency detection systems, FSM + scan chain, SCANOUT results]
J. Keane, et al., IEDM 2010, JSSC 2011
65nm Test Chip Data
• Fresh and post-stress ROSC frequency PDFs
• No significant correlation of the frequency shift with fresh frequency
[Figure: Percentage of ROSCs vs. DUT Frequency (MHz, 245 to 272); fresh and after 3.1hr stress at 1.8V, 2.0V, and 2.2V. Conditions: 20°C, DC; 2.2V, 2.0V, & 1.8V; 120 ROSCs each @ 11200s. Correl. coef.: 0.011, -0.126, -0.112]
SRAM Memory Design Challenges at Low Supply Voltages
• Ratio-ed operation leads to poor noise margin at low voltages for 6T SRAM cells
• Conflicting requirements: a stronger access transistor improves write margin but worsens read margin
[Figure: 6T SRAM cell with bitlines BL and BLB]
Impact of BTI on SRAM Read
• With BTI: Read stability degrades
• Cell recovers on a fail
[Figure: Bit Failure Rate (BFR, %)]
Impact of BTI on SRAM Write
• With BTI: Write stability improves or remains unchanged
• Cell recovers on a pass
[Figure: Bit Failure Rate (BFR, %)]
Representative SRAM Reliability Macro
[Figure: block diagram. 256x128b SRAM array with decoder, FSM, col. peripheral, 128b scan reg., VCO, WL, BL, CLK, and supply switches (VMEAS, VSTRESS); array supply domain and peripheral supply domain; off-chip DAQ]
• Represents a product SRAM sub-array
• BIST function done by on-chip FSM with supply switches
P. Jain, et al., IEDM, 2012
[Figure: Read Failure Rate (%) vs. TSTRESS (s); 32nm, 0.52V, 85°C; annotations: TMEAS increased from 3µs to 2ms, 10x]
Aging Monitor in IBM Microprocessors
Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
• Implemented on IBM's z196 Enterprise systems to track long term degradation under real-use conditions
• Over 500 days' worth of ring oscillator degradation data from customer systems
• Other companies have aging monitors too, but they tend not to publish their work
• Time-zero problem: Some time will elapse between applying voltage (burn-in, test, operation) and making the first measurement → time-zero frequency is completely unknown → incorrect time slope of 0.42
• Use fitting parameters assuming $\Delta f = A(t - t_0)^n - A t^n$ → time slope of 0.172
Design Considerations | Examples of Practical Issues | Aging Sensor Implementation in IBM z196 Server [3]
Type of Sensor | BTI, HCI, TDDB, RTN, transient errors, memory bit failures, etc. | Ring oscillator based BTI monitor for long-term frequency degradation measurement
Temporal Granularity | Sensing period, threshold setting, dynamic range, etc. | Sampling period: once a week
Spatial Granularity | Per CPU/GPU/memory, per functional unit, per sub-block, etc. | Total: 5 sensors per chip; one sensor per core (x4 cores) plus one sensor in L2 cache
Stress and Measurement Condition | AC vs. DC, accelerated vs. usage condition, fast measurement | AC stress, usage condition, 0.5ms measurement time
Communication | Between data gathering sensor, across sensors, between sensors and processor | Sensors are integrated with IBM z196 pervasive infrastructure with firmware support
Interface and Protocol | Interrupt based, polling, event alarms, performance counter based, etc. | Interrupt based in-field frequency degradation measurement
Testing and Calibration | Similar to any other on-chip monitor circuit | Time 0 frequency shift unknown since first sample is taken after some stress
Outline
• Device Reliability Issues
• Monitors and Measurements
• Effects in NTV Processors
• Summary
DVFS Systems in ISSCC 2014
• Latest trends: On-chip distributed VRM (fast transients, supply noise suppression), per-core DVS, NTV/Turbo
[Figure: 22nm Intel Haswell processor. N. Kurd, et al., ISSCC, 2014]
[Figure: 22nm IBM POWER8 processor. Z. Toprak-Deniz, et al., ISSCC, 2014]
[Figure annotation: <1% area overhead]
Frequency Fluctuation in DVFS (BTI Example)
[Figure: three panels of VDD and Frequency vs. Time; annotations: Freq. dip, Freq. peak]
• Constant VDD: Frequency degrades with stress
• High VDD to low VDD: Freq. dips due to lower VDD followed by recovery
• Low VDD to high VDD: Freq. jumps and then degrades
[Figure: VDD and Frequency vs. Time for the same three cases, with fCLK and its guardband marked in each panel]
Modeling Approach using Superposition
• Rationale for empirical superposition method
– Complicated VDD trace can be broken down into multiple pulses
– Suitable for long- and short-term, DC and AC
– Computation is more efficient, short runtime
[Figure: VDD trace with levels V1, V2, V3 vs. Time (a.u.), decomposed into pulses V1, V2, V3 with responses ΔVT1, ΔVT2, ΔVT3; resulting ΔVT and Δf(%) vs. Time (a.u.); annotation: Not captured in previous work]
ΔVT = ΔVT1 + ΔVT2 + ΔVT3
C. Zhou, et al., IRPS 2014
BTI Recovery Model using Superposition
• Stress model: $t^n$ (power law)
• Recovery model derived from superposition property:
$\Delta V_{T,recovery}(t) = t^n - (t - t_0)^n$
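A minimal numerical sketch of the stress and recovery expressions above, in normalized units. It assumes t0 is the time at which stress is removed and uses a hypothetical exponent n = 0.2; the function names are illustrative and not from the cited work.

```python
def stress_shift(t, n):
    """Normalized threshold shift after stress time t (power law t^n)."""
    return t ** n


def recovery_shift(t, t0, n):
    """Normalized threshold shift at time t when stress ends at t0.

    For t > t0 this is the superposition of a stress pulse starting at 0
    and an opposite pulse starting at t0: t^n - (t - t0)^n.
    """
    if t <= t0:
        return stress_shift(t, n)
    return t ** n - (t - t0) ** n


N_EXP = 0.2   # hypothetical power-law exponent
T0 = 10.0     # stress removed at t0 (arbitrary units)

for t in (5.0, 10.0, 11.0, 20.0, 100.0):
    print(f"t={t:6.1f}  shift={recovery_shift(t, T0, N_EXP):.4f}")
```

The shift grows as a power law up to t0 and then decays toward zero without ever fully reaching it, which is the qualitative recovery behavior the superposition model is meant to capture.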
Translating VT Shift to Delay Shift
• ROSC mimics logic path
• Translate ΔVT to pull-up, pull-down delay
[Figure: two panels, Pull-down Delay and Pull-up Delay. Delay (normalized, 0.3 to 1) and ΔVT (normalized, 0 to 0.020) vs. VDD (normalized, 0.6 to 1.0); HSPICE, TT, 60°C]
Android Development Board for Collecting DVFS Traces
• VDD and operating frequency collected in real time
• Navigating websites, running benchmark applications
System | Samsung Exynos 5410 SoC
Processor | ARM Cortex A15
Process | 28nm
Operating system | Android v 4.2.2
Linux kernel | v 3.4.5
Frequency | 0.8 – 1.8 GHz
Voltage | 0.9 – 1.25 V
DVFS meas. | National Instr. DAQ
Sampling frequency | 1000 samples per second
Sample Waveform and Estimated Frequency Shift
[Figure: Amazon.com trace, Time 0 to 6 s. Top: VDD (normalized, 0.6 to 1.2). Bottom: Frequency Shift (%), 0 to -1.0; annotations: high VDD stress, low VDD recovery, -0.90%]
• High VDD duration: Freq. degrades with time
• Low VDD duration: Freq. shift dips and then recovers
Applying Model to Other DVFS Traces
• Worst case frequency dip
– 3D-raytrace: Δf=1.0% at t=6s when VDD drops by 29% after staying in high VDD mode for 5.8s
[Figure: traces for Sina.com, Google.com, NYTimes.com, Amazon.com]
Summary
• Power wall (2000) → Variability wall (2010) → Reliability wall (2020)
• Example: NTV + RDF + BTI
• Aging sensor deployed for the first time in a commercial processor (IBM z systems)
• Per-Core DVFS with sub-microsecond ramp time becoming a standard feature in new processors
• Turbo boost + NTV: Best of both worlds in terms of power and performance, but presents new reliability challenges