16. Circuit Pitfalls
Jacob Abraham
Department of Electrical and Computer Engineering
The University of Texas at Austin
VLSI Design, Fall 2017
October 30, 2017
ECE Department, University of Texas at Austin Lecture 16. Circuit Pitfalls Jacob Abraham, October 30, 2017 1 / 43
Bad Circuit 1
Circuit
2:1 multiplexer
Symptom
Mux works when selected D is 0 but not 1
Or fails at low VDD
Or fails in SFSF corner
Principle: Threshold drop
X never rises above VDD − Vt
Vt is raised by the body effect
The threshold drop is most serious as Vt becomes a greater fraction of VDD
Solution: Use transmission gates, not pass transistors
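As a rough numeric illustration of the threshold drop, the sketch below solves X = VDD − Vt(X) by fixed-point iteration, using the standard body-effect expression Vt = Vt0 + γ(√(φs + Vsb) − √φs) with Vsb = X. The device parameters (vt0, gamma, phi_s) are illustrative assumptions, not values from the lecture.

```python
import math

def pass_transistor_high(vdd, vt0=0.4, gamma=0.4, phi_s=0.8, iters=100):
    """Degraded high level at the output of an nMOS pass transistor.

    Solves x = vdd - vt(x), where the source-to-body voltage equals x.
    vt0, gamma and phi_s are assumed, illustrative device parameters.
    """
    x = vdd - vt0  # initial guess: ignore the body effect
    for _ in range(iters):
        vt = vt0 + gamma * (math.sqrt(phi_s + x) - math.sqrt(phi_s))
        x = vdd - vt
    return x

for vdd in (1.8, 1.2, 0.9):
    x = pass_transistor_high(vdd)
    print(f"VDD={vdd:.1f} V  X={x:.3f} V  drop={(vdd - x) / vdd:.0%} of VDD")
```

With these assumed parameters the lost fraction of the swing grows as VDD is lowered, which matches the symptom of failing at low VDD.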
Bad Circuit 2
Circuit
Latch
Symptom
Load a 0 into Q
Set φ = 0
Eventually Q spontaneously flips to 1
Principle: Leakage
X is a dynamic node holding a value as charge on the node
Eventually, subthreshold leakage may disturb charge
Solution: Staticize node with feedback
Or, periodically refresh node (this requires a fast clock, and is not practical for processes with big leakage)
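To get a feel for how long a dynamic node can hold its value, a back-of-the-envelope estimate is t ≈ C·ΔV / I_leak. The sketch below assumes a constant leakage current (real subthreshold leakage depends on voltage and temperature); the capacitance, droop, and current values are assumptions chosen only for illustration.

```python
def hold_time(c_node, delta_v, i_leak):
    """Time for a constant leakage current to move a node by delta_v volts."""
    return c_node * delta_v / i_leak

# Assumed values: 2 fF node, 0.3 V of tolerable droop.
for i_leak in (1e-12, 1e-9, 1e-7):  # 1 pA, 1 nA, 100 nA
    t = hold_time(2e-15, 0.3, i_leak)
    print(f"I_leak={i_leak:.0e} A -> hold time ~ {t:.2e} s")
```

The hold time shrinks in proportion to the leakage current, which is why a refresh-based fix needs a faster clock as leakage grows.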
Bad Circuit 3
Circuit
Domino AND gate
Symptom
Precharge gate (Y = 0)
Then evaluate
Eventually Y spontaneously flips to 1
Principle: Leakage
X is a dynamic node holding value as charge on the node
Eventually subthreshold leakage may disturb charge
Solution: Keeper
Bad Circuit 4
Circuit
Pseudo-nMOS OR
Symptom
When only one input is true, Y = 0
Perhaps only happens in SF corner
Principle: Ratio Failure
nMOS and pMOS fight each other
If the pMOS is too strong, nMOS cannot pull X low enough
Solution: Check that ratio is satisfied in all corners
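A corner check of this kind can be sketched with a very crude model in which the pMOS load and a single nMOS pulldown form a resistive divider. The effective resistances per corner and the input-low threshold V_IL below are hypothetical numbers, and a real check would use circuit simulation rather than this divider.

```python
def pseudo_nmos_low(vdd, r_pmos, r_nmos):
    """Output-low voltage of a pseudo-nMOS node, modeled as a resistive divider."""
    return vdd * r_nmos / (r_nmos + r_pmos)

# Hypothetical effective resistances (ohms) per corner: (pMOS load, one nMOS pulldown).
corners = {
    "TT": (40e3, 10e3),
    "FS": (52e3, 8e3),   # fast nMOS, slow pMOS
    "SF": (30e3, 13e3),  # slow nMOS, fast pMOS: worst case for the ratio
}
VDD, V_IL = 1.0, 0.25  # V_IL: assumed maximum input-low level of the next stage

def failing_corners(corners, vdd, v_il):
    """Names of corners in which node X cannot be pulled below v_il."""
    return [name for name, (rp, rn) in corners.items()
            if pseudo_nmos_low(vdd, rp, rn) > v_il]

print(failing_corners(corners, VDD, V_IL))  # ['SF'] with the assumed numbers
```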
Bad Circuit 5
Circuit
Latch
Symptom
Q stuck at 1
May only happen for certain latches where input is driven by a small gate located far away
Principle: Ratio failure (again)
Series resistance of D driver, wire resistance, and transmission gate must be much less than the resistance of the weak feedback inverter
Solution: Check relative strengths
Avoid unbuffered diffusion inputs where driver is unknown
Bad Circuit 6
Circuit
Domino AND gate
Symptom
Precharge gate while A = B = 0, so Z = 0
Set φ = 1
A rises
Z is observed to sometimes rise
Principle: Charge sharing
If X was low, it shares charge with Y
$V_X = V_Y = \frac{C_Y}{C_X + C_Y} V_{DD}$
Solution: Limit charge sharing
Safe if $C_Y \gg C_X$
Or, precharge node X too
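The charge-sharing voltage follows from charge conservation between the precharged node Y and the internal node X. The sketch below evaluates it for a few capacitance ratios; the ratios themselves and the generalization to a nonzero initial voltage on X (v_x0) are illustrative assumptions.

```python
def shared_voltage(c_x, c_y, vdd, v_x0=0.0):
    """Final voltage after node Y (precharged to vdd) shares charge with node X (at v_x0)."""
    return (c_y * vdd + c_x * v_x0) / (c_x + c_y)

VDD = 1.0
for ratio in (1, 4, 10):  # C_Y / C_X
    v = shared_voltage(c_x=1.0, c_y=float(ratio), vdd=VDD)
    print(f"C_Y/C_X={ratio:2d}: V_Y falls to {v:.2f} V")

# Precharging X as well removes the droop entirely.
print(shared_voltage(1.0, 1.0, VDD, v_x0=VDD))
```

With equal capacitances the dynamic node loses half its voltage, which is enough to flip the output inverter; a large C_Y or a precharged X keeps the droop small.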
Bad Circuit 7
Circuit
Dynamic gate + latch
Symptom
Precharge gate while transmission gate latch is opaque
Evaluate
When latch becomes transparent, X falls
Principle: Charge sharing
If Y was low, it shares charge with X
Solution: Buffer dynamic nodes before driving transmission gate
Bad Circuit 8
Circuit
Latch
Symptom
Q changes while latch is opaque
Especially if D comes from a far-away driver
Principle: Diffusion Input Noise Sensitivity
If VD < −Vt, transmission gate turns on
Most likely because of power supply noise or coupling on D
Solution: Buffer D locally
Bad Circuit 9
Circuit
Anything
Symptom
Some gates are slower than expected
Principle: Hot Spots and Power Supply Noise
Noise
Sources
Power supply noise/ground bounce
Capacitive coupling
Substrate coupling
Charge sharing
Leakage
Noise feedthrough
Consequences
Increased delay (for noise to settle out)
Or incorrect computations
Source: electronicproducts.com
Line-to-substrate coupling
Visualization of substrate noise
Electromigration
“Electron wind” causes movement of metal atoms along wires
Excessive electromigration leads to open circuits
Most significant for unidirectional currents (DC)
Depends on current density $J_{dc}$ (current/area)
Exponential dependence on temperature
Black’s Equation:
$MTTF \propto \frac{e^{E_a / kT}}{J_{dc}^{n}}$,
where $E_a$ is the activation energy (empirically determined by stress testing at high temperatures), and $n$ is typically 2
Typical limits: $J_{dc} <$ 1 to 2 mA/µm²
Source: Cheung and Tao, UC Berkeley
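Because Black's equation is a proportionality, it is most useful for comparing two operating conditions. The sketch below computes the ratio of lifetimes between two conditions; the activation energy of 0.7 eV is an assumed placeholder (the slide notes that it is determined empirically), and n = 2 follows the typical value quoted above.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def mttf_ratio(j1, t1_c, j2, t2_c, ea_ev=0.7, n=2.0):
    """MTTF(condition 2) / MTTF(condition 1) from Black's equation.

    j1, j2 are current densities (same units); t1_c, t2_c are temperatures in Celsius.
    ea_ev = 0.7 eV is an assumed activation energy, not a value from the lecture.
    """
    t1, t2 = t1_c + 273.15, t2_c + 273.15
    thermal = math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t2 - 1.0 / t1))
    return thermal * (j1 / j2) ** n

# Doubling the current density at fixed temperature (n = 2 gives a 4x shorter life).
print(mttf_ratio(1.0, 85.0, 2.0, 85.0))
# Raising the temperature from 85 C to 105 C at fixed current density.
print(mttf_ratio(1.0, 85.0, 1.0, 105.0))
```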
Self Heating
Current through wire resistance generates heat
Oxide surrounding wires is a thermal insulator
Heat tends to build up in wires
Hotter wires are more resistive, slower
Self-heating limits AC current densities for reliability
$I_{rms} = \sqrt{\frac{\int_0^T I(t)^2 \, dt}{T}}$
Typical limits: $J_{rms} < 15$ mA/µm²
Self heating a problem for SOI circuits and 3-D systems
Modeling self heating, Silvaco
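The RMS definition can be evaluated numerically from sampled current. The sketch below uses an assumed waveform (a 1 mA pulse for 10% of the period) to show that the RMS value, which governs self heating, can be much larger than the average value.

```python
import math

def rms(samples):
    """RMS of uniformly spaced samples covering one period."""
    return math.sqrt(sum(i * i for i in samples) / len(samples))

# Assumed waveform: 1 mA pulse for 10% of the period, zero otherwise.
period = [1e-3] * 10 + [0.0] * 90
i_avg = sum(period) / len(period)
i_rms = rms(period)
print(f"I_avg = {i_avg * 1e3:.3f} mA, I_rms = {i_rms * 1e3:.3f} mA")
```

Dividing I_rms by the wire cross-section gives the J_rms value to compare against the limit.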
Latchup
Latchup: positive feedback leading to VDD – GND short
Major problem for 1970s CMOS processes before it was well understood
Avoid by minimizing resistance of body to GND/VDD
Use plenty of substrate and well taps
Guard Rings
Latchup risk greatest when diffusion-to-substrate diodes could become forward-biased
Surround sensitive region with guard ring to collect injected charge
Overvoltage
High voltages can damage transistors
Electrostatic discharge (ESD)
Oxide arcing
Punchthrough
Time-dependent dielectric breakdown (TDDB)
Accumulated wear from tunneling currents
Requires low VDD for thin oxides and short channels
Use ESD protection structures where chip meets real world
Transient suppression device specifications for automotive applications
Source: dev.emcelettronica.com
Hot Carriers
Electric fields across channel impart high energies to some carriers
These “hot” carriers may be blasted into the gate oxide where they become trapped
Accumulation of charge in oxide causes shift in Vt over time
Eventually Vt shifts too far for devices to operate correctly
Choose VDD to achieve reasonable product lifetime
Worst problems for inverters and NORs with slow input rise time and long propagation delays
Source: Keithley Application Note 2535
Bias Temperature Instability
Mechanism
Even when no carriers are moving from source to drain, the gate voltage can cause charges to migrate into the insulating gate oxide
Phenomenon is partly reversible
Charges leave the oxide after the gate voltage is removed
Source: Keane and Kim, IEEE Spectrum, May 2011
NBTI Mechanism and Degradation
Hole interaction with oxide causes Si-H bonds to break
Change in positive charge density in traps increases Vt
When stress is removed, H atoms diffuse back to interface and anneal the broken bond
DC stress causes much shorter lifetime
Source: Peters, Semiconductor International, March 1, 2004
Transistor Aging and Failure Prediction
Guardband violation due to transistor aging
Example of an aging sensor
Agarwal et al., VTS 2007
Oxide Breakdown
Mechanism
Voltage across the gate can also cause electrically active defects within the oxide layer
The defects can trap charges
If enough of the charges accumulate, they can create a short, causing a catastrophic failure of the transistor
Source: Keane and Kim, IEEE Spectrum, May 2011
Summary
Static CMOS gates are very robust
Will settle to correct value if designed with sufficient margins and if you wait long enough
Other circuits suffer from a variety of pitfalls
Tradeoff between performance and robustness
Very important to check circuits for pitfalls
For large chips, you need an automatic checker
Design rules aren’t worth the paper they are printed on unless you back them up with a tool
Classes of Dependable Systems
Systems Designed for Very Long Life
Spacecraft with multiyear missions, inaccessible systems
Techniques: Replication (spares), error coding, monitoring, shielding
Safety-Critical Systems
Flight control computers, nuclear-plant shutdown, medical monitoring, automobile braking control
Techniques: Replication with voting, time redundancy, design diversity
High-Availability Systems
Telephone switching centers, server farms, banking systems, e-commerce
Techniques: Hardware and information redundancy, backup schemes, hot-swap, recovery
Consumer Products?
PCs, PDAs, smart phones
Techniques: parity checks for memories, intrusion tolerance, virus detection, low cost of replacement
Historical Perspective
Dionysius Lardner
“The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods,” Edinburgh Review, No. CXX, July 1834
Key Papers in 1956
Moore and Shannon, “Reliable circuits using less reliable relays,” Bell System Technical Journal
von Neumann, “Probabilistic logic and synthesis of reliable organism from unreliable components,” Annals of mathematical studies, Princeton University Press
Reliable Relay Networks
Shannon, 1956
Can be applied to MOS transistors
A single open or short of a transistor (relay) will be masked by the network
Many multiple faults will also be tolerated
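The masking property can be checked by brute force on a small example. The sketch below assumes one simple redundant arrangement, a series-parallel "quad" in which each relay is replaced by four relays driven by the same input; the slide's figure is not reproduced here, so this specific topology is an assumption.

```python
from itertools import product

def quad(x, faults=()):
    """Series-parallel quad: (r0 AND r1) OR (r2 AND r3), all relays driven by x.

    faults maps relay index -> forced state (False = stuck open, True = stuck shorted).
    """
    forced = dict(faults)
    r = [forced.get(i, x) for i in range(4)]
    return (r[0] and r[1]) or (r[2] and r[3])

def masks_all_single_faults():
    """True if every single open or short leaves the network equal to x."""
    for relay, stuck, x in product(range(4), (False, True), (False, True)):
        if quad(x, {relay: stuck}) != x:
            return False
    return True

print(masks_all_single_faults())
# One double fault that is still tolerated: relay 0 open and relay 2 shorted.
print([quad(x, {0: False, 2: True}) == x for x in (False, True)])
# One double fault that is not: relays 0 and 2 both open.
print(quad(True, {0: False, 2: False}))
```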
NAND Multiplexing – Massive Redundancy
Inspired by 1956 von Neumann paper for logic implemented in nanotechnologies
Approach
Similar to NMR, but voting carried out in a bundle
Executive stage – performs operations
Restorative stage – reduces degradation caused by errors from the executive stage, acting as output “amplifier”
Error-Detecting and Correcting Codes
One way of detecting (and correcting) errors in data transmission and storage is to encode data, with a subset of the words being code words
Reasonable errors will change a code word to a non-code word, and the errors will be detectable
Errors which transform one code word into another will not be detectable
“Error Models” relate likely physical faults to the errors that they could cause
Distance between two code words is the number of distinct changes needed to change one code word into the other
Example: Parity codes
Distance Properties
The Hamming weight of a vector X, W(X), is the number of non-zero components of X
The Hamming distance between two vectors X and Y, d(X, Y), is the number of components in which they differ
The minimum distance of a code is the minimum of Hamming distances between all pairs of code words
To detect d-bit errors, need a code with distance d + 1; to correct d-bit errors, need a code with distance 2d + 1.
Example:
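The distance definitions can be tried out on the parity code mentioned earlier. The sketch below builds a small even-parity code (the 3-bit data width is an arbitrary choice for illustration) and computes its minimum distance, from which the detection and correction capability follow.

```python
from itertools import combinations, product

def hamming_weight(x):
    """Number of non-zero components of x."""
    return sum(1 for bit in x if bit != 0)

def hamming_distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def min_distance(code):
    """Minimum Hamming distance over all pairs of code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

def even_parity_code(k):
    """All k-bit data words with one appended even-parity bit."""
    return [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=k)]

code = even_parity_code(3)
d_min = min_distance(code)
print("minimum distance:", d_min)
print("detectable errors:", d_min - 1, "correctable errors:", (d_min - 1) // 2)
```

A single parity bit gives minimum distance 2, so any single-bit error is detected but none can be corrected.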
Self-Checking Circuits
Self-Checking Circuits – encoded inputs and outputs, output checker
Boeing 777 Primary Flight Computer
Reliability, Availability, Safety
Reliability, R(t)
Conditional probability that a system provides continuous proper service in the interval [0, t] given that it provided desired service at time 0
Simple reliability function (exponential): $R(t) = e^{-\lambda t}$, with constant failure rate $\lambda$
Mean Time to Failure, MTTF
$MTTF = \int_0^{\infty} R(t)\,dt$
For an exponential reliability function, $MTTF = 1/\lambda$
Availability, A(t)
Fraction of time that system is in the operational state (providing service) during the interval [0, t]
Function of both failure rate ($\lambda$) and repair rate ($\mu$)
Steady-state availability, with $MTTR = 1/\mu$: $A = \frac{MTTF}{MTTF + MTTR} = \frac{\mu}{\lambda + \mu}$
“Markov Chain” for a simple system with repair
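These definitions can be checked numerically. The sketch below integrates the exponential reliability function to recover MTTF = 1/λ and evaluates steady-state availability from MTTF and MTTR, taking MTTR = 1/µ; the failure and repair rates used are arbitrary illustrative values.

```python
import math

def reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)

def mttf_numeric(lam, steps=100_000, t_max_factor=40.0):
    """Trapezoidal estimate of the integral of R(t) from 0 to t_max_factor / lam."""
    t_max = t_max_factor / lam
    dt = t_max / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(t_max, lam))
    total += sum(reliability(i * dt, lam) for i in range(1, steps))
    return total * dt

def steady_state_availability(lam, mu):
    """A = MTTF / (MTTF + MTTR) with MTTF = 1/lam and MTTR = 1/mu."""
    mttf, mttr = 1.0 / lam, 1.0 / mu
    return mttf / (mttf + mttr)

lam, mu = 1e-4, 0.1  # assumed rates per hour: one failure per 10,000 h, 10 h mean repair
print(f"MTTF (numeric) = {mttf_numeric(lam):.1f} h, 1/lambda = {1 / lam:.1f} h")
print(f"Availability   = {steady_state_availability(lam, mu):.6f}")
```

Substituting MTTF = 1/λ and MTTR = 1/µ into the ratio gives µ/(λ + µ), so availability approaches 1 as repair becomes fast relative to failure.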
Reliability
View a system as providing a service
Faults =⇒ Errors =⇒ Failures
Fault: an anomalous physical condition
Error: an incorrect logic value as a consequence of the fault
Failure: the condition where the system does not provide the expected service
Are “Fault Tolerance” and “Resilience” the Same?
Fault Tolerance
Errors (due to faults) detected and corrected, fault located, reconfiguration around faulty unit
System designed to tolerate classes of faults
User does not see anything wrong (except perhaps an additional delay)
Service does not suffer any down time
Resilience
User may see errors during the service, but the final results are correct
System requires on-line error detection, but may use checkpoints, retry, etc., to achieve resilience
Ability to deal with “unknown” faults
Service may be down intermittently
Errors Not Repeatable – “Heisenbugs”
Example of Resilience – Single Engine Airplane
From Malibu Jetprop pilot’s operating handbook
If loss of power occurs at altitude, trim the aircraft for best gliding angle (90 KIAS) and look for a suitable field.
At best glide angle, no wind, with the engine stopped and the propeller feathered, the aircraft will travel approximately 2 miles for each thousand feet of altitude.
Achieving Resilience
Start with fault-free hardware
Testing after manufacturing
On-line tests to detect wearout and degradation
Detection is key
Detect errors in results of computations
Application-level results are, ultimately, what are important
Ensure correct results at the application level
Appropriate checks at different levels of the design
High-level checks tend to have lower overheads
Application-Level Fault Tolerance
Reduce the cost of fault tolerance by looking at computations at a higher level
Algorithm-Based Fault Tolerance (ABFT), (Huang and Abraham, 1984)
Encode data at a high level (application level)
Design algorithm to operate on encoded input data and produce encoded output data
Distribute computation tasks among multiple computation units, so that failure of a unit affects only a portion of the output data, enabling the correct data to be recovered
Very general fault model: A computation unit can produce anyarbitrary logical output under failure
Communication paths checked using coding techniques
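A small checksum-encoded matrix multiplication illustrates the idea of operating on encoded data and recovering from a single wrong result. The sketch below is a minimal, single-process illustration with integer data (floating-point data would need a tolerance in the comparisons); the function names are hypothetical, and the injected error stands in for a faulty computation unit.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_column_checksum(a):
    """Append a row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]

def add_row_checksum(b):
    """Append a column holding the sum of each row."""
    return [row + [sum(row)] for row in b]

def locate_and_correct(c_full):
    """Find and fix a single wrong data element in a full-checksum matrix.

    Returns the (row, col) that was corrected, or None if all checksums agree.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the element from the row checksum and the other row entries.
        c_full[i][j] = c_full[i][m] - (sum(c_full[i][:m]) - c_full[i][j])
        return (i, j)
    raise ValueError("error pattern is not a single data-element error")

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(add_column_checksum(A), add_row_checksum(B))  # full-checksum product
C[0][1] += 9  # inject an error, as if one unit produced a wrong value
print(locate_and_correct(C), C[0][1])
```

The product of a column-checksum matrix and a row-checksum matrix is itself a full-checksum matrix, so the intersection of the inconsistent row and column pinpoints the bad element.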
Illustration of Application to Matrix Operations
Checksum Calculations in High Performance Computing
ABFT applied to DGEMM
Source: Bosilca, Delmas, Dongarra and Langou, 2008.
Performance Under Failure
Performance (GFLOPS/sec/proc) of PBLAS PDGEMM, ABFT BLAS PDGEMM (0 failure), and ABFT BLAS PDGEMM (1 failure)
[Plot: GFLOPS/proc versus # procs (4 to 1024), nloc=3000; curves for PBLAS PDGEMM, ABFT BLAS PDGEMM (0 failure), and ABFT BLAS PDGEMM (1 failure)]
Error Resilience in Non-Linear Control Systems
Brake by Wire
Error Detection and Correction in Non-Linear Control System
Brake by Wire algorithm executing on embedded processor
Error Detection and Correction in Brake by Wire System
Approach to dealing with soft errors
When a transient error is detected, results of the control loop (output to actuator) are ignored for a few cycles until no error is seen.
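One possible reading of this policy is sketched below: when an error is flagged, the controller output is discarded and the last trusted command is held until a few consecutive error-free cycles have passed. The function name, the hold_cycles parameter, and the initial command of 0.0 are assumptions for illustration, not details from the lecture.

```python
def filter_actuator_commands(commands, error_flags, hold_cycles=3):
    """Return the commands actually sent to the actuator.

    When an error is flagged, the controller output is ignored and the last
    trusted command is held until `hold_cycles` consecutive error-free cycles
    have been seen. hold_cycles and the initial command of 0.0 are assumptions.
    """
    sent, last_good, clean = [], 0.0, hold_cycles
    for cmd, err in zip(commands, error_flags):
        clean = 0 if err else clean + 1
        if clean >= hold_cycles:
            last_good = cmd
        sent.append(last_good)
    return sent

commands = [1, 2, 99, 4, 5, 6, 7]          # 99 is a corrupted output
flags = [False, False, True, False, False, False, False]
print(filter_actuator_commands(commands, flags, hold_cycles=2))
```

Holding the previous command relies on the physical system changing slowly relative to the control period, so a few skipped updates do not noticeably affect behavior.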