Single Event Upset: An Embedded Tutorial

Fan Wang and Vishwani D. Agrawal

Department of Electrical and Computer Engineering, Auburn University, AL 36849 USA

21st International Conf. on VLSI Design, Hyderabad, India, January 4-8, 2008



Motivation for This Work

With the continuous downscaling of CMOS technologies, device reliability has become a major bottleneck.

The sensitivity of electronic systems to radiation can become a major cause of soft (non-permanent) failures.

Both circuit designers and test engineers need a basic knowledge of the radiation mechanisms that cause soft errors and of soft error mitigation techniques.


Outline

Introduction to Soft Errors
    What is Soft Error?
    Historical notes

Basic radiation mechanisms in silicon

Soft error resilience techniques

A case study

Conclusion


Introduction to SEU

Certain behaviors in state-of-the-art electronic circuits are caused by random factors.

A single event upset (SEU) is a non-permanent, non-functional error.

Definition from the NASA Thesaurus: “Single Event Upset (SEU): Radiation-induced errors in microelectronic circuits caused when charged particles (usually from the radiation belts or from cosmic rays) lose energy by ionizing the medium through which they pass, leaving behind a wake of electron-hole pairs”.


What is Soft Error

A “fault” is the cause of errors.

A non-permanent fault is a non-destructive fault and falls into one of two categories:

Transient faults, caused by environmental conditions such as temperature, humidity, pressure, voltage, power supply, vibrations, fluctuations, electromagnetic interference, ground loops, cosmic rays and alpha particles.

Intermittent faults, caused by non-environmental conditions such as loose connections, aging components, critical timing, resistive or capacitive variations and noise in the system.

With advances in manufacturing, “soft errors” caused by cosmic rays and alpha particles have become a dominant cause of failures in electronic systems.


Historical Notes

In the period 1954 through 1957, failures in digital electronics were reported during the above-ground nuclear bomb tests.

In 1962, Wallmark and Marcus predicted that cosmic rays would start upsetting microcircuits through heavy ionized particle strikes once feature sizes became small enough.

In the 1970s and early 1980s, the effects of radiation received attention and more researchers examined the physics of these phenomena. Fault-tolerant computing theory received similar attention in the same period.

In 1978, May and Woods of Intel Corporation determined that soft errors were caused by alpha particles emitted in the radioactive decay of uranium and thorium, present at only a few parts per million in package materials.

In 1979, Guenzer and Wolicki reported that the error-causing particles came not only from uranium and thorium but also from nuclear reactions involving high-energy neutrons and protons. The term “SEU” has been in use since this paper.

In 1979, Ziegler and Lanford of IBM predicted that cosmic rays could cause the same upset phenomenon in electronics (not only memories), even at sea level.


Soft Error Rate of Specific Applications

Figures of merit:

1. Failures In Time (FIT): the number of failures per 10^9 device hours.

2. Mean Time To Failure (MTTF).

A 1-year MTTF corresponds to 10^9/(24 × 365) FIT = 114,155 FIT.

The SER of contemporary commercial chips is controlled to within 100 to 1000 FIT.

Most hard failure mechanisms produce error rates on the order of 1 to 100 FIT.

The SER of programmable logic is almost 100 times larger than that of combinational logic.

Soft Error Rate for SRAM-Based FPGAs:

FPGA           XC4010E           XC4010XL
Process        0.60 um           0.35 um
Vcc            5 V               3.3 V
1 SEU every    1 × 10^6 hours    2.8 × 10^5 hours

The XC4010XL uses a smaller design rule and a lower supply voltage.

A radiation chamber was used to calculate the SEU frequency at an altitude of 10 km at 60°N (Sweden).

Projecting this for 3 design rule shrinks and 2 voltage reductions gives ≈1 SEU every 28.2 hrs.

M. Ohlsson, P. Dyreklev, K. Johansson and P. Alfke, “Neutron Single Event Upsets in SRAM-Based FPGAs”, Proc. 1998 IEEE Nuclear & Space Radiation Effects Conference.

Chuck Stroud, “FPGA Architectures and Operation for Tolerating SEUs”, Electrical Engineering VLSI design and test seminar, Spring 2007, Auburn University.
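
The FIT and MTTF figures on this slide are reciprocals of each other, scaled by 10^9 device hours. The short Python sketch below reproduces the 1-year MTTF conversion and expresses the quoted FPGA upset intervals in FIT. The function names are illustrative, and treating "1 SEU every N hours" as an MTTF-like interval is an assumption made here for comparison only.

```python
HOURS_PER_YEAR = 24 * 365      # 8,760 hours, the same year length the slide uses
FIT_DEVICE_HOURS = 1e9         # 1 FIT = 1 failure per 10^9 device hours


def mttf_hours_to_fit(mttf_hours: float) -> float:
    """Convert a mean time to failure (hours) into a failure rate in FIT."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return FIT_DEVICE_HOURS / mttf_hours


def fit_to_mttf_hours(fit: float) -> float:
    """Convert a failure rate in FIT into a mean time to failure (hours)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return FIT_DEVICE_HOURS / fit


if __name__ == "__main__":
    # One-year MTTF expressed in FIT.
    print("1-year MTTF:", round(mttf_hours_to_fit(HOURS_PER_YEAR)), "FIT")

    # Upset intervals quoted for the SRAM-based FPGAs (assumed MTTF-like).
    intervals = [("XC4010E", 1e6), ("XC4010XL", 2.8e5), ("projected", 28.2)]
    for label, hours in intervals:
        print(label, round(mttf_hours_to_fit(hours)), "FIT")
```

Read this way, the projected interval of 28.2 hours lies several orders of magnitude above the 100 to 1000 FIT range quoted for commercial chips.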


Example: SRAM-Based FPGA System*

[Table not preserved in this transcript]

*1. Example (1) was tested at Denver using SpaceRad 4.5 (a radiation effects prediction software program). Source: Actel.

2. All systems are without any protection.


Radiation Mechanisms for Silicon (1)

1. Alpha particles are emitted when the nucleus of an unstable isotope decays to a lower energy state. (This was the dominant cause of soft errors in DRAM in the 1970s.)

Uranium and thorium have the highest activity among naturally occurring radioactive materials.

In the terrestrial environment, major sources of radioactive impurities are lead-based isotopes in the solder bumps of flip-chip technology, gold used for bond wires and lid plating, aluminum in ceramic packages, lead-frame alloys and interconnect metallization.

**With carefully selected materials, the effect of this mechanism can be greatly reduced.


Radiation Mechanisms for Silicon (2)

2. High-energy (> 1 MeV*) neutrons from cosmic radiation induce soft errors in semiconductor devices via secondary ions produced by neutron reactions with silicon nuclei.

Cosmic rays of galactic origin react with the Earth’s atmosphere to produce complex cascades of secondary particles.

Neutrons are the cosmic radiation source most likely to cause SEUs in deep-submicron semiconductors at terrestrial altitudes. The neutron flux depends on the altitude above sea level: its density increases with altitude.

**Neutrons are now the major cause among all failure mechanisms.

*MeV: million electron volts


Radiation Mechanisms for Silicon (3)

3. The secondary radiation induced by the interaction of cosmic ray neutrons with boron is the third significant source of ionizing particles in electronic systems.

It arises from low-energy cosmic neutron interactions with the isotope boron-10 (10B). 10B is commonly found in the boron used as a p-type dopant for junction formation and in the IC package.

**This mechanism can be greatly reduced or eliminated by removing the sources of 10B.

Baumann et al., IEEE Trans. Device and Materials Reliability, vol. 1, no. 1, pp. 17–22, 2001.


Single Event Transient (SET)

An SET is caused by the generation of charge when a high-energy particle passes through a sensitive node.

Each SET has unique characteristics (polarity, waveform, amplitude, duration, etc.) that depend on the particle impact location, particle energy, device technology, device supply voltage and output load.

Off transistors struck in the junction area by a heavy ion with high enough LET* are the most sensitive to SEU, specifically in the channel region of the off-NMOS transistor and the drain region of the off-PMOS transistor.

*Linear Energy Transfer (LET) is a measure of the energy transferred to the device per unit length as an ionizing particle travels through a material.


More Details of SET Generation

(a) Along the path it traverses, the particle produces a dense radial distribution of electron-hole pairs.

(b) Outside the depletion region, the non-equilibrium charge distribution induces a temporary funnel-shaped potential distortion along the trajectory of the event (drift component).

(c) After the funnel collapses, the diffusion component dominates the collection process until all excess carriers have been collected, recombined, or diffused away from the junction area.

(d) Current vs. time, illustrating charge collection and SET generation.


Analytical Model of SET

The time constants depend strongly on the type of ion, its initial energy and the properties of the specific technology.

An approximate analytical model for ion track charge collection has a double-exponential form. It gives an induced current with a rapid rise time but a more gradual fall time.

*Typical values are approximately 1.64 × 10^-10 sec for the slower (fall) time constant and 5.10 × 10^-11 sec for the faster (rise) time constant.

*Experimental results from NASA JPL
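
The equation on this slide did not survive in the transcript. A commonly used double-exponential form, assumed here rather than taken from the slide, is I(t) = Q / (tau_fall - tau_rise) × (exp(-t / tau_fall) - exp(-t / tau_rise)), where Q is the total collected charge. The Python sketch below evaluates that assumed form with the two typical time constants quoted above. The names `set_current`, `peak_time`, `TAU_FALL` and `TAU_RISE` are illustrative, as is the mapping of the larger value to the fall constant.

```python
import math

TAU_FALL = 1.64e-10   # s, slower time constant (assumed to govern the fall)
TAU_RISE = 5.10e-11   # s, faster time constant (assumed to govern the rise)


def set_current(t, q_coll, tau_fall=TAU_FALL, tau_rise=TAU_RISE):
    """Double-exponential SET current (A) at time t (s) for charge q_coll (C)."""
    if t < 0:
        return 0.0
    scale = q_coll / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def peak_time(tau_fall=TAU_FALL, tau_rise=TAU_RISE):
    """Time (s) of the current peak, obtained by setting dI/dt = 0."""
    return (tau_fall * tau_rise / (tau_fall - tau_rise)) * math.log(tau_fall / tau_rise)


if __name__ == "__main__":
    q = 0.65e-12  # C, an example collected charge
    t_pk = peak_time()
    print("peak time (s):", t_pk)
    print("peak current (A):", set_current(t_pk, q))
```

With this normalization the current integrates to Q over time, so the same Q can be used to estimate the voltage disturbance on a node.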


SET in CMOS Inverter

*For example, in ami12 technology, when the output load capacitance is 100 fF and the cumulative collected charge is 0.65 pC, the amplitude of the voltage pulse is 0.65 pC / 100 fF = (0.65 × 10^-12 C) / (100 × 10^-15 F) = 6.5 V.
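
The amplitude estimate is the ratio V = Q/C. With the stated 0.65 pC and 100 fF, that ratio evaluates to 6.5 V. The Python sketch below is a minimal illustration of the calculation. The function name and the optional `v_dd` clamp are assumptions added here, reflecting only that an ideal Q/C estimate can exceed the supply voltage.

```python
def set_pulse_amplitude(q_coll, c_load, v_dd=None):
    """Estimate the SET voltage amplitude (V) as collected charge over load capacitance.

    q_coll: collected charge in coulombs
    c_load: output load capacitance in farads
    v_dd:   optional supply voltage in volts; if given, the estimate is
            limited to it (an illustrative assumption, not from the slide)
    """
    if c_load <= 0:
        raise ValueError("load capacitance must be positive")
    amplitude = q_coll / c_load
    if v_dd is not None:
        amplitude = min(amplitude, v_dd)
    return amplitude


if __name__ == "__main__":
    # Values from the example: 0.65 pC collected on a 100 fF load.
    print(set_pulse_amplitude(0.65e-12, 100e-15))
```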


Soft Error Mitigation Techniques

Soft error tolerance techniques can be classified into two types: recovery and prevention.

Recovery: recover from an error after it occurs. This includes on-line recovery mechanisms, fault-tolerant computing, ECC/parity checks, redundancy, etc.

Prevention: methods that protect microchips from soft errors before they occur.

The need for a recovery mechanism stems from the fact that prevention techniques may not be enough for contemporary microchips.

Soft errors are not the only reason computer systems resort to a recovery procedure. Random errors due to noise, unreliable components, and coupling effects may also require a recovery mechanism.


Some Mitigation Techniques

Prevention Techniques

1. Purify the fabrication material

Uranium and thorium impurities have been reduced below one hundred parts per trillion for high reliability.

To eliminate 10B, alternative insulators that do not contain boron are used.

2. Radiation-hardened process technologies

SER performance can be greatly improved by adapting the process technology either to reduce the collected charge or to increase the critical charge.

Specific methods: use additional well isolation; replace bulk silicon with SOI.

A 10x reduction in SER over conventional bulk devices is achieved when a fully depleted SOI substrate is used. However, SOI is more expensive, and parasitic bipolar action limits further reduction of SER.


Selected Mitigation Techniques

Recovery Techniques

1. Redundancy

Higher system reliability is gained by sacrificing minimality of time, space, or both.

Classic design: Triple Modular Redundancy (TMR) with a majority voter.

New design: time redundancy based on a C-element gate that compares two samples of the combinational primary outputs at t0 and t0+d.
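
Both redundancy schemes can be illustrated behaviorally. The Python sketch below models a bitwise 2-of-3 majority voter for TMR and a two-input C-element that passes a value only when both samples agree and otherwise holds its previous output. It is a software model under simplifying assumptions, not the circuit from the cited work, and the names `majority_vote` and `CElement` are illustrative.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (b & c) | (a & c)


class CElement:
    """Behavioral two-input C-element: follow the inputs when they agree, else hold."""

    def __init__(self, initial: int = 0):
        self.state = initial

    def update(self, sample_t0: int, sample_t0_plus_d: int) -> int:
        # If the two time-shifted samples match, accept the value.
        # If they differ (for example, a transient hit one sample), keep the old output.
        if sample_t0 == sample_t0_plus_d:
            self.state = sample_t0
        return self.state


if __name__ == "__main__":
    # TMR: one of three copies has a flipped bit; the voter masks it.
    good = 0b1010
    upset = good ^ 0b1000
    print(bin(majority_vote(good, upset, good)))

    # Time redundancy: a transient shorter than the delay d corrupts one sample only.
    gate = CElement(initial=0)
    print(gate.update(1, 1))   # samples agree, output follows
    print(gate.update(1, 0))   # samples disagree, output is held
```

The voter tolerates an upset in any single copy at the cost of roughly triple the hardware, while the C-element approach trades that area for the extra delay d.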

2. Error Detection and Correction Code (EDAC)

Simple solution for memory: add a parity bit to each memory word.

In most situations, it must be combined with a system-level approach for error recovery.
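
A single parity bit detects an odd number of flipped bits in a word but cannot locate or correct them, which is why it is paired with a system-level recovery step. The Python sketch below shows even parity on a memory word. The function names and the choice to store the parity bit as the least significant bit are illustrative assumptions.

```python
def ones_parity(value: int) -> int:
    """Return 1 if the non-negative integer has an odd number of 1 bits, else 0."""
    return bin(value).count("1") & 1


def encode_even_parity(word: int) -> int:
    """Append one parity bit (as the LSB) so the stored codeword has an even number of 1s."""
    return (word << 1) | ones_parity(word)


def has_parity_error(codeword: int) -> bool:
    """A codeword with an odd number of 1s indicates that an error occurred."""
    return ones_parity(codeword) == 1


if __name__ == "__main__":
    stored = encode_even_parity(0b10110010)
    print(has_parity_error(stored))              # clean word
    print(has_parity_error(stored ^ (1 << 3)))   # single-bit upset
    print(has_parity_error(stored ^ 0b11))       # double-bit upset goes unnoticed
```

Correcting the error rather than only detecting it requires a stronger code, such as the ECC mentioned among the recovery techniques.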

*S. Mitra, M. Zhang, S. Waqas, N. Seifert, B. Gill, and K. S. Kim, “Combinational Logic Soft Error Correction,” in Proc. International Test Conference, 2006, pp. 1–9.


A Case Study: IBM eServer z990 System

z990 configuration:

1. The z990 contains 4 pluggable nodes connected through a planar board.

2. Each node contains up to 64 GB of physical memory and 32 MB of L2 cache, for a system capacity of 256 GB of memory and 128 MB of L2 cache.

Error tolerance techniques used:

1. Extensive use of ECC and parity with retry on data and controls

2. Full SRAM ECC and parity protection

3. Microprocessor mirroring


Conclusion

SER in logic and memory chips will continue to increase as devices become more sensitive to soft errors at sea level.

Open soft error issues:

1. How should EDA tools handle soft error hardening?

2. Analysis of radiation mechanisms (too complex to be comprehensive)

3. Soft error rate analysis for logic circuits

4. Error mitigation methods


Useful References and Further Readings

1. “Single Event Phenomena” (Messenger and Ash, 1993)

2. “Ionizing Radiation Effects in MOS Devices and Circuits” (Ma and Dressendorfer, 1989)

3. “Handbook of Radiation Effects” (A. Holmes-Siedle and L. Adams, 1993)

4. “Fault-Tolerance Techniques for SRAM-Based FPGAs” (Fernanda Lima Kastensmidt, Luigi Carro, and Ricardo Reis, 2006)

5. Test methods and standards: JEDEC JESD89, JESD89A, JESD89-2

6. Journals: IEEE Trans. on Nuclear Science, IEEE Trans. on Reliability

7. NASA Goddard’s test group: http://radhome.gsfc.nasa.gov/radhome/papers/seeca5.htm

8. NASA Space Environment and Effects Program: http://see.msfc.nasa.gov/… …


Thank You . . .