Design for Testability (DfT) Seminar

Post on 29-Oct-2014


1

Test Engineering

Courtesy of Patrick D.T. O’Connor

62 Whitney Drive, Stevenage

Herts. SG1 4BJ, UK

www.pat-oconnor.co.uk
www.pat-oconnor.co.uk/testengineering/htm

pat@pat-oconnor.co.uk
pdtoconnor@ieee.org

2

Test Engineering

Outline (day 1):
1. Introduction
2. Stress, strength, failure of materials
3. Stress, strength, failure of electronics
4. Variation and reliability
5. Design analysis
6. Development test principles

3

Test Engineering

Outline (day 2):
7. Materials and systems test
8. Electronics test
9. Software
10. Manufacturing test
11. Testing in service
12. Data collection and analysis
13. Laws, regulations, standards
14. Managing test

4

Test Engineering

Why test?

• Design uncertainty
• Manufacturing
• Variation
• Maintenance
• Regulations
• Contracts

5

Test Engineering

Causes of failure:
• Design inherently incapable
• Variation (parameters, environments)
• Wearout
• Other time-dependent mechanisms
• Sneaks
• Errors

We must know them all!

6

Test Engineering

How to test?

• Test to succeed / test to fail?
• Accelerated test
• Systems and components
• Technologies
• Processes
• Analysis and simulation

7

Test Engineering

Testing tales:
• “Our engineers are paid to design right”
• “Trains don’t need testing”
• Ship engine for a locomotive?
• We always have done this test
• The telecomms system
• MIL-STD-883 IC burn-in test
• “Don’t overstress”
• Too much test?

8

Test Engineering

Development test principles

• Failure costs exceed costs of test to detect & remove (Deming)
• Failure-free design: selection, training, teams, leadership
• Optimise test programme
• Test adds value!

9

Test Engineering

Development test costs

• Test articles (“UUT”)
• People × time
• Facilities
• Delay to market
• Downstream opportunities (warranty, fixes, reputation, etc.)

10

Test Engineering

Management aspects:

• Design capability/risks
• Markets, competition
• Product environment, life
• Suppliers
• Regulations
• Manufacturing, service

11

FAILURE CAUSES: MECHANICAL

• Maximum stress, fracture
• Stress cycling: fatigue, creep (vibration, temperature cycle)
• Wear
• Corrosion
• Manufacture
• Variation
• Other (leaks, backlash, friction, ...)

12

MATERIAL STRESS, STRENGTH, FAILURE

Properties:
• Strength/elasticity (Hooke’s Law)
– Stress (σ) = Young’s modulus (E) × strain (ε)
• Yield strength, ultimate tensile strength (UTS)
• Toughness/brittleness (resistance to fracture: energy/volume)
• Crack growth (Griffith’s Law)

13

MATERIAL STRESS, STRENGTH, FAILURE

Hooke’s Law

Figure 2.1 Material behaviour in tensile stress
[Figure: stress σ vs. strain ε, showing the elastic region, yield point, plastic region and fracture]

14

MATERIAL STRESS, STRENGTH, FAILURE

Figure 2.2 Tensile stress/strain behaviour of different materials (generalised)
[Figure: stress σ (MPa, to ~400) vs. strain ε (%, to ~30)]
• Brittle: cast iron, ceramics, glass
• Ductile: plastics, copper, solder
• Tough: Kevlar, steels, alloys (Al, Ti, etc.)

15

FINITE ELEMENT ANALYSIS (MECHANICAL STRESS) (MSC)

16

MECHANICAL FAILURE CAUSES

• Shock overload → constant failure/hazard rate (CFR/CHR) (load–strength analysis)

• Strength deterioration → increasing failure/hazard rate (IFR/IHR) (durability)

17

CAUSES OF STRENGTH DETERIORATION

• Fatigue (cyclic stress: vibration, handling, temperature cycling)

• Creep (high temperature + mech. stress)

• Wear (parts moving in contact: connectors)

• Corrosion (electrolytic, contamination, ...)

• etc.

18

FATIGUE: S - N CURVE
[Figure: stress amplitude S vs. cycles to failure N (log scale, 1 to 100,000+), falling from the UTS at low cycle counts to the fatigue limit]

19

FATIGUE: MINER’S RULE

M1/n1 + M2/n2 + … + Mk/nk = 1

where Mi is the number of cycles applied at stress level i, and ni is the number of cycles to failure at that stress level.
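Miner’s rule can be sketched directly as a running damage sum; failure is predicted when the sum reaches 1. The duty cycle below is illustrative, not from the slides.

```python
def miner_damage(cycles):
    """Cumulative fatigue damage per Miner's rule.

    cycles: list of (applied_cycles, cycles_to_failure) pairs,
    one pair per stress level. Failure is predicted when the
    sum of the fractional damages reaches 1.
    """
    return sum(m / n for m, n in cycles)

# Hypothetical duty cycle: three stress levels
damage = miner_damage([(1e4, 1e6), (5e3, 1e5), (100, 1e4)])
```

With these illustrative numbers the damage fraction is 0.07, i.e. about 7% of the fatigue life is consumed per duty cycle.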

20

“CLASSIC” FATIGUE FAILURE

Initiating crack or damage

Crack growth rings

Granular fracture surface

21

DESIGN AGAINST FATIGUE

• Reduce mech. stress concentrations (FEA)
• Provide support for heavy components, connectors, etc.
• Minimise thermal gradients
• Know material fatigue properties – particularly solder!
• Design for safe life
• Design for fail-safe
• Design for inspection & test

22

VIBRATION

Leads to:

• Fatigue
• Wear
• Loosening
• Leaks
• Noise

23

VIBRATION

Measures:

• Frequency (Hz)

• Displacement (m)

• Velocity (m/s)

• Acceleration (peak) (m/s² or gn)

• Damping (reduces amplitude)

• Noise, vibration and harshness (NVH)

24

VIBRATION: WATERFALL PLOT

Figure 2.5 Waterfall plot of vibration data

25

TEMPERATURE EFFECTS

• Expansion/contraction (TCE)
• Softening, weakening, melting (metals, some plastics)
• Charring (plastics, organics)
• Drying/condensation/freezing
• Other physical/chemical (Arrhenius’ Law)
• Viscosity change, lubricant loss
• Interactions (corrosion, …)

26

WEAR MECHANISMS

• Adhesive
• Fretting
• Abrasive
• Cavitation/erosion
• Corrosive

27

WEAR REDUCTION

• Examine
• Test/analyse
• Lubricate (oils, MoS2, …)
• Surface treatment (PTFE, …)
• Stress reduction (mech, temp, vibration)
• Material change (e.g. non-abrasive)

28

CORROSION

• Ferrous Alloys (Rust)

• Non-ferrous: Al, Mg

• Chemical

• Electrolytic

29

PREVENTING CORROSION

• Material selection
• Surface protection
– Anodising
– Plating (Cr, Sn, …)
– Painting
– Lubricating
• Environmental protection (seals, desiccants)

30

OTHER MECHANICAL FAILURE MECHANISMS

• Backlash (wear?)
• Adjustments
• Leaks
• Loosening (fasteners)
– Wear?
– Maintenance?
• etc.

31

MATERIAL SELECTION FOR RELIABILITY/DURABILITY

• Metals: corrosion, protection, fatigue
• Plastics, rubbers: chemical, temperature stability, UV sensitivity
• Ceramics: fracture toughness
• Composites: impact strength, delamination, erosion

32

Electrical/Electronics Stress, Strength & Failure

• Component selection
• Stress derating (electrical, thermal)
• EMI, EMC, ESD
• Parameter variation
• Connectors
• Mechanical

33

Stress Effects

• Current
– temperature rise
– drift
• Voltage
– current/overstress (EOS)
– arcing, corona discharge
• Power (W = I²R)
• Temperature

34

Arrhenius’ Law

λ = K exp(−E / kT)

E = activation energy (0.3 – 1.5 eV)
k = Boltzmann’s constant (8.62 × 10^-5 eV K^-1)
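From the Arrhenius form above, the usual quantity in accelerated testing is the acceleration factor between use and test temperatures (the K term cancels). A minimal sketch — the activation energy and temperatures below are illustrative:

```python
import math

K_B = 8.617e-5  # Boltzmann's constant, eV/K

def arrhenius_af(e_act, t_use_c, t_test_c):
    """Arrhenius acceleration factor between use and test temperatures.

    e_act in eV; temperatures in deg C (converted to kelvin).
    """
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    return math.exp((e_act / K_B) * (1 / t_use - 1 / t_test))

# Illustrative: 0.7 eV mechanism, 40 degC use vs. 125 degC burn-in
af = arrhenius_af(0.7, 40.0, 125.0)
```

Note the slides’ later caution: the Arrhenius model applies per failure mechanism, and real temperature dependence is often much weaker than handbook models assume.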

35

Temperature Effect on Reliability
[Figure: failure rate λ vs. temperature (20–200 °C): MIL-217/Bellcore models predict a steep rise above the rated point (85/125 °C); observed reality is much flatter]

36

Drift Characteristics: Carbon Resistor at +70 °C
[Figure: change in R (%, 0 to −1.5) vs. time (1,000–2,000 h), at 50% and 100% rated power (PSR)]

37

Semiconductor Device Construction Features
• Si preparation
• Diffusion
• Passivation*
• Metallization*
• Glassivation
• Connection
• Packaging

(* multilayer)

38

Semiconductor Device Technologies

• ASIC
• Mixed signal (analog/digital/RF)
• III-V (GaAs, InP)
• Power (transistors, thyristors, GTO, IGBT)
• Microwave (MMIC)

39

Microcircuit Mounting and Connection

• DIP in PTH
• Flat pack / SOIC
• Surface mounting
– Leadless chip carrier (LCC)
– Pin grid array (PGA) / ball grid array (BGA)
– Chip scale packaging (CSP)
– Tape automated bonding (TAB)
• IC sockets (DIP, LCC)

40

Semiconductor Device Failure Mechanisms

1. Die Related
• Crystal structure / impurity
• Diffusion / masking
• Passivation / dielectric breakdown (TDDB)
• Electromigration
• Latch-up
• Slow trapping, hot carriers, alpha particles
• External: ESD / EOS / EMP

41

Semiconductor Device Failure Mechanisms

2. Package Related
• Adhesion
• Bonding
• Impurity / corrosion / inclusions
• Hermeticity
• Solderability

42

Passive Device Failure Mechanisms

1. Resistors (fixed)
• Parameter drift
• Open circuit
• Noise

2. Variables
As above, plus:
• Mechanical failure
• Contact failure
• Seal failure

43

Passive Device Failure Mechanisms

3. Capacitors

• Short circuit (dielectric breakdown)
• Open circuit (high V)
• Leakage (wet types)
• Wire bond failure (open circuit)

44

Passive Device Failure Mechanisms

4. Interconnections
• PCB
– ball bonds
– track cracks (opens)
– through-hole opens
– shorts
• Wire/ribbon
– breaks (fatigue, damage)
– solder attach
• Intermittents

45

Solder

Major contributor to failures! (SMT, BGA, >10K joints/board)
• Inadequate wetting (contamination, oxidation)
• Insufficient time (“second drop”)
• Fatigue
• Creep

46

Insulation

• Damaged, cut, chafed, trapped, …

• Overheated

• Aged, embrittled

• Eaten (rodents)

47

System/circuit Problems

• Distortion
• Jitter
• Timing
• Interference/compatibility (“noise”) (EMI/EMC)
• Intermittents / no fault found (NFF)

48

EMI: Problems

• High frequencies (MHz – GHz) (VHF–UHF!)
• Close spacing (SMT, narrow tracks)
• ASICs, mixed signals (digital, RF)
• New regulations (UL, CE, etc.)
• Lack of knowledge (designers, managers)
• Basic EDA does not simulate

49

EMI Sources (internal)

• Current loops (Lenz’s Law: reduce loop area)

• Signal noise (components, conductors)

• Ground noise

50

EMI Sources (external)

• ESD
• Switched inductive loads
• Supply transients
• Other systems (motors, radars, computers, peripherals)

51

EMI Protection

• Shielding
– Faraday shield
– Coax cables
• Circuit protection
– Capacitive (decoupling)
– Inductive
– Opto-couplers
– Filters, regulators (on PCB)

52

Electrical Overstress/Electrostatic Damage

EOS/ESD

• ICs ARE VULNERABLE!!
• People generate 1 – 5 kV / 50 – 100 μJ
• EOS / ESD can kill ICs
• It can also do GBH (damage without killing)
• On-chip protection

53

EOS/ESD Protection

• Connector separation for different voltage levels
• Decoupling of ICs
• Isolation (opto-couplers)
• Handling / packaging / bonding
• On-chip protection

54

Probability Distributions

Histogram and Probability Density Function

[Figure: histogram with fitted probability density function f(x) vs. x]

55

Normal Distribution
[Figure: normal pdf, probability vs. variable, axis marked in standard deviations (±4) about the mean]

56

”Natural” Variation

• Constant in time. Past = Future

• ”Normal” Distribution Function(Mean, Standard Deviation)

• ”Made by God”

57

Normal (Gaussian) Distribution

• Central Limit Theorem
• Symmetrical about mean/median μ
• Standard deviation (SD) σ; variance = σ²

Proportion lying within ±nσ:
±1σ: 68%   ±2σ: 95%   ±3σ: 99.7%   ±6σ: 99.999999%
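The ±nσ proportions quoted above follow from the normal CDF and can be checked with the error function — a quick sketch:

```python
from math import erf, sqrt

def frac_within(n_sigma):
    """Fraction of a normal population within +/- n standard deviations
    of the mean: erf(n / sqrt(2))."""
    return erf(n_sigma / sqrt(2))

# Reproduce the slide's figures for +/-1, 2, 3 and 6 sigma
within = {n: frac_within(n) for n in (1, 2, 3, 6)}
```

This confirms roughly 68%, 95%, 99.7% and 99.999999% respectively — though, as the following slides stress, real engineering variation is rarely normal, especially in the tails.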

58

Variation in Engineering

• Not ”normal”
• Not constant in time. Past NOT = future
• Selection effects
• Often deterministic (V = IR, F = ma)
• Sometimes due to failures, errors, ...
• Occasionally catastrophic (discontinuous, e.g. fatigue)
• ”Made by man”

59

Curtailed Distribution
[Figure: probability vs. variable, distribution curtailed within a few standard deviations of the mean]

60

Effect of Selection
[Figure: probability vs. parameter (−10% to +10% of nominal), showing the distribution truncated by selection]

61

Skewed Distribution
[Figure: probability vs. variable, skewed distribution]

62

Bimodal Distribution (typical human mortality)
[Figure: probability of death at a given age vs. age (10–110 years), showing two peaks]

63

Normal Distributions?
[Figure: four distributions with the same mean and SD (from Shewhart), plotted between −nσ and +nσ about the mean]

64

Weibull Distribution

R = exp[−(t/μ)^β]

μ = characteristic life
β = shape parameter (slope)
β = 1 : CHR
β < 1 : DHR
β > 1 : IHR

If failure-free life = γ, replace t with (t − γ)
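The Weibull reliability function above, with the optional failure-free life γ, can be sketched as:

```python
import math

def weibull_r(t, mu, beta, gamma=0.0):
    """Weibull reliability R(t) = exp(-(((t - gamma) / mu) ** beta)).

    mu    - characteristic life
    beta  - shape parameter: <1 DHR, =1 CHR, >1 IHR
    gamma - failure-free life (no failures before t = gamma)
    """
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / mu) ** beta))
```

With β = 1 this reduces to the constant-hazard-rate exponential, R(t) = e^(−t/μ); at the characteristic life, R ≈ 0.368 regardless of β.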

65

Distributed Load and Strength
[Figure: probability vs. value for load (L) and strength (S) distributions:
a. Non-overlapping distributions
b. Overlapping distributions: wide strength variation (low LR)
c. Curtailed strength distribution
d. Overlapping distributions: wide load distribution (high LR)]

66

Distributed Load & Strength

For normally distributed load L and strength S, the safety margin is:

SM = (μ_S − μ_L) / √(σ_S² + σ_L²)
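The safety margin for normally distributed load and strength, and the resulting interference probability, can be sketched as follows (the values in the example are illustrative):

```python
import math

def safety_margin(mean_s, sd_s, mean_l, sd_l):
    """Safety margin in standard-deviation units for normally
    distributed strength S and load L."""
    return (mean_s - mean_l) / math.sqrt(sd_s**2 + sd_l**2)

def p_failure(sm):
    """P(load exceeds strength) for a given safety margin,
    via the normal CDF."""
    return 0.5 * (1.0 - math.erf(sm / math.sqrt(2.0)))

# Illustrative: strength 500 +/- 30, load 300 +/- 40 (same units)
sm = safety_margin(500.0, 30.0, 300.0, 40.0)
```

Here SM = 200/50 = 4, i.e. the distributions are separated by four combined standard deviations; p_failure(sm) then gives the (very small) intrinsic unreliability.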

67

Time-dependent load and strength
[Figure: load and strength vs. time/load cycles (log scale); strength degrades until it meets the load distribution at time t′]

68

Strength v. specification (time dependent)

Figure 6.3 Strength vs. specification (time-dependent)
[Figure: strength distribution degrades with time toward the specification; the overlap gives the probability of failing at maximum specified stress]

69

Summary of High Reliability Design Principles

• Determine most likely distributions of load and strength
• Evaluate SM for intrinsic reliability
• Determine protection methods (load limit, derate, screen, QC)
• Analyse strength degradation modes
• Test to corroborate, analyse results
• Correct or control (redesign, safe life, maintenance, ...)

70

Multiple Variations

Traditional Method:

• Test effect of one variable at a time

• Cannot test interactions

71

Statistical Design of Experiments (DoE)

• Test all variables simultaneously
• Randomisation
• Analysis of variance (ANOVA):
1. Determines effects of all variables
2. Determines effects of all interactions

(R. A. Fisher, 1926)

72

Genichi Taguchi

• ”Loss to Society”
• System design
• Parameter design
• Tolerance design
• Control & noise factors
• Orthogonal arrays
• Brainstorm

73

DoE: Engineering Aspects

• Statistical v. engineering significance
• Randomisation
• Cost effectiveness
• Confirmation
• SPC
• CAE
• Nonlinearity
• Management

74

Confidence and Risk

• s-confidence = probability that the population parameter lies between “confidence limits”
• Bigger sample, narrower confidence limits
• Risk = (1 − confidence): probability that the parameter lies outside the confidence limits
• s-confidence vs. engineering confidence

75

Statistical, Scientific and Engineering Confidence

• Statistical test (binomial):
Items tested, 0 failures:       0     1     10    20
80% s-confidence that R >:      0     0.90  0.98  0.99
Data is entirely statistical, no prior knowledge

• Scientific test:
Items dropped, all fall:        0     1     10    20
Confidence that all will fall:  1     1     1     1
Information is deterministic

• Engineering: can range from deterministic to statistical

76

Measures of Reliability

• Failure Rate (FR) (λ)

• Hazard Rate (HR for non-repairable items) (λ)

• Mean Time Between Failures (MTBF) (M)*

• Mean Time to Failure (MTTF) (M)*

• Durability (failure free life; FR = 0)

• Reliability R = probability of no failures in time t
  = e^(−λt) = e^(−t/M) *

* (for constant failure/hazard rate)
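For the constant-failure-rate case these measures relate very simply; a sketch (the MTBF value is illustrative):

```python
import math

def reliability(failure_rate, t):
    """R(t) = e^(-lambda * t) for constant failure rate lambda.

    Since MTBF M = 1/lambda, this is equivalently e^(-t/M)."""
    return math.exp(-failure_rate * t)

mtbf = 10_000.0  # hours, illustrative
r_1000h = reliability(1 / mtbf, 1_000.0)   # survive 10% of the MTBF
r_mtbf = reliability(1 / mtbf, mtbf)       # survive one full MTBF
```

A point worth noting from the formula: the probability of surviving to the MTBF itself is only e^(−1) ≈ 37%, a common source of confusion about what “MTBF” promises.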

77

Patterns of Failure

The Bathtub Curve

[Figure: hazard rate vs. time from t = 0: decreasing rate (DFR, weak items) during infant mortality, constant rate (CFR) during useful life, increasing rate (IFR, wearout) thereafter; the total gives the bathtub shape]

78

Variation: summary

• Variation is seldom (never?) “normal”
• Most important variation is in the tails
– Less data
– More uncertain
– Conventional stats most misleading
• Variation can change over time
• Interaction effects
• Variation made by people
• Most engineering education: maths only

79

Development Test Principles

Categories of test:

• Functional (design proving / proof of principle)
• Reliability/durability
• Contractual/safety/regulatory
• Test and evaluation (T&E)
• Beta testing

80

Development Test Principles

Fill the ”uncertainty gap”:

• Performance/safety:
– demonstrate success
– perform once
• Reliability/durability:
– test to fail
– accelerated tests
• Variation:
– Taguchi/statistical experiments
– multiple tests?

81

Development Test Principles

• Components, systems, interfaces
• Software
• External suppliers
• FRACAS
• Integrated test programme

82

Development Test Principles

Test economics: major driver of development cost & time, BUT:
• Failure costs increase through project phases (×10 rule: design, development, production, service)
• Failure-free design is cheaper! (experience, training, integrated engineering, design analysis)

83

Development Test Principles

Strength v. Specification
[Figure: probability vs. strength (stress to fail), with the specification level L below the strength distribution]

84

Development Test Principles

Strength v. Specification (transient & permanent failures)
[Figure: probability vs. strength (stress to fail), with separate transient and permanent failure distributions relative to the specification]

85

Development Test Principles

Strength v. Specification (time dependent)
[Figure: probability vs. strength (stress to fail); the strength distribution moves toward the specification over time]

86

Development Test Principles

• Failures are often due to combined stresses/strengths (uncertain)
• Failures are often influenced by interactions (uncertain)
• Failures are often time-dependent (uncertain)
• Causes of service failures can be shown by different test stresses, e.g.
– vibration / temperature cycle
– high frequency / low frequency

87

Development Test Principles

Fundamental principle: increase (combined) stresses to cause failures, then use the information to make the product stronger.

Limits:
• Technology (e.g. solder melt)
• Test capability
• Economic

88

Development Test Principles

Testing at “representative” stresses, and hoping for no failures, is ineffective and a waste of resources.

Examples:

• Engines on test beds

• Cars on test tracks

• “Simulated” environmental test (MIL-STD-781, MIL-STD-810, etc.)

89

Development Test Principles

Environments (1):

• All relevant environments
• Combined environments (CERT)
• User
• Environmental simulation?

90

Development Test Principles

Environments (2):

• Thermal
• Thermal fatigue (switching)
• Vibration
• Shock
• Humidity
• Power supply/load
• Transients (ESD, EOS)
• Pollution, corrosion
• People, other animals
• Etc.

91

Development Test Principles

Accelerated stress test

• Miner’s Law for fatigue (mech, thermal)

• Arrhenius Law for thermal acceleration?

• Step-stress testing

• Failure modes relevant, not stress levels!

92

Development Test Principles

Highly accelerated life test (HALT) (1)

• Highly accelerated combined stresses (temperature, cycling, multi-axis vibration, others...)
• Step stress to discover transient and permanent limits
• Time compression: orders of magnitude
• Developed by Gregg Hobbs

93

Development Test PrinciplesHALT (2)

• Special chambers, facilities (QualMark, Thermotron, Screening Systems, TEAM, ...)
• Savings: time, space, energy
• Optimise manufacturing screens (HASS)
• Similar approaches:
– Highly accelerated stress test (HAST)
– Stress-induced failure test (STRIFE)
– Failure mode verification test (FMVT ® Entela)
– Etc.

94

HALT Philosophy (1)

Stress limits
[Figure: combined-stress axis, outward from the product spec.: lower/upper operating limits, then lower/upper destruct limits]

• High stresses = small samples!

95

HASS Philosophy

[Figure: combined-stress axis with product spec., operating limits and destruct limits; precipitation screen set beyond the operating limits, detection screen within them]

96

HALT/HASS Philosophy (2)
[Figure: stress (S) vs. cycles to fail (log N): HALT/HASS at high stress and few cycles, ESS lower, in-use stresses lowest]

97

Accelerated Test Approach

TE p105

1. What failures might occur in service? (FMEA, etc.)
2. List/analyse stresses, combinations.
3. Plan how to apply.
4. Apply single stresses, step increases to failure.
5. Analyse failure, strengthen design.
6. Iterate 4 & 5 to fundamental limits.
7. Repeat with combined stresses.
8. Iterate 5 & 6.

98

Accelerated Test Approach

Examples:

• Mechanical (rotating, engines, etc.)
– Old lubricants, filters
– Low fluid levels (oil, coolant)
– Out-of-balance
• Electro-mech (printers, etc.)
– Temp, vib, power V level, humidity, ...
– Misaligned shafts, etc.
– Out-of-spec. materials (paper, friction, ...)
• Electronic components/packages, etc.
– Temp, vib (high frequencies), etc.
– Use vibration transducers (speaker coils?)

99

Accelerated Test Approach

Questions (TE p109):
• How many to test? As many as practicable/economic.
• Can reliability (MTBF, durability) be measured? NO! It will be increased!
• How do we know if a failure on test could occur in service? Analyse, use experience, THINK!
• Product will see no vibration in service. Why vibrate on test? Vibration on test can stimulate failures caused by temp. cycle, handling, etc. in service, QUICKLY!
• Is the principle limited to temp, vib, elec stress? Not at all. Apply to fluid systems, mech tolerances, etc.

100

HALT/HASS Payoffs

• Robust designs + capable processes = high reliability
• Reduced test time and cost
• Feedback to design: reduce “uncertainty gap” on future products
• Continuous improvement (“kaizen”) of design capability (products, processes)

101

Accelerated Test or DoE?

Important variables, effects, etc.                    DoE/HALT?

Parameters: electrical, dimensions, etc.              DoE
Effects on measured performance parameters, yields    DoE
Stress: temperature, vibration, etc.                  HALT
Effects on reliability/durability                     HALT
Several uncertain variables                           DoE
Not enough items available for DoE                    HALT
Not enough time available for DoE                     HALT

102

Circuit Test Principles: Analog

• DC: current, potential, resistance (AVO), capacitance, ...
• AC: current, potential, impedance, waveforms, ...
• Signals: waveforms, gain, distortion, jitter, ...

103

Circuit Test Principles: Digital

Truth table for a 2-input AND gate (inputs A, B; output O):

A B | O
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

Test vectors: 4 (combinational logic)

“Stuck at” faults (SA0, SA1)
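The stuck-at fault model can be sketched by comparing a fault-free gate against one with its output forced to 0 (SA0) or 1 (SA1); a test vector detects the fault if the outputs differ:

```python
def and_gate(a, b, fault=None):
    """2-input AND gate; an output stuck-at fault forces 0 or 1."""
    if fault == "SA0":
        return 0
    if fault == "SA1":
        return 1
    return a & b

def detecting_vectors(fault):
    """Input vectors whose output differs from the fault-free gate,
    i.e. the test vectors that detect the given fault."""
    return [(a, b) for a in (0, 1) for b in (0, 1)
            if and_gate(a, b) != and_gate(a, b, fault)]
```

Only (1, 1) detects an output stuck at 0, while any of the other three vectors detects an output stuck at 1 — which is why a small, well-chosen vector set can still give full stuck-at coverage.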

104

Circuit Test Principles: Digital

Logic classes:

• Combinational: outputs follow inputs

• Sequential: input dependent; also data flow, memory allocation

• Dynamic: requires refresh / ”keep alive”

105

Circuit Test Principles: Digital

Fault types:

• SA0, SA1
• Stuck at input
• “At speed”
• Pattern sensitive
• Etc.

106

Manual Test Equipment

• Basic instruments – DMMs, power meters, ...

• Instruments – oscilloscopes, waveform generators, spectrum analysers, logic analysers, ...

• Special instruments– RF testers, optical signal testers, hi volt, ...

• PC - based

107

Automatic Test Equipment (ATE)

• Vision: automatic optical inspection (AOI), X-ray (AXI)
• Manufacturing defects analyser (MDA)
• In-circuit test (ICT)
• Fixtureless / flying probe
• Functional test (FT) (via circuit connectors)
• Combined ICT/FT
• Special test (RF, power supplies, manual, “hot rig”, ...)

108

Test Capability

ATE must:
• Confirm correct operation of good circuits
• Not classify good as faulty
• Detect faulty items
• Diagnose fault causes

109

Design for Test (DFT)

Design must allow ATE to:

• Initialize (start clocks, set logic states)
• Control (e.g. open feedback loops, force logic, generate inputs)
• Observe (access to important nodes)
• Partition (reduce test program complexity)

110

Layout for ICT

• Keep PCB edges clear
• Location holes
• Large components on top (for double-sided PCBs)
• Resistors between power lines and control signals (resets, enables, tristates)
• Clock disable (provide link)

111

Built-in Test (BIT)

• Boundary scan (IEEE 1149.1)
• ASICs
• Logic and function tests
• Complexity, false alarms

112

EMI/EMC Test

Must test for:
• Radiated emissions
• Conducted emissions (power lines, signal lines)
• Compatibility (susceptibility) (radiated, power, signals)
• Internal problems
• Special situations (rail signalling, avionics, lightning, nuclear (NEMP), etc.)

Standards and regulations

113

Test Control and Data Acquisition (DAQ)

Test databus standards:
• General purpose interface bus (GPIB) (IEEE 488)
• PC interface bus (PCI), PCI extensions for instrumentation (PXI)
• VME bus extensions for instrumentation (VXI)

114

IC Test

• Special/expensive ATE

• Test cost ≅ IC manufacture cost!

• IDDQ test

• BIST

• Standard tests (MIL-STD-883, etc.)

• Rely on IC manufacturer’s tests

115

IDDQ Test

Figure 8.11 IDDQ plot
[Figure: quiescent supply current IDDQ (0.1–0.3 mA) vs. logic node state (1–15, etc.): a good device draws uniformly low current; a defective device shows elevated current at some states (2, 3, 10, ...)]

116

Standards, References, Software

• MIL-STD-2165 (USA)
• DEF STAN 00-13 (UK)
• ‘Design for Testability’ – Jon Turino
• ‘Testability Advisor’ – Logical Solutions Inc.

117

Software Reliability
• All new systems involved (operating & test)
• Cannot predict failure modes and effects
• Cannot test complete system*
• Errors are present in all copies*
• S/W – H/W interfaces (keyboards, sensors, devices, EMI)

*Compare VLSI hardware

118

Hardware/Software Reliability Differences (1)

Hardware:
1. Failures can be caused by deficiencies in design, production, use and maintenance.
2. Failures can be due to wear or other energy-related phenomena.
3. No two items are identical. Failures can be caused by variation.
4. Repairs can be made to make equipment more reliable.
5. Reliability may be time-related, with failures occurring as a function of operating (or storage) time, cycles, etc.
6. Reliability may be related to environmental factors (temperature, vibration, humidity, etc.).
7. Reliability can be predicted, in principle but mostly with large uncertainty, from knowledge of design, parts, usage, and environmental stress factors.

Software:
1. Failures are primarily due to design faults.
2. There are no wearout phenomena. Software failures occur without warning.
3. There is no variation: all copies of a program are identical.
4. There is no repair. The only solution is redesign (reprogramming).
5. Reliability is not time related. Failures occur when a specific program step or path is executed or a specific input condition is encountered, which triggers a failure.
6. The external environment does not affect reliability except insofar as it might affect program inputs.
7. Reliability cannot be predicted from any physical bases, since it entirely depends on human factors in design.

119

Hardware/Software Reliability Differences (2)

Hardware:
8. Reliability can be improved by redundancy.
9. Failures can occur in components of a system in a pattern that is, to some extent, predictable from the stresses on the components and other factors. Reliability critical lists are useful to identify high risk items.
10. Hardware interfaces are visual; one can see a 10-pin connector.
11. Computer-aided design systems exist that can be used to create and analyse designs.
12. Hardware products use standard components as basic building blocks.

Software:
8. Reliability cannot be improved by redundancy if the parallel paths are identical, since if one path fails, the other will have the same error.
9. Failures are rarely predictable from analyses of separate statements. Errors are likely to exist randomly throughout the program, and any statement may be in error. Reliability critical lists are not appropriate.
10. Software interfaces are conceptual rather than visual.
11. There are no computerised methods for software design and analysis.
12. There are no standard parts in software, although there are standardised logic structures. Software reuse is being deployed, but on a limited basis.

120

Software in Engineering
• “Real time”
• Wide range of interfaces (hardware, human, timing, ...)
• Different levels of embedding (ASICs, PGAs, BIOS, ...)
• Hardware/software options for functions
• Electrically “noisy” environments
• Usually smaller

121

Software Reliability: ERROR → FAULT → FAILURE

Sources of error:
• Specification (60%)
• Design (20%)
• Code (20%) (typos, numerical, omissions, etc.)
• Timing/EMI
• Data (information) integrity

122

Error Reduction

• Modular design
• Error traps
• Remarks
• Spec & code review
• Test

123

Fault Tolerance
• Internal tests (rates of change, cycle times, logic)
• Resets, fault indications
• Redundancy, voting
• Hardware failure protection

124

Languages
• Machine code / microcode
• Assembly level / symbolic assemblers
– Both processor specific
– Faster, less memory
– Difficult, error prone
• High level (HLL) (BASIC, Fortran, *Pascal, *Ada, *C, *C++)
– Processor independent
– Easier, error protection*
– Assemblers, compilers
• Programmable logic controllers (PLCs)
• Assemblers, compilers

125

Software Testing (1)

• Total paths = 2^n (n = branches + loops)
• Test specs
– All requirements (“must do”, “must not do”)
– Extreme conditions (timing, parameter values, rates of change, memory utilisation, ...)
– Input sequences
– Fault tolerance / error recovery

126

Software Testing (2)
• Module & interface tests (“white box”)
– Data / control flow
– Memory allocation
– Lookups
– Etc.
• System tests
– Verification
– Validation (“black box”)

127

Documentation
• Specifications
• Code, remarks
• Notebooks
• Changes, corrections
• Test results:
– Version
– Test
– Faults

128

Software Reliability Prediction and Measurement

• Methods:
– Error/bug count
– Time-based (hours, days, CPU seconds)

• “Cleanroom” approach (IBM)

• Do not use!

129

Test in Manufacture

Manufactured items are either:

1. Good

2. Defective, but detected and fixed or scrapped

3. Defective, but shipped, and might/will fail later

We must inspect/test to discriminate

130

Manufacturing Test Principles (1)

• All testing costs. So minimise (ideal = zero)
• But:
– Manufacturing processes generate variation & defects
– Later costs of variation & defects can exceed costs of detection & correction/removal
• So:
– Must consider total life cycle (manufacturing, use, ...)

Value-added testing

131

Manufacturing Test Principles (2)

Test cost justification is difficult, because:

• Test costs arise in manufacture; failure costs arise later
• Failure occurrences and costs cannot be predicted

Some testing might be obligatory: calibration, EMI/EMC, safety, etc.

132

Test Capability

Tests must:
• Identify good items
• Detect defects (parts, processes, suppliers, ...)
• Indicate defect source/location

133

Test Pass - Fail Logic

Figure 10.6 Test pass-fail logic
[Flowchart: check the test itself is OK; items that pass go to the next test; items that fail are detected, diagnosed and repaired, then retested]

134

Test Criteria and Stresses

• Manufacturing tests are not tests of the design

• Manufacturing tests must not damage good items (contrast with development)

135

Manufacturing Test Economics

Aspects to consider:

• Cost of test(s) (setup, run, repairs, ...)
• Defects that might be generated upstream
• Test capability
• Alternatives to test (inspection, ...)
• Methods to reduce/prevent defects
• Downstream costs of undetected defects
• 100% or sample test?

136

Manufacturing Test Economics

Examples:
• Screw
• Integrated circuit
• Automotive gearbox
• Car
• Spacecraft
• Electronics assembly

137

Inspection and Measurement

Inspection:
• Visual (manual, automatic)

Measurement:
• Dimensional (metrology)
– Micrometers, CMMs, ...
• Parameters
– mech. (strength, torque, ...)
– elec. (instruments, ATE, ...) (Module 8)

Inspection, measurement, test: not absolute definitions

138

Stress Screening

Definition: application of stresses to cause defective items to fail/show without damaging good ones.

Alternative terms:
• Environmental stress screening (ESS)
• Burn-in (electronic components & systems)
• STRIFE test
• etc.

Guidelines, etc.:
• US NAVMAT P-9492
• US MIL-STD-2164
• IEST ESSEH Guidelines

139

Highly Accelerated Stress Screening (HASS)

• Highly accelerated stresses (temp., vib., elec., ...)
• Developed via HALT in development testing
• Stresses are not extrapolations of service conditions
• Can be applied only to products that have been subjected to HALT in development

140

HASS Philosophy (1)

[Figure: combined-stress axis with product spec., operating limits and destruct limits; precipitation screen set beyond the operating limits, detection screen within them]

141

HALT/HASS Philosophy (2)
[Figure: stress (S) vs. cycles to fail (log N): HALT/HASS at high stress and few cycles, ESS lower, in-use stresses lowest]

142

HASS Philosophy (3)

• Proof (safety) of screen (POS)

• HASA (audit): sample v. 100%

• Review/adapt (e.g. repeat POS)

• Can apply to any technology (elec., mech.)

• Keep flexible (no standard procedures)

143

Electronics Manufacturing Faults

In rough order:
• Solder problems (permanent/intermittent o/c or s/c, weak, ...)
• Parts missing / wrong place / wrong value
• Part parameters/functions
• Damage (physical, ESD, ...)
• System/assembly level (cables/connectors, variation, EMI/EMC, ...)

In the 1970s the list could have been reversed!

144

Electronics Test Options/Economics

Board test:

Figure 10.3 Electronics assembly test flow example
[Flow: Assemble → AOI → MDA → ICT/FT → ship; items failing each stage (proportions dI, dm, df) go to diagnose/repair and back into test; C = cost (CA, CI, CM, CF, CR), d = proportion failed]

145

Electronics Test Options/Economics

A simple model for the manufacturing and test cost per unit is:

C = CA + CI + CM + CF + (CR + CM + CF)(dI + dm + df)

If, for example:

CA = $200, CI = $10, CM = $10, CF = $20, CR = $50, dI = dm = df = 0.05

then the total cost per unit would be $252
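The cost model above can be sketched directly, reproducing the slide’s worked example:

```python
def unit_cost(ca, ci, cm, cf, cr, di, dm, df):
    """Manufacturing + test cost per unit (slide's simple model):
    assembly (ca) plus three test stages (ci, cm, cf), plus repair
    and retest (cr + cm + cf) for the proportions failing each
    stage (di, dm, df)."""
    return ca + ci + cm + cf + (cr + cm + cf) * (di + dm + df)

# Slide's example: $240 base + $80 retest cost x 0.15 total fallout
cost = unit_cost(200, 10, 10, 20, 50, 0.05, 0.05, 0.05)  # ~ $252
```

Such a model makes the trade-off explicit: lowering the fallout proportions (better processes) reduces cost faster than trimming the test stages themselves.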

146

Fault Proportions & Coverage (coverage in %)

Fault                         faults %   AOI   AXI   MDA/ICT   FT    HASS
Open circuit                  25         40    95    85        95    *
Insufficient solder           18         40    80    0         0     20-80
Short circuit                 13         60    99    99        95    *
Component missing             12         90    99    85        85    *
Component misaligned          8          80    80    50        0     0
Component elec. para error    8          0     0     20/80     80    *
Wrong component               5          15    10    80        90    *
Other non-electrical          4          80    0     0         0     20-80
Excess solder                 3          90    90    0         0     0
Component reversed            2          90    90    80        90    *
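One way to use per-stage coverage figures like these is to estimate the combined detection probability of a test sequence — assuming (a simplification) that the stages detect independently:

```python
def combined_coverage(stage_coverages):
    """Probability that at least one stage in a test sequence
    catches a given fault, assuming stages detect independently:
    1 - product of the per-stage escape probabilities."""
    escape = 1.0
    for c in stage_coverages:
        escape *= 1.0 - c
    return 1.0 - escape

# Open-circuit row: AOI 40%, AXI 95%, MDA/ICT 85%, FT 95%
open_circuit = combined_coverage([0.40, 0.95, 0.85, 0.95])
```

Under the independence assumption, very few open circuits escape the full sequence — though correlated misses (e.g. faults hidden from all probe-based stages) make real escape rates higher.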

147

Assembly Test
[Diagram: assembly under test — Board 1, Board 2, PSU, backplane, keypad, display — with test access at the assembly interfaces]

148

Electronic Assembly Burn-In (ESS)

• Typically -30 °C to 70 °C, 5 cycles
• Power on (monitor)
• (Vibrate)
• Finds production defects
– Solder
– Damage
• Not effective against component defects (low temp, low stress)

149

Integrating Stress Screening

• Integrate with functional test (FT)
• Before/after AOI/ICT?
• Assembly stages?
– Board
– Intermediate
– Final
• Re-screen after repair? YES

No fixed rules!

150

Post-Production Economics

• TE Page 183

151

Electronic Component Test

• All components tested by manufacturers

• Generally not practicable/economic for OEMs/CEMs to test (IC tester $5M!)

• No repair possible
• Special cases:
  – Power devices?
  – Etc.?

152

Electronic Component Population Categories

[Graph: failure probability v. time (10 to 10,000 h), showing infant mortality, "freaks", and the good population (zero failures)]

153

IC Test

• MIL-STD-883 (TE p. 186)
  – Level A, B, C screens
  – Burn-in (125°C, 168 h)
  – Plastic/hermetic packages (autoclave test)
• Other standards (CECC, IEC, ...)

Don't use!

154

In-Service Test Philosophy

Test only:

• If only way to determine correct function
• To determine failure cause (diagnostic)
• To confirm repair

Optimise during development

155

Test Schedules

• Continuous (BIT, monitors, ...)
• Time run (electronics, aircraft, engines, ...)
• Distance travelled (cars, trains, ...)
• Operating cycles (electronics, aircraft engines, ...)
• Calendar (calibration, seasonal, ...)

Must be measured
Intervals, tolerances

156

Examples

• TE pages 191-193

157

Built-in (Self) Test (BIT/BIST)

• Apply only to functions that are not observed

• Keep it simple!
  – Sensors etc. fail
  – False alarms

• Implement in software (no weight, power, complexity)

158

“No Fault Found” (NFF)

Causes:
• Intermittent failures (components, connections, ...)
• Tolerance effects
• Connectors
• BIT false alarms
• Incorrect diagnosis/repair
• Inconsistent test criteria
• People
• Ambiguous cause: >1 suspect unit changed

(Also “retest OK” (RTOK), etc.)

50% - 80% of repairs!

159

RCM (Reliability Centred Maintenance) Objectives

• Optimises preventive maintenance (PM)

• Balances cost, availability, reliability, safety

160

Maintenance Categories (1)

Corrective (CM):
• Failure repair
• Unplanned
• Expensive/unsafe

Minimise by high reliability and durability, + effective PM

161

Maintenance Categories (2)

Preventive (PM):
• Failure prevention
• Planned
• Less expensive/safe

Optimise by RCM

162

RCM Decision Logic (1)

Failure Pattern:
• Increasing (wearout)? Consider replacement
  – Failure-free life (light bulbs/tubes, drive belts, bearings, ...)
• Decreasing/constant? No replacement (electronics, ...)

163

RCM Replacement Intervals (1)

[Graphs: hazard rate v. time, with scheduled replacement at m, 2m, 3m]

Decreasing hazard rate: scheduled replacement increases failure probability.

Constant hazard rate: scheduled replacement has no effect on failure probability.

164

RCM Replacement Intervals (2)

[Graphs: hazard rate v. time, with scheduled replacement at m, 2m, 3m]

Increasing hazard rate: scheduled replacement reduces failure probability.

Increasing hazard rate with failure-free life > m: scheduled replacement makes failure probability = 0.
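The constant- and increasing-hazard cases can be illustrated numerically. The sketch below is an illustration, not from the slides: it uses Weibull lifetimes with good-as-new scheduled replacement every m hours, where beta = 1 gives a constant hazard rate and beta > 1 an increasing one:

```python
import math

# Illustrative sketch (not from the slides): probability of zero failures
# over a mission of T hours for a Weibull(beta, eta) item, with optional
# good-as-new scheduled replacement every m hours (T a multiple of m).
def survival(T, beta, eta, m=None):
    if m is None:
        return math.exp(-((T / eta) ** beta))     # no scheduled replacement
    n = int(T / m)
    return math.exp(-((m / eta) ** beta)) ** n    # survive each interval anew

# Constant hazard (beta = 1): replacement every 100 h changes nothing
print(survival(1000, 1.0, 2000), survival(1000, 1.0, 2000, m=100))
# Increasing hazard (beta = 3): replacement every 100 h raises survival markedly
print(survival(1000, 3.0, 2000), survival(1000, 3.0, 2000, m=100))
```

For beta = 1 the two probabilities coincide, matching the "no effect" graph; for beta = 3 replacement raises survival, matching the "reduces failure probability" graph.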

165

RCM Decision Logic (2)

Failure Effect (FMECA):

• Critical? Consider replacement / PM

• Detectable? Consider PM (e.g. fatigue)

166

RCM Decision Logic (3)

Failure Cost:
• High? Consider replacement (gearboxes, engines, ...)
• Low? Consider replacement on failure (light bulbs/tubes, hydraulic hoses (?), ...)

167

RCM Decision Logic (4)

[Decision flowchart combining the preceding questions — FR increasing? FE critical? Failure detectable? Failure cost high? — with outcomes: Scheduled Replacement, PM, Replace on Failure, No Replacement]
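The decision logic can be encoded as a function. The branch routing below is an assumption made for illustration — the exact flowchart edges are not recoverable from the text, so this follows the rules on the preceding RCM Decision Logic slides:

```python
# Sketch of the RCM decision logic. Branch routing is an assumption: it
# follows the rules on the preceding slides, not the exact flowchart edges.
def rcm_decision(fr_increasing, fe_critical, failure_detectable, failure_cost_high):
    if not fr_increasing:
        # Decreasing/constant hazard: scheduled replacement cannot help
        return "PM" if failure_detectable else "No replacement"
    if fe_critical or failure_cost_high:
        return "Scheduled replacement"
    return "PM" if failure_detectable else "Replace on failure"

print(rcm_decision(True, True, False, False))    # Scheduled replacement
print(rcm_decision(False, False, True, False))   # PM
```

Encoding the logic this way makes the policy auditable: every combination of answers maps to exactly one maintenance category.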

168

(Incipient) Failure Detection Methods

Mechanical:
• Manual (corrosion, wear, condition, ...)
• NDT for fatigue (ultrasonic, dye penetrant, radiographic, ...)
• Oil analysis (spectroscopic, magnetic)
• Vibration/acoustic

Electrical/Electronic:
• Built-in test
• Functional test/calibration

169

Stress Screens for Repairs

• Proves repair effectiveness
• Reduces NFF
• Use HASS if units were subjected to HALT/HASS

170

Calibration

• Regular test to ensure accuracy
  – Measuring devices
  – Instruments
  – Sensors

• Traceability
• Accuracy (ISO 5725)
• Management, records, labels

171

Organisation and Responsibilities

Test Department:
• Provide facilities (strategic, tactical)
• Knowledge (methods, requirements, regulations, standards, ...)
• External facilities (contracts, hire, ...)
• Maintenance and calibration
• Training

172

Organisation and Responsibilities

Projects:

• Create and manage team

• Plan and manage testing

• Liaison with Test Department

• Identify/obtain project-specific requirements

173

Organisation and Responsibilities

Design:
• Design product
• Design processes (manufacture, test, maintenance)
• Integrate design analysis & development test
• Design review (specification, pre-test, pre-production)

174

Test Procedures

Include:
• Organisation and responsibilities
• Methods (design analysis, test)
• Test planning and action
• Failure reporting (FRACAS)
• Project/design reviews
• Integration (development, production, maintenance test)
• Test equipment maintenance & calibration
• In-service maintenance & calibration

175

Development Test Programme

What/when to test?
• Components, modules, system
• Component test:
  – earlier
  – more/cheaper
  – higher stresses
  – selection
• External suppliers' products
• Output module(s) first

176

Development Test Programme

How many to test?
• As many as practicable (components/modules/systems)
• Consider design analyses, risks, time, costs
• Rotate items through tests (e.g. software, proving, environmental, ...)

Ever heard of too much testing?

177

Testing Purchased Items

Base testing on:
• Project requirements
• Existing knowledge
  – supplier's data
  – past use
• Application/risks/novelty/costs ...
• Supplier's test programme/results

Integrate!
Retain
Repeat

178

In-House v. External Facilities

In-house:
• Core technologies/confidentiality
• Designers more involved
• More flexible (?)
• Cheaper (?)

External:
• Lower capital outlay (?)
• Better facilities/expertise (?)

Consider balanced use of both
TE homepage (/testservices.htm)

179

Project Test Plan (1)

Include:
• Requirements (performance, reliability, standards, ...)
• Failures that must/should not occur
• Design/design analysis inputs (design review)
• Tests to be performed
• Test items/allocations
• Suppliers' test requirements
• Integration through project phases
• Responsibilities (primary, support)
• Schedules

180

Project Test Plan (2)

• Single test plan
• Link to other project plans
  – reliability
  – safety
  – quality, ...
• Link/refer to procedures, standards, ...

Flowchart: TE Fig. 14.1 (p. 241)

Example: Appendix 3

181

Manufacturing Test Plan

• Develop from development test results
• HALT/HASS

Flowchart: TE Fig. 14.2 (p. 242)
Example: Appendix 4

182

Management Issues

• Training
  – degree courses
  – short courses
  – on-the-job (HALT/HASS)
• Integration
  – across functions
  – through phases
• Economics
  – long v. short term
  – test adds value

The Practice of Engineering Management, P.D.T. O’Connor (Wiley)

183

The Future of Test

• Virtual test
  – EDA, FEA, CFD, ...
  – Simulation
  – Virtual reality
• "Intelligent" CAE
  – Integrated physics, variation, ergonomics, ...
  – Automatic design
• Internet
• Test hardware (BIT, "Sentient™", ...)
• Computer-based test
• Teaching (?)
