

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL

Qual. Reliab. Engng. Int. 2004; 20:115–120 (DOI: 10.1002/qre.619)

Special Issue

Accelerated Reliability Qualification in Automotive Testing

Alex Porter∗,†

Entela, Inc., 3033 Madison Ave, SE, Grand Rapids, MI 49548, U.S.A.

Products must come to market quickly, be more reliable and cost less. The problem is that statistical measures take time. There is a clear need for actionable information about the robustness or durability of a product early in the development process. In a Failure Mode Verification Test (FMVT), the analysis is not statistical but is designed to check two assumptions. First, that the design is capable of producing a viable product for the environments applied. Second, that a good design and fabrication of the product would last for a long period of time under all of the stresses that it is expected to see and would accumulate stress damage throughout the product in a uniform way. Testing a product in this way leads to three measures of the product’s durability: (1) design maturity, the ratio between time to first failure and the average time between failures after the first failure; (2) technological limit, the time under test at which fixing additional failures would not provide a significant improvement in the life of the product; and (3) failure mode histogram, which indicates the repeatability of failures in a product. Using techniques like FMVT can provide a means of breaking the tyranny of statistics over durability and reliability testing in a competitive business climate. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: accelerated testing; quantification; durability; failure; verification; statistics; life

INTRODUCTION

Business reality is that products must come to market quickly, be more reliable and cost less. The old phrase ‘Quick, Simple, Cheap; pick any two’ still applies, but the choice has been made: Quick and Cheap. When it comes to development testing and validation, the need for robust design in short periods of time reduces the opportunity to use traditional long-duration durability tests. Accelerated testing (sometimes as short as a couple of days) is becoming increasingly necessary. The problem is that compressing a test to a few days, and still providing meaningful information to the engineers and managers who use the information, is not simple. (Quick and Cheap, but not Simple.)

The problem is that statistical measures take time, but time is no longer available. Since statistics is the only way to quantify reliability, it is the only way to convey meaningful, objective information between testing, design, manufacturing and management, right? Wrong. Statistics as a means of conveying meaningful, objective information from tests imposes constraints, namely long tests and large numbers of samples, that are not necessary. There are ways of conveying meaningful, objective information other than statistics. Remember, statistics is only one branch of mathematics, and mathematics is only one form of communication.

∗Correspondence to: Alex Porter, Entela, Inc., 3033 Madison Ave, SE, Grand Rapids, MI 49548, U.S.A.
†E-mail: [email protected]

Received 23 December 2002
Revised 11 April 2003

The business need for clear, actionable information about the robustness or durability of a product is very straightforward. The timeline for bringing a product to market has become so short that the time available to conduct testing is non-existent. At the writing of this paper, I have just finished a report for an automotive client. The product timeline to production was 3 months. The time available for testing between prototype availability and the design freeze for hard tooling was −5 days (that is negative 5 days). We did conduct testing that was able to meaningfully impact the design and robustness without having to hold up tooling or re-work hard tooling. I will explain how later.

Accelerated test methods can be grouped into two categories: statistically based and non-statistical. Statistical accelerated testing methods use assumptions and math models to reduce the testing time and/or sample size in order to make a more efficient test.

Ford Motor Company uses a key life testing method that reduces the time for testing by identifying the ‘key’ portions of the stress environment for its products and using computer modeling and specific comparative testing to determine the exposure necessary to equate to one life. This method reduces test time but still quantifies the reliability of the product as a function of numbers of parts meeting a certain number of lives.

Wayne Nelson has championed using accelerated reliability techniques to establish the stress-to-life relationship of a part. This reduces testing time by establishing shorter time-to-failure at elevated stresses and extrapolating the expected time-to-failure for service conditions.
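The extrapolation step in this kind of accelerated reliability method can be illustrated with a small sketch. It assumes an inverse power law between stress and life, which is only one of several possible stress-to-life models and is not named in this paper. The function names and the numbers in the usage lines are hypothetical and chosen purely for illustration.

```python
import math


def fit_inverse_power_law(stresses, lives):
    """Fit life = a * stress**(-n) by least squares on log-log data.

    The inverse power law is an assumed model, not one named in the paper.
    Returns the tuple (a, n).
    """
    if len(stresses) != len(lives) or len(stresses) < 2:
        raise ValueError("need at least two (stress, life) pairs")
    xs = [math.log(s) for s in stresses]
    ys = [math.log(t) for t in lives]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("stress levels must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def predict_life(a, n, stress):
    """Extrapolate the fitted relationship to another stress level."""
    return a * stress ** (-n)


# Hypothetical elevated-stress results (arbitrary units), illustration only.
a, n = fit_inverse_power_law([2.0, 4.0, 8.0], [250.0, 62.5, 15.625])
service_life = predict_life(a, n, 1.0)
```

The fit uses only times to failure measured at elevated stress; the service-condition life is never measured directly, which is where the saving in test time comes from and also where the modeling assumption carries the most weight.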

Both the key life and the accelerated reliability methods assume that the desired output is the statistical measure of reliability.

Other accelerated techniques quantify meaningful information without statistics. They do this by focusing on the users of the information and what they need to know. In the case of the automotive product mentioned earlier with negative 5 days available for testing, the key piece of information was, how will it fail and which failures are relevant?

To test this automotive product in a short period of time, obtain meaningful data, and not use statistics, we conducted an FMVT (Failure Mode Verification Test) over a period of three days. During the test, we applied all of the known stress sources to the product, starting at service conditions, and elevating the stress levels every 2 h toward a destruct limit. The goal of the test is to precipitate failure modes from all stress sources in an order that approximates their relevance. By applying all of the stresses simultaneously and elevating them from service conditions towards a destruct limit, the failures can be shown to be precipitated in approximately the order of relevance [1].

The FMVT method differs from the HALT (Highly Accelerated Life Test) in two key areas. First, a HALT identifies failures by applying one stress at a time to determine the operational and destruct limits of the product from each stress source. The FMVT applies the stress sources simultaneously starting at service conditions and increasing to a predetermined maximum test level. The second difference is in the analysis techniques. With a HALT, the margin between the service conditions and the operational and destruct limits provides the basis for analyzing the relevance of the failure modes. With the FMVT, we examine the number of failure modes, the time to failure and the relative distribution in time of the failures to determine which failures to address and the maturity of the design.

With the FMVT, the testing is conducted on a single sample. The analysis is not statistical but is designed to check two assumptions. First, that the design is capable of producing a viable product for the environments applied. Second, that a good design and fabrication of the product would last for a long period of time under all of the stresses that it is expected to see and would accumulate stress damage throughout the product in a uniform way, so that when one feature fails, the rest of the product’s features are near failure. Therefore, the hypothesis of the test is this: the product will last for a long period of time under all stress conditions and will then exhibit multiple diverse failures throughout the product (see Figure 1). The hypothesis is rejected if failures occur early or if they occur isolated in time relative to the bulk of the failures (see Figure 2).


Figure 1. Hypothesized progression of failures (unique failure modes of a mature design plotted against time, 0 to 500 min; DM = 0.02)

Figure 2. Hypothesis rejected (unique failure modes 1 to 10 plotted against time, 0 to 500 min; DM = 0.42)

The test is set up with the level one (1) stresses set at service conditions. If the hypothesis is correct, that the product is accumulating stress damage throughout the product in a uniform way, then at level one the rate of stress damage will be uniform. Level ten (10) of the test is set up with each stress source raised to a destruct limit, stopping short of any change in the physics of failure. For example, the maximum temperature would not be raised above the glass transition point of a plastic part, and the voltage would not be raised beyond the electrical breakdown limit of key components. The destruct limit of each stress is defined as the stress level that will cause failure in only a few cycles (less than 1 h of exposure) without changing the physics of the failure. Because the stresses at level ten (10) are all set to destroy in a short period of time, the rate of stress damage is uniform (one life of damage is accumulated in less than 1 h of exposure).

If the hypothesis of the test is correct (that uniform stress damage accumulation occurs in the product under service conditions) and the tenth (10th) level is set with all stress sources causing failure in a short period of time, then the rate of damage accumulation should remain uniform from level one (1) through level ten (10). If a failure mechanism is accumulating damage faster than the rest of the design at or near service conditions, then that failure mechanism will exhibit the failure well before the rest of the design fails. In other words, if a failure occurs earlier than the rest of the failures, the hypothesis is rejected and a weak location (location of faster damage accumulation) has been identified.
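The level structure described above (level one at service conditions, level ten at the destruct limit, all stresses stepped together every 2 h) can be sketched as a simple schedule builder. This is a minimal illustration under stated assumptions: the paper does not say how the intermediate levels are spaced, so linear spacing is assumed here, and the stress names and limit values are hypothetical.

```python
def stress_schedule(service, destruct, levels=10, hours_per_level=2.0):
    """Build a stepped profile from service conditions to destruct limits.

    service, destruct: dicts mapping a stress name to its level-1 and
    top-level setting. Linear spacing between them is an assumption.
    Returns a list of (level, start_hour, settings) tuples.
    """
    if levels < 2:
        raise ValueError("need at least two levels")
    if service.keys() != destruct.keys():
        raise ValueError("service and destruct must name the same stresses")
    schedule = []
    for level in range(1, levels + 1):
        fraction = (level - 1) / (levels - 1)
        # Every stress source is raised together at each step.
        settings = {
            name: service[name] + fraction * (destruct[name] - service[name])
            for name in service
        }
        schedule.append((level, (level - 1) * hours_per_level, settings))
    return schedule


# Hypothetical stress sources and limits, for illustration only.
service = {"temperature_C": 85.0, "voltage_V": 14.0, "vibration_grms": 2.0}
destruct = {"temperature_C": 130.0, "voltage_V": 24.0, "vibration_grms": 20.0}
profile = stress_schedule(service, destruct)
for level, start_hour, settings in profile:
    print(level, start_hour, settings)
```

In practice each destruct value would be capped where the physics of failure changes (for example below a glass transition temperature), as the text describes; the sketch simply takes those caps as given inputs.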


Figure 3. Failure mode progression (unique failure modes 1 to 10 plotted against time, 0 to 500 min, with Tmin, Tave and Tmax marked; DM = 0.42)

From the formulation of this hypothesis, a quantification can be made. Since the time to the first failure and the overall spread of the failures indicate the acceptance or rejection of the hypothesis, the ‘maturity’ of the design can be quantified as [2]

DM = Tave/Tmin

where DM is the design maturity, Tave is the average time between failures after the first failure, and Tmin is the minimum time to failure.

Tave = ((T2 − Tmin) + (T3 − T2) + (T4 − T3) + … + (Tmax − Tn))/(count − 1)

Because the intermediate terms cancel, this reduces to

Tave = (Tmax − Tmin)/(count − 1)

where Tmax is the maximum time to failure, Tx is the time to failure of failure number x, Tn is the time to failure immediately preceding Tmax, and count is the count of failures (see Figure 3).
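The design maturity calculation can be written out as a short function. This is a minimal sketch: the function name is hypothetical, the input is assumed to be the time of first occurrence of each unique failure mode, and the numbers in the usage lines are made up for illustration.

```python
def design_maturity(failure_times):
    """Return DM = Tave / Tmin for a list of times to failure.

    Tave = (Tmax - Tmin) / (count - 1), the average time between
    failures after the first failure.
    """
    times = sorted(failure_times)
    if len(times) < 2:
        raise ValueError("need at least two failures")
    t_min, t_max = times[0], times[-1]
    if t_min <= 0:
        raise ValueError("time to first failure must be positive")
    t_ave = (t_max - t_min) / (len(times) - 1)
    return t_ave / t_min


# Hypothetical failure times in minutes, for illustration only.
early_failure = design_maturity([100, 200, 300])   # failures spread out
simultaneous = design_maturity([100, 100, 100])    # all failures at once
```

A first failure that occurs early relative to the spread of the remaining failures drives DM up, while failures clustered late in the test drive it toward zero.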

Another way to view this is that DM is the average potential for improvement by fixing one failure. DM therefore provides a means of quantifying how well the product met the hypothesis.

Since the key information for the client was ‘how will it fail and which failures are relevant’, evaluating which failures should be addressed is critical. The question then is, what level of DM would be considered ‘good’?

Note that for a perfect product, one that lasted a long period of time and then all the failures occurred at once, DM would equal zero.

Tave = 0

Tmin = 100

DM = 0/100 = 0

This means that there is no room for improvement: fixing one failure mode would not improve the product. All of the failures would have to be addressed, which would logically require a complete redesign or a completely different technology.

Empirically we have measured the DM for a wide variety of products at different levels of development and production. Products that have very high reliability have a DM very close to zero (no room for improvement). The threshold from the empirical data appears to be DM = 0.10, or about a 10% potential for improvement by fixing one failure.

This threshold provides the first reference point for determining which failures to fix. If DM > 0.1, then fix the failures to bring the DM to be less than 0.1. By removing the earliest failure from the data set and re-calculating DM, a predicted design maturity is found. Repeating this removal until the predicted design maturity is less than 0.1 indicates the technological limit: an estimate of how good the product can reasonably be made without a fundamental change in its technology. All failure modes that were removed from the data set to achieve a DM less than 0.1 should be addressed.

Figure 4. Histogram of failures (incident count, 0 to 2, for Failure 1 through Failure 24, with series for Level 1, Level 4, Level 7 and Level 10)

It should be noted that removing failure modes requires a design change. The design must be re-tested after the design change is made to ensure the improvement to the design.
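The procedure of removing the earliest failure and re-calculating DM until it falls below the 0.1 threshold can be sketched as a loop. This is an illustration under stated assumptions rather than the patented algorithm: the function names are hypothetical, the failure times are made up, and reporting the earliest remaining failure time as the technological limit is one reading of the definition given in the abstract.

```python
def design_maturity(failure_times):
    """Return DM = Tave / Tmin, with Tave = (Tmax - Tmin) / (count - 1)."""
    times = sorted(failure_times)
    t_min, t_max = times[0], times[-1]
    return ((t_max - t_min) / (len(times) - 1)) / t_min


def failures_to_address(failure_times, threshold=0.1):
    """Drop the earliest failure until the predicted DM is below threshold.

    Returns (removed_times, limit_time, predicted_dm). limit_time is the
    earliest remaining failure time, taken here as the technological limit
    (an assumed interpretation). If fewer than two failures remain before
    the threshold is met, limit_time and predicted_dm are None.
    """
    remaining = sorted(failure_times)
    removed = []
    while len(remaining) >= 2:
        dm = design_maturity(remaining)
        if dm < threshold:
            return removed, remaining[0], dm
        removed.append(remaining.pop(0))
    return removed, None, None


# Hypothetical times to failure in minutes, for illustration only.
removed, limit_time, predicted_dm = failures_to_address(
    [30, 120, 400, 410, 420, 430]
)
```

With these made-up numbers the two early failures are flagged for a design change and the remaining cluster gives a predicted DM of 0.025. The prediction is only an estimate, which is why the re-test after the design change remains necessary.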

There is another analysis technique on the data produced by the FMVT that is useful. In a statistical test with multiple samples, it is easy to see if there is a non-repeatable failure mode in one sample caused by a flaw, or variance in the fabrication of the one sample. With an FMVT, the sample size is often one. Detecting an outlier when there is no statistical analysis requires a different approach. When a failure occurs, the failed location is either repaired or replaced. The test is continued. If the failure is inherent to the design it will be repeated, and repeated faster as the stress levels are elevated. If, however, the failure is due to an incorrect fabrication or assembly the failure will not be repeated. A histogram of the incident rate as a function of failure modes and stress levels provides a clear indication of design inherent (repeatable) failures and outliers (non-repeatable) failures (see Figure 4).
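The failure mode histogram can be sketched as a tally of incidents per failure mode and stress level. This is a minimal illustration: the function names, the failure mode labels and the log entries are hypothetical, and treating any mode seen more than once as repeatable is an assumed rule, not one stated in the paper.

```python
from collections import Counter, defaultdict


def failure_histogram(events):
    """Count incidents per failure mode and stress level.

    events: iterable of (failure_mode, stress_level) pairs, one per incident.
    Returns {failure_mode: {stress_level: count}}.
    """
    histogram = defaultdict(Counter)
    for mode, level in events:
        histogram[mode][level] += 1
    return {mode: dict(counts) for mode, counts in histogram.items()}


def classify_failures(histogram):
    """Label a mode repeatable if it occurred more than once (assumed rule)."""
    return {
        mode: "repeatable" if sum(counts.values()) > 1 else "non-repeatable"
        for mode, counts in histogram.items()
    }


# Hypothetical failure log as (failure mode, stress level), illustration only.
log = [
    ("connector crack", 4),
    ("solder joint open", 5),
    ("connector crack", 7),
    ("connector crack", 9),
    ("housing warp", 10),
]
histogram = failure_histogram(log)
labels = classify_failures(histogram)
```

A mode that first appears at the final stress level has had little opportunity to repeat, so a single incident there deserves more caution than a single incident at a low level followed by many hours of further testing.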


Using techniques like FMVT provides a means of breaking the tyranny of statistics over durability and reliability testing in a competitive business climate.

REFERENCES

1. Porter A. Life estimating techniques for failure mode identification testing methods. Technical Report SAE 2002-01-1174, 2002.

2. Entela. Design maturity algorithm. U.S. Patent No. 6,247,366.

Author’s biography

Alexander (Alex) J. Porter is the Engineering Development Manager for the Testing and Engineering Division of Entela, and has been with the company since 1992. Since 1996, he has been developing accelerated testing methods for mechanical components and systems. Alex has three patents related to accelerated testing equipment. His work in the past has included implementation of FEA in a laboratory setting and development of a thermal management system for an advanced data acquisition package developed by NASA’s Dryden Flight Research Facility. Alex is a member of SAE, the IEEE Reliability Society, IEST and SMTA. He holds a BS in Aircraft Engineering and an MS in Mechanical Engineering, both from Western Michigan University.
