
AN INTELLIGENT VALVE FRAMEWORK FOR INTEGRATED SYSTEMS

HEALTH MANAGEMENT ON ROCKET ENGINE TEST STANDS

by Michael Russell

A Thesis

Submitted in partial fulfillment of the requirements of the Master of Science Degree

of The Graduate School

at Rowan University

October 2010

Thesis Chair: Shreekanth Mandayam, Ph.D.

© 2010 Michael Russell

ABSTRACT

Michael J. Russell

AN INTELLIGENT VALVE FRAMEWORK FOR INTEGRATED SYSTEMS

HEALTH MANAGEMENT ON ROCKET ENGINE TEST STANDS

2009/2010

Shreekanth Mandayam, Ph.D.

Master of Science in Engineering

Intelligent sensors can play a critical role in monitoring complex test systems such as those used to inspect rocket engine components. Such sensors can not only provide raw data but also indicate the data’s reliability and its effect on system health at various levels in the system hierarchy. A major concern at NASA Stennis Space Center (SSC) in Mississippi is the failure of critical components in the rocket engine test stand during a test cycle. Test cycles can run for extended periods of time, and it is nearly impossible to perform maintenance on mission-critical components once testing has commenced. Valves play a critical role in rocket engine test stands because they are essential to the cryogen transport mechanisms that are vital to test operations. Sensors placed on valves monitor pressure, temperature, flow rate, valve position, and any other features required for diagnosing valve functionality. Integrated systems health management (ISHM) algorithms have been used to identify and evaluate anomalous operating conditions of systems and sub-systems (e.g., valves and valve components) on complex structures such as rocket test stands. For such algorithms to be useful, realistic models must be developed for the most common and problem-prone elements. Furthermore, the user needs efficient tools to explore the nature of an anomaly, its possible effects on the element, and its relationship to the overall system state.

This thesis presents the development of an intelligent valve framework that is capable of tracking and visualizing events of the large linear actuator valve (LLAV) in order to detect anomalous conditions. Specifically, the research work presented in this thesis describes a diagnostic process that receives and stores incoming sensor data; calculates operating statistics; compares the results with existing analytical models; and visualizes faults, failures, and operating conditions in a 3D GUI environment. A suite of diagnostic algorithms has been developed that can detect anomalous behavior in the valve and other system components of the rocket engine test stand. The framework employs a combination of technologies, including a DDE data transfer protocol, auto-associative neural networks, empirical and physical models, and virtual reality environments. The diagnostic procedure can be integrated into existing ISHM systems and can reduce information overload in the typically crowded environments of complex system control rooms. The augmentation to ISHM capabilities presented in this thesis can provide significant benefits for ground-based spacecraft monitoring and has the potential to be adapted ultimately to provide on-board support for spacecraft.

ACKNOWLEDGEMENTS

The support of my MS program by the NASA Graduate Student Researchers Program (GSRP) award No. NNX08AV98H in 2008 and 2009 is gratefully acknowledged. The research work presented in this thesis was also supported by NASA Stennis Space Center under Grant/Cooperative Agreement No. NNX08BA19A.

I also acknowledge Dr. Shreekanth Mandayam for being a great advisor and providing the funding to get me through my master's program. To Dr. Schmalzel and Dr. Merrill, I thank you for your guidance as part of my thesis committee. To Hak attack, Fillman, Metin, Rane, Elwell, and Freddie, thank you for helping me pass undergrad and making life bearable during those all-nighters. To Will and Steven, thank you for being my best non-nerd friends through college.

I would also like to thank my family, who have supported me in my academic journey: my mom and dad, for always encouraging me to push myself in life and faith; my siblings and siblings-in-law, for always being there for me; and my grandparents, for supporting me in my internship at NASA, which started this research.

In Memoriam: Dr. Robert (Bob) Field was one of the many engineers at Stennis Space Center who contributed to the development of improved system models, one of which is a core element in the intelligent valve. A mechanical engineer adept at thermal system design and analysis, he brought a depth of experience and insight gained from his many years at Pratt & Whitney designing turbomachinery blades and solving other equally complex problems. At NASA, he applied his deep understanding of thermal systems design and analysis to many facets of test stand design and optimization. In addition to his thermal technical expertise, he was the leader of many a stimulating conversation into the finer (and fringier) points of the enterprises of engineering, science, and the unknown. He was always ready to talk to young engineers and students. Bob retired from NASA in October 2009 and passed away in February 2010.

In memory of Gladys Russell and William Kolb, the best grandparents, parents and spouses I have ever known.


TABLE OF CONTENTS

Acknowledgements

List of Figures

List of Tables

CHAPTER 1: INTRODUCTION

1.1 APPLICATIONS
1.2 MOTIVATION
1.3 OBJECTIVES
1.4 SCOPE
1.5 ORGANIZATION
1.6 EXPECTED CONTRIBUTIONS

CHAPTER 2: BACKGROUND

2.1 HEALTH ANALYSIS
2.2 FRAMEWORK FOR HEALTH ANALYSIS
2.3 DESIGN AND TRADE STUDIES
2.4 FAILURE MODE ANALYSIS
2.5 CBM TESTING, DATA COLLECTION, AND DATA ANALYSIS
2.6 ALGORITHM DEVELOPMENT - DIAGNOSTICS
    2.6.1 Preprocessing and Feature Extraction
    2.6.2 Techniques for Diagnostics
2.7 ALGORITHM DEVELOPMENT - PROGNOSTICS
2.8 RELIABILITY CENTERED MAINTENANCE
2.9 SYSTEM IDENTIFICATION TECHNIQUES
    2.9.1 Autoregressive Models
    2.9.2 Kalman Filters

CHAPTER 3: APPROACH

3.1 FAILURE MODES
3.2 INTELLIGENT VALVE FRAMEWORK
    3.2.1 Data Acquisition
    3.2.2 Preprocessing
    3.2.3 Failure Mode Detection and Diagnosis
    3.2.4 Valve Operational Statistics
    3.2.5 Auto-associative Neural Networks for Sensor Validation
    3.2.6 Thermal Modeling
    3.2.7 Adaptive Thresholding
3.3 PROGNOSTIC SURVEY
3.4 DIAGNOSTIC PROCESS

CHAPTER 4: RESULTS

4.1 DIAGNOSTIC VALIDATION DATA
    4.1.1 Thermal Model Data
    4.1.2 Sensor Validation Data
    4.1.3 Adaptive Threshold Data
4.2 THERMAL MODEL VALIDATION
    4.2.1 Thermal Modeling
    4.2.2 Simulation Metrics
4.3 SENSOR VALIDATION
4.4 ADAPTIVE THRESHOLD
4.5 VALVE STATISTICS
4.6 HEALTH VISUALIZATIONS
4.7 PROGNOSTICS
4.8 PROGNOSTICS DATA
    4.8.1 Canonical Data
    4.8.2 LLAV Data
4.9 PROGNOSTIC PERFORMANCE
4.10 DIAGNOSTIC PROCESS

CHAPTER 5: CONCLUSIONS

5.1 SUMMARY OF ACCOMPLISHMENTS
5.2 RECOMMENDATIONS FOR FUTURE WORK

References

LIST OF FIGURES

Figure 1 - Integrated approach for system health analysis

Figure 2 - The four types of failure mode and effect analysis (FMEA)

Figure 3 - Reliability analysis procedure for bottom-up and top-down FMEA approaches

Figure 4 - System decomposition for CBM testing, data collection, and data analysis

Figure 5 - Diagnostic and Prognostic Flowchart

Figure 6 - Model-based and Data-driven diagnostic techniques

Figure 7 - Approaches for prognosis

Figure 8 - The system identification loop

Figure 9 - LLAV with regions of interest labeled

Figure 10 - Prioritization of LLAV failure modes (see Equations 2.1 and 2.2 for y-axis calculation)

Figure 11 - System level flowchart of the Intelligent Valve framework

Figure 12 - Health analysis framework for the Intelligent Valve

Figure 13 - Valve statistics algorithm

Figure 14 - Training method for auto-associative neural networks for sensor validation

Figure 15 - Adaptive threshold algorithm for designing and choosing ARMA models

Figure 16 - Adaptive threshold algorithm simulation on real-time data

Figure 17 - Intelligent Valve database schema

Figure 18 - Software framework for the Intelligent Valve framework

Figure 19 - MTTP Trailer used for validating sensor faults

Figure 20 - Simulation data using thermal modeling for base run

Figure 21 - Data acquisition setup for thermal modeling fault detection

Figure 22 - Simulation data using thermal modeling for faulty connections in Tustin amplifier input

Figure 23 - Fault classification using thermal modeling for faulty connections in Tustin amplifier input

Figure 24 - Simulation data using thermal modeling for amplifier power downs and Tustin input disconnections

Figure 25 - Fault detection using thermal modeling for amplifier power down and Tustin input disconnection

Figure 26 - Simulation data using thermal modeling for faulty input connections in the digitizer

Figure 27 - Fault detection using thermal modeling for amplifier power down and Tustin input disconnection

Figure 28 - Simulation data using thermal modeling for simulated frost insulation test 1

Figure 29 - Fault detection using thermal modeling for frost insulation test 1

Figure 30 - Simulation data using thermal modeling for simulated frost insulation test 2

Figure 31 - Fault detection using thermal modeling for frost insulation test 2

Figure 32 - Data acquisition modified setup for thermal modeling fault detection

Figure 33 - Simulation data using thermal modeling for temperature junction reference errors

Figure 34 - Fault detection using thermal modeling for temperature junction reference errors

Figure 35 - Simulation data using thermal modeling for thermocouple and power disconnections

Figure 36 - Fault detection using thermal modeling for thermocouple and power disconnections

Figure 37 - Simulation data using thermal modeling for thermocouple disconnections and shorts

Figure 38 - Fault detection using thermal modeling for thermocouple disconnections and shorts

Figure 39 - Simulation data using thermal modeling for transmitter power failures

Figure 40 - Fault detection using thermal modeling for transmitter power failures

Figure 41 - Simulation data using thermal modeling for unaccounted thermocouple junctions

Figure 42 - Fault detection using thermal modeling for unaccounted thermocouple junctions

Figure 43 - Comparison of predicted and actual frost line

Figure 44 - Example of a hard fault

Figure 45 - Example of a soft fault

Figure 46 - Example dataset from LLAV and downstream pressure sensor

Figure 47 - Hard fault detection using AANN

Figure 48 - Soft fault detection by AANN

Figure 49 - Fault detection of a simulated hard fault in a pressure sensor

Figure 50 - Fault detection of a soft fault in a pressure sensor

Figure 51 - Detection of a simulated disconnect in a pressure transducer

Figure 52 - Legend for AANN estimations: (a) top estimation plots and (b) bottom error plots

Figure 53 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors under normal operating conditions

Figure 54 - AANN Estimation for PE-1143-GO and PC1 pressure sensors under normal operating conditions

Figure 55 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors under normal operating conditions

Figure 56 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with hard fault in PE-1143

Figure 57 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with hard fault in PE-1143

Figure 58 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD pressure sensors with hard fault in PE-1143

with level shift in PE-1143-GO

Figure 60 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with level shift in PE-1143-GO

with level shift in PE-1143-GO

with noise in PC1

Figure 63 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in PC1

with noise in PC1

Figure 65 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with noise in VPV-1139-FB

Figure 66 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in VPV-1139-FB

with noise in VPV-1139-FB

Figure 68 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with simultaneous faults in PE-1143-GO and PC1

Figure 69 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with simultaneous faults in PE-1143-GO and PC1

Figure 70 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with simultaneous faults in PE-1143-GO and PC1

Figure 71 - Set point transitions for adaptive thresholding testing

Figure 72 - Set point transition #1 with fault detection while operating in: (a) normal OS and (b) faulty OS

Figure 73 - Set point transition #2 with fault detection while operating in:


Figure 78 - Average fault values for different parameters of the ARMA model thresholding method over all tests

Figure 79 - Training data with final threshold fit

Figure 80 - Fault detection of simulated obstruction fault using adaptive thresholding

Figure 81 - Frost line visualization of LLAV

Figure 82 - Cross sectional and exploded view with flow and position visualizations

Figure 83 - Frost line visualization of LLAV with thermocouple values

Figure 84 - Linear equation with 0 mean and 1 variance

Figure 85 - Linear time series with 0 mean and 10 variance

Figure 86 - Original model time series

Figure 87 - AR prediction of first time signal at 1 prediction step and SNR = 25dB

Figure 88 - AR prediction of first time signal at 5 prediction steps and SNR = 25dB

Figure 89 - AR prediction of first time signal at 5 prediction steps and SNR = -5dB

Figure 90 - AR MSE performance on 0 mean, 1 variance signal

Figure 91 - ARMA prediction of first time signal at 1 prediction step and SNR = 25dB

Figure 92 - ARMA prediction of first time signal at 1 prediction step and SNR = -5dB

Figure 93 - ARMA prediction of first time signal at 5 prediction steps and SNR = -5dB

Figure 94 - ARMA MSE performance on 0 mean, 1 variance signal

Figure 95 - Kalman filter prediction of first time signal at 1 prediction step and SNR = 25dB

Figure 96 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = 25dB

Figure 97 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = -5dB

Figure 98 - Kalman filter MSE performance on 0 mean, 1 variance signal

Figure 99 - Original time series model #2

Figure 100 - AR MSE performance on 0 mean, 10 variance signal

Figure 101 - ARMA MSE performance on 0 mean, 10 variance signal

Figure 102 - Kalman filter performance on 0 mean, 10 variance signal

Figure 103 - ARX prediction of the LLAV data to 30 time steps

Figure 104 - Performance for ARX model based on LLAV data

Figure 105 - ARMAX prediction of the LLAV data to 30 time steps

Figure 106 - Performance for ARMAX model based on LLAV data

Figure 107 - Kalman prediction of the LLAV data to 30 time steps

Figure 108 - Performance for Kalman filter based on LLAV data

Figure 109 - Intelligent Valve statistics tab

Figure 110 - Intelligent Valve thermocouple tab

Figure 111 - Intelligent Valve setup tab

LIST OF TABLES

Table 1 - An example morphological matrix of a redesigned rail bogie

Table 2 - Description of the four types of failure mode and effect analysis (FMEA)

Table 3 - Possible values of the parameters used in a FMEA

Table 4 - Diagnostic algorithms from the literature

Table 5 - Prognostic algorithms from the literature

Table 6 - Failure modes and effects for LLAV

Table 7 - Thermocouple types and ranges

Table 8 - Data server class interface

Table 9 - Adaptive threshold simulation parameters

Table 10 - Physical parameter obtained from least square optimization curve fit of base run

Table 11 - Performance metrics for faulty connection in amplifier input

Table 12 - Performance metrics for amplifier power down and Tustin input disconnect

Table 13 - Performance metrics for input disconnection on the digitizer

Table 14 - Performance metrics for frost insulation test 1

Table 15 - Performance metrics for frost insulation test 2

Table 16 - Performance metrics for temperature junction reference error

Table 17 - Performance metrics for thermocouple and power disconnection

Table 18 - Performance metrics for thermocouple disconnections and shorts

Table 19 - Performance metrics for transmitter power and failure

Table 20 - Performance metrics for unaccounted thermocouple junction

Table 21 - Average performance metrics for all thermocouple fault tests

Table 22 - Performance metrics for fault detection using AANN under normal operating conditions

Table 23 - Performance metrics for fault detection using AANN with injected hard fault in PE-1143-GO


level shift fault in PE-1143-GO

noise in PC1

Table 26 - Performance metrics for fault detection using AANN with noise in VPV-1139-FB

Table 27 - Performance metrics for fault detection using AANN with simultaneous faults in PE-1143-GO and PC1

Table 28 - Operating Statistics for LLAV

GLOSSARY OF TERMS

1. Health Management - A comprehensive system that detects, isolates, and quantifies faults as well as predicts future failures in an engineering system

2. Condition based maintenance - The use of machinery run-time data to determine the machinery condition and hence its current fault/failure condition, which can be used to schedule required repair and maintenance prior to breakdown

3. Prognostics and health management - The prediction of future failure conditions and remaining useful life of a system, subsystem, or component

4. Reliability centered maintenance - The process that is used to determine the most effective approach to maintenance

5. Failure Conditions - States of components and subsystems that are indicative of a fault occurring in the overall system.

6. Dimensionality Reduction - The process of reducing the number of random variables under consideration in order to create a more accurate set of feature vectors.

7. Fuzzy Logic - A form of multi-valued logic derived from fuzzy set theory to deal with reasoning that is approximate rather than accurate

8. Intelligent Component - A component in a system that relays not only raw data, but some sort of analysis on the data, e.g. FFT, DSP, moving average, fault and failure conditions, etc.

9. Artificial Neural Network - A mathematical model or computational model that is inspired by the structure and/or functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation.

10. Integrated Systems Health Management - A set of system capabilities that in aggregate perform: determination of condition for each system element, detection of anomalies, diagnosis of causes for anomalies, and prognostics for future anomalies and system behavior

11. Fault Diagnosis - Detecting, isolating, and identifying an impending or incipient failure condition; the affected component (subsystem, system) is still operational, though in a degraded mode.

12. Failure Diagnosis - Detecting, isolating and identifying a component (subsystem, system) that has ceased to operate.

13. Fault Detection - Detection of the occurrence of faults in the functional units of the process, which lead to undesired or intolerable behavior of the whole system.

14. Fault Isolation - Localization (classification) of different faults.

15. Fault Analysis or Identification - Determination of the type, magnitude and cause of the fault

16. Failure modes effects and criticality analysis (FMECA) - A procedure in product development and operations management for analysis of potential failure modes within a system for classification by the severity and likelihood of the failures.

CHAPTER 1: INTRODUCTION

As system complexity increases, the amount of data required to monitor system failures

also increases. Originally, a human operator could view the raw time series data to find

sensor faults that could be traced back to a root cause in the system. In modern day

systems, however, the increase in sensor data can make it difficult if not impossible for a

human operator to find anomalies in these systems in a timely fashion 1. Therefore,

reliability engineers are deploying automated algorithms that detect failure modes in

complex, dynamic systems. The goal of these algorithms has been extended from

detecting threshold violations in the sensors to identifying and quantifying the

degradation of health in a system and even predicting faults before they occur.

Numerous techniques have been developed with help from extensive research and

funding being put into the field of health analysis. The United States military has taken

particular interest in health analysis in order to provide their troops with robust and

reliable systems. Military studies found that maintenance protocols were based on a

schedule rather than on a degradation of performance. This scheduled maintenance led to

many components being replaced before their operational lifetime had ended. Health

analysis allows for preventive maintenance to be performed based on the current health

state of the component and has considerable cost benefits while also keeping operators of

the machines safe.

In any system, the proper health analysis technique must be determined and

depends on several design parameters such as application, severity, accuracy, historical

data, constraints, deadlines, and complexity of the physical dynamics of the system. For

instance, in a manufacturing plant the failure of certain critical components can cause a shutdown for

days. The shutdown can cause considerable delays in the shipping of the manufactured

products. Such systems would require a highly accurate algorithm such as a physics

model, but these algorithms also take the most time and money to develop.

In the realm of health analysis, four major technologies have arisen: Condition

Based Maintenance, Prognostics and Health Management, Reliability Centered

Maintenance, and Integrated Systems Health Management. Condition Based Maintenance

(CBM) is defined as “the use of machinery run-time data to determine the machinery

condition and hence its current fault/failure condition, which can be used to schedule

required repair and maintenance prior to breakdown” 2. Prognostics and Health

Management (PHM) refers to the prediction of future failure conditions and remaining

useful life of a system, subsystem, or component. Integrated Systems Health

Management (ISHM) “describes a set of system capabilities that in aggregate perform:

determination of condition for each system element, detection of anomalies, diagnosis of

causes for anomalies, and prognostics for future anomalies and system behavior” 3.

Reliability centered maintenance (RCM) "is the process that is used to determine the

most effective approach to maintenance" 4.

As systems grow in size and complexity, there will be a need to develop

algorithms that have higher accuracy, are more general, and have longer prediction

intervals than current system health analysis.

1.1 Applications

Diagnostics and prognostics make up the core components of the health analysis

framework. These two technologies are not limited to just engineering, but are also used

in medical and business applications. While the goals of the analysis may be different,

the techniques used in the diverse fields are often the same.

The medical field uses diagnostics to try and determine the health of a patient, and

the disease that is affecting the patient. Once a diagnosis has been made, remedial

procedures can be created to try and help the patient as much as possible. Three of the

methods that are used by medical professionals are exhaustive, algorithmic, and pattern-

recognition. The exhaustive method uses every possible question and runs all possible

tests in order to create the most comprehensive diagnosis possible. The algorithmic

method follows steps from a proven strategy to diagnose the disease based on the

symptoms the patient is experiencing. The final method, pattern-recognition, uses past

experience to recognize a pattern of clinical characteristics in order to diagnose the

patient. While the procedures are different for each method, the goal of finding the

disease and coming up with a treatment based on the symptoms and test data available is

the same 5.

The global economy is constantly in a state of flux, with companies’ stocks rising

and falling every day. The ability of an investor to predict these changes would result in

success for his company. Therefore, algorithms are being designed that attempt to

analyze, model, and even forecast the stock market in an attempt to find trends that will

tell investors when it is the best time to buy or sell their stocks. Financial forecasting is

also used by top management for planning and implementing long-term strategic

objectives. The methods used by forecasters usually rely on probabilistic models such as

regression and Markov models. The main drawback of most of these models is the

difficulty in taking into account all the variables as well as the functions or relationships

of those variables that contribute to something as complex as the global market 6.

1.2 Motivation

The National Aeronautics and Space Administration (NASA) was formed in 1958 and

has quickly established itself as a worldwide leader for air and space research. The

accomplishments of the public space agency resulted in a number of firsts including an

interplanetary flyby, pictures from another planet, manned landings on the moon, and the

assembly and launch of a space station. Since its inception, NASA has placed an

emphasis on the safety of its astronauts during their voyages into space. However, even

with safety procedures and equipment, the manned space flight program has suffered

several catastrophic losses including the crews of Apollo 1, STS-51-L, and STS-107.

Since the two space shuttle disasters, NASA has focused research on the development of

an Integrated Systems Health Management (ISHM) platform to ensure the highest level

of safety possible for future endeavors in space 7, 8.

NASA defines Integrated System Health Management (ISHM) as “a capability

that focuses on determining the condition (health) of every element in a complex System

(detect anomalies, diagnose causes, prognosis of future anomalies), and provide data,

information, and knowledge (DIaK) - not just data - to control systems for safe and

effective operation” 9. The vision of NASA is to start incorporating ISHM at the

beginning of the conceptual design until the end of the manufacturing cycle for future

missions. By allowing safety to influence conceptual design, engineers can catch

potential failures and anomalies in systems before they are fully designed.

By catching these flaws early enough, the best opportunity for cost savings can

be exploited at the earliest stages in development. The development and implementation

of ISHM in the complex systems designed by NASA can also create additional costs if

not applied correctly. Therefore, risk analysis tools are being created that find a balance

between cost, performance, safety and reliability throughout the system lifecycle.

NASA Stennis Space Center (SSC) in Mississippi is the location of one group

researching ISHM technologies. NASA-SSC’s primary responsibility is the testing of all

the rocket engines before they are launched from NASA Kennedy Space Center. This

includes the Space Shuttle main engine (SSME) and the new J-2X, which are both critical

to the success of their respective missions. While the SSME is being phased out, the J-

2X is a brand new engine in its first stages of testing. The engines require highly

combustible fluids such as liquid hydrogen and liquid oxygen to create thrust of up to

294,000 lbs 10. The engines are bolted down to massive superstructures and are fired for

the exact amount of time the engine stays lit during the live launch. To date, no shuttle

mission has ever been delayed or aborted due to an engine failure 11. To continue this

perfect track record, an ISHM module is being created for the newly renovated A-

Complex test stands.

One of the important aspects of the ISHM module is the determination of failures

and anomalies in the valves of the test stand. These valves are responsible for

maintaining a precise flow of cryogenic fluids needed to fuel a test article. The cost of

these test articles is extremely high and even a small discrepancy in the flow rate of

cryogenic fluids can cause catastrophic events. The cost to run a test program can be in

the millions of dollars and extended delays can cause the cancellation of an entire

program. The restrictive constraints placed upon the operation of the test stands require

the test engineers to continually monitor the valve operations and, at the first sign of

degradation, to make repairs quickly and efficiently. Currently, human engineers

perform the analysis on the valve data, but the implementation of an intelligent

framework with algorithms to monitor the health of the valves in the system could help

give additional insight to the engineers at NASA-SSC. The valuable statistical,

diagnostic, and prognostic information introduced with such a framework could generate

advisories that, when combined with the domain experts’ opinions, would produce the

greatest accuracy in maintenance decisions.

1.3 Objectives

The objectives of this thesis are -

1. To design a framework for the detection of faults and failure modes in the large linear

actuated valves that are used on the rocket engine test stands at NASA-SSC.

2. To develop a diagnostic process that –

a. Receives and stores incoming sensor data;

b. Performs calculation of operating statistics;

c. Compares with existing analytical models; and,

d. Visualizes faults, failures, and operating conditions in a 3D GUI environment.

3. To develop a suite of diagnostic algorithms that can detect anomalous behavior in the

valve and other system components of the rocket engine test stand.

4. To expand the capability of the diagnostic algorithms to perform prognosis in specific

contexts.

1.4 Scope

The survey of current diagnostic and prognostic techniques focused on how to apply

these algorithms to specific applications and is presented in the background section of this

thesis. The steps of a health analysis framework are also presented in the background

section.

The development of these algorithms is defined in the approach section, with

specific applications to NASA-SSC's E-complex test stand. Particularly, the valve in

question is the Large Linear Actuator Valve, which is a critical component of the test

stands at NASA-SSC. The algorithms are tested on both actual data from the rocket

engine test stands, as well as simulated data from forward analytic models.

1.5 Organization

This thesis is organized as follows. Chapter 1 provides introductory information on

NASA’s history and the motivation of the agency to develop an Integrated System Health

Management framework for its rocket engine test stands. Possible applications,

objectives, and expected contributions are also discussed. Chapter 2 provides a thorough

background on the development process of health analysis algorithms and frameworks.

An overview of the framework is given and then subsequent sections provide detailed

information for each step. Chapter 3 outlines the approach taken to develop diagnostic

and prognostic algorithms for the detection of anomalies in sensor data from the valves at

NASA-SSC. Chapter 4 is an account of the results of creating the functional database

and intelligent valve framework, following the premises outlined in Chapter 3. Chapter 5

is a summary of accomplishments presented in this thesis, as well as future research

recommendations.

1.6 Expected Contributions

This thesis will provide a detailed summary of existing methods for health analysis with

applications to the ISHM components used at NASA-SSC. It will also provide a

literature review of existing diagnostic and prognostic algorithms. A functional database

that utilizes neural networks will be integrated into the existing ISHM framework. This

database will detect alarms in the subsystems and components of the test stand. These

alarms can then be used for root cause analysis to pinpoint faults and failures in the

complex test stands.

This thesis will also show the approach and results of an intelligent valve

framework. There will be two modes that the framework will be used in: health analysis

algorithms and a diagnostic process. The diagnostic process, which is run in real-time

during tests, will be responsible for the capturing of operating statistics, thermal model

diagnostics, and a 3D model of the valves. The health analysis algorithms, which will be

run after a test series has completed, are responsible for the development and validation of

advanced diagnostic and prognostic algorithms for the determination of the remaining

useful life for the valves. These algorithms will eventually convey advisory information

to NASA engineers for maintenance options in the valves at the E-Complex test stand.

CHAPTER 2: BACKGROUND

The following section contains a summary of previous work performed in the area of

fault diagnosis, fault detection, and prognostics. A detailed method of the entire health

analysis framework will be given. Finally, a discussion of various system identification

techniques will be presented.

2.1 Health Analysis

As engineering systems have become more complex, the cost to maintain these systems

has also increased dramatically. Therefore, research in the area of system health analysis

has emerged over recent years to help alleviate the cost of these expensive machines.

The research has been split into two major areas: Condition-based maintenance (CBM)

and prognostics and health management (PHM) 2.

CBM focuses on the detection of faults in a system and then labeling a specific

component that caused the fault. This methodology replaces traditional scheduled

maintenance which commonly resulted in working parts being replaced before their

useful life had expired.

PHM algorithms attempt to determine the remaining useful life of a system after a

fault has occurred. Knowing the remaining life of a system can minimize the downtime

risk for critical systems in manufacturing plants 12. As systems become more advanced,

physical modeling has become too expensive to develop in a timely fashion and can

become too specific to be useful in health management. Therefore, systems are broken

into smaller subsystems that can be modeled more easily. Ideally, these subsystems are

able to be modeled by first order physics equations. If this degree of complexity is not

sufficient, system identification techniques are used to model a system based on historical

data. These techniques and their application in system health management will be

explored in the following sections of this thesis.

2.2 Framework for Health Analysis

Modeling the entire health of a system can be very complex and is impractical in most

cases. Therefore, the approach to health analysis is broken up into different sections,

covering systems, subsystems, and components, arranged in a pipeline that streamlines the entire

process. While the input and output formats of the sections are defined, they are each

treated as a black box where only pertinent information is passed on to the next level of

the analysis. The entire pipeline can be seen in Figure 1 2 with a description of each

section following.

Figure 1 - Integrated approach for system health analysis.

2.3 Design and Trade Studies

The first step of the health analysis process is to examine the system from a top level, and

determine the best approach for each failure mode identified. In 2002, a formal

methodology called integrated product and process design (IPPD) was accepted by the

U.S. Department of Defense 3. IPPD defines the following tasks:

1. Define the problem

2. Establish value

3. Generate feasible alternatives

4. Evaluate alternatives

5. Recommend a decision

The IPPD framework is applied to the system during the design phase. Its main

purpose is to provide guidance to the engineers designing the system. The IPPD uses a

morphological matrix that lists the functions of the system and proposes alternative

methods of accomplishing those functions. An example of a morphological matrix of the

redesign of a rail bogie can be seen in Table 1 13.

Table 1 - An example morphological matrix of a redesigned rail bogie.

Function | Actuator Solutions

To connect the wheel-set and the carriage | Carriage spring; Bogie; Bogie with single-stage suspension

To allow the primary suspensions simultaneously working | Coaxial helical springs + shock absorber; Helical springs working in parallel + shock absorber; Helical springs working in parallel + shock absorber; Pressure spring + rubber small block

To reduce oscillations between the bogie and the carriage frame | Helical springs working in parallel with shock absorber; Coaxial

When allowing CBM/PHM to contribute to the design at an early stage, more

reliable systems can be built based on past experience of what failures occur most often

in what equipment. While the morphological table presents the best technology to

perform each function in a system, it is not always feasible within a budget to build a system

with the most state-of-the-art components. Therefore, the morphological table must be

presented side-by-side with a quantitative analysis of the benefits of each component.

Decision analysis is a field that is well studied and provides several techniques to

quantify the options available to the design engineers. A mathematical model has been

developed for the selection of the best alternative attributes based on incomplete

preference information to assess attribute weights 14. There are various multiple

attribute decision making (MADM) methods which are ideal for quantifying the attributes

in the morphological matrix 15.
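
As a rough illustration of how a multiple attribute method can attach a number to each alternative in a morphological matrix, the sketch below uses simple additive weighting, which is only one of many such methods and is not necessarily the model used in the cited works. The alternative names, attribute names, scores, and weights are all hypothetical.

```python
def weighted_scores(alternatives, weights):
    """Return the normalized weighted sum of attribute scores per alternative."""
    total_weight = sum(weights.values())
    return {
        name: sum(weights[attr] * score for attr, score in attrs.items()) / total_weight
        for name, attrs in alternatives.items()
    }

# Hypothetical 1-10 scores (higher is better) for two design alternatives.
alternatives = {
    "coaxial_springs": {"reliability": 8, "cost": 4, "maintainability": 6},
    "parallel_springs": {"reliability": 6, "cost": 8, "maintainability": 7},
}

# Hypothetical attribute weights chosen by the design team.
weights = {"reliability": 0.5, "cost": 0.3, "maintainability": 0.2}

scores = weighted_scores(alternatives, weights)
best = max(scores, key=scores.get)
print(scores, best)
```

Changing the weights changes the ranking, which is why the weights themselves usually need to be agreed upon by the design team or elicited from domain experts.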

To completely satisfy the tasks of the design and trade studies phase, all design

aspects of a system must be chosen from the techniques described above. Final design

choices should be made only after expert opinions have been solicited or simulation

studies are performed 2. All design alternatives should be accompanied by some

technique of numerical rankings in order to best select the attributes which solve the

functions required by the system as well as stay within the budget constraints placed on

the system. After these choices have been made, the output of the design and trade

study section is a design of a system and subsystems which accomplish the task with the

greatest reliability possible. The next stage then analyzes these designs from a health

standpoint in order to understand the failure modes of the system.

2.4 Failure Mode Analysis

Understanding not only what component fails in a system, but why it fails is critical to

any health analysis platform. To perform complete health analysis, these failures must be

classified by their criticality in the system. This field of study has become known as

failure modes and effects analysis (FMEA), and many methods have been presented in

the literature. NASA Ames Research Center developed a failure mode mechanism

through clustering analysis. The analysis includes a statistical clustering procedure to

retrieve information on the set of predominant failures that a function experiences 16.

The Society of Automotive Engineers (SAE) has also developed a FMEA

procedure specifically for the automotive industry. They split their approach and have

separate procedures for the design phase, as well as the manufacturing and assembly

phase. It contains recommendations for appropriate terms, requirements, ranking charts,

and worksheets. The SAE standard is not as general as the other mentioned

methodologies, which makes it only usable for the automotive industry. Therefore, the

work in the remainder of this thesis will focus on general standards that can be applied to

any health analysis framework 17.

The United States military developed a procedure for performing FMEA in

Military Procedure MIL-P-1629. The evaluation criteria of this standard determined the

effect of system and equipment failures. The criteria were extended in MIL-STD-1629A

in order to add criticality analysis to the failure modes. NASA formally developed and

applied the 1629A method in the 1960s to improve the reliability of its space program.

The 1629A standard has become the most widely accepted method used throughout the

military and commercial industry 18. Even though 1629A is considered a standard, in

many applications it is applied more as a template that is altered and updated to meet the

needs of the project. For example, similar to the SAE standards, the design process is

separated into multiple phases such as System FMEA (SFMEA), Design FMEA

(DFMEA), Process FMEA (PFMEA), and Service FMEA. A diagram of each is shown

in Figure 2 with a description following in Table 2 19.

Figure 2 - The four types of failure mode and effect analysis (FMEA).

Table 2 - Description of the four types of failure mode and effect analysis (FMEA).

Type | Focus | Objectives and Goals

System | Minimize failure effects on the system | Maximize system quality, reliability, cost, and maintainability

Design | Minimize failure effects on the design | Maximize design quality, reliability, cost, and maintainability

Process | Minimize process failures on the total process (system) | Maximize the total process (system) quality, reliability, cost, maintainability, and productivity

Service | Minimize service failures on the total organization | Maximize the customer satisfaction through quality, reliability, and service

To create a complete and thorough standard process that can be used in a wide

variety of applications, certain terminology has been defined in the MIL-STD-1629A

document in order to simplify the communication channels between the design and FMEA

teams. The overall objective of the FMEA process is to discover all of the ways a process

or product can fail. Failures occur not only because of design or manufacturing flaws,

but also because of misuse of the product by the operator. That is why it is essential to

investigate all four types of FMEA, which leads to a study that follows a product from

concept and design to manufacturing and distribution. While these evaluations are

not guaranteed to be comprehensive, any customer complaints are able to be addressed

due to the understanding of the system based on the failure modes and effects that have

been analyzed 20.

A FMEA is a straightforward process that allows for a system to be broken down

into easily analyzed parts where failure modes are identifiable. A formal definition given

by NASA Lewis Research Center distinguishes three specific components as the

objective of a FMEA 21:

1. Analyze and discover all potential failure modes of a system

2. Determine the effects these failures have on the system

3. Determine how to correct and/or mitigate the failures or their effects on the system

The effects of these failure modes can be more difficult to determine from a

system level. A design FMEA can be conducted by a bottom-up approach, where the

lowest level component is analyzed, or a top-down approach where an upper level failure

is chosen, then the lower level effects are analyzed. Figure 3 shows these two approaches to

failure analysis 21.

Figure 3 - Reliability analysis procedure for bottom-up and top-down FMEA approaches.

Once the FMEA approach has been selected, failure modes are classified based on

a set of parameters including: severity, frequency of occurrence, and testability. It is

common for these criteria to be classified based on fuzzy values rather than numerical

values as seen in Table 3. The fuzzification of the values allows for the FMEA to be

performed on systems without large amounts of quantitative data of the faults of a

system. The study also identifies the symptoms that the system exhibits while under the

fault condition, as well as recommendations of the observers that can monitor and track

the fault as it occurs 2. The selection of observers to identify a fault may not always be a

physical sensor, but rather the features that can be extracted from data in order to build a

diagnostic algorithm. To identify these key components, domain experts must contribute

to the FMEA study, particularly those experts who have experience with the exact

components being used to perform a specific function in the system. After the

parameters are given values, the priority of each is listed in a scale or table in order to

identify the key components of a system where health diagnostic algorithms should be

developed. Once these algorithms are developed, the system is reevaluated by the same

parameters, but with improved testability and occurrence scores for those failure modes

which have been addressed 20.

Table 3 - Possible values of the parameters used in a FMEA.

Parameter | Possible Values

Severity | Catastrophic, Critical, Marginal, Minor

Frequency of occurrence | Likely, Probable, Occasional, Unlikely

Testability | Comments based on domain expert's knowledge

Two downsides arise from the use of fuzzy logic in the FMEA process. The first

is that there is no quantitative priority number that can be deduced from the fuzzy values.

A very straightforward solution, and the most commonly applied, is to defuzzify the

values into a scalar range from 1 to 10 for each of the parameters. The resulting values are

then multiplied together to form a priority number, commonly known as the risk priority

number (RPN) 22. In the same manner as before, after a diagnostic model of the failure

mode with the highest RPN is designed and verified, the RPN is readjusted based on the

newly evaluated occurrence and testability parameters. Eqs. 2.1 and 2.2 show the

formulas for both the RPN and the readjusted RPN 23.

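$$\mathrm{RPN} = \mathrm{Occurrence} \times \mathrm{Severity} \times \mathrm{Testability} \tag{2.1}$$

$$\%\ \text{Reduction in RPN} = \frac{\mathrm{RPN}_{\mathrm{initial}} - \mathrm{RPN}_{\mathrm{reduced}}}{\mathrm{RPN}_{\mathrm{initial}}} \tag{2.2}$$

Equation 1 - (2.1) Risk priority number and (2.2) readjusted risk priority number formulas.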
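
A minimal sketch of Eqs. 2.1 and 2.2 is given below. The mapping from fuzzy labels to the 1 to 10 scale and the testability scores are assumed values chosen only for illustration (here a higher testability score is assumed to mean the fault is harder to detect); a real study would take them from the FMEA standard in use.

```python
# Hypothetical defuzzification of the fuzzy labels onto a 1-10 scale.
SEVERITY = {"Minor": 2, "Marginal": 4, "Critical": 7, "Catastrophic": 10}
OCCURRENCE = {"Unlikely": 2, "Occasional": 4, "Probable": 7, "Likely": 10}

def rpn(occurrence, severity, testability):
    """Risk priority number (Eq. 2.1): product of the three scalar parameters."""
    return occurrence * severity * testability

def rpn_reduction(rpn_initial, rpn_reduced):
    """Fractional reduction in RPN (Eq. 2.2)."""
    return (rpn_initial - rpn_reduced) / rpn_initial

# Before a diagnostic algorithm exists: probable, critical, hard to detect (assumed 8).
initial = rpn(OCCURRENCE["Probable"], SEVERITY["Critical"], 8)

# After the algorithm is verified: occurrence and testability scores improve.
reduced = rpn(OCCURRENCE["Occasional"], SEVERITY["Critical"], 3)

reduction = rpn_reduction(initial, reduced)
print(initial, reduced, round(reduction, 3))
```

Note that severity is left unchanged in the second call, since a diagnostic algorithm improves how early a fault is caught, not how severe its effect is.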

The other downside of the fuzzy FMEA approach is the lack of a biasing tool for

the parameters. For example, if one component has a high likelihood of occurrence but low

severity and testability, and another component has high severity but low occurrence and

testability, their priorities may fall at exactly the same location in the FMEA. While in

some applications this priority scoring may be desired, some failure modes must be

identified based strictly on their severity to the system. Therefore, a criticality analysis is

added to the analysis which weights the parameters based on the application's goals and

design team concerns. The addition of the criticality parameter results in a ranking

system based on the severity classification of a failure mode, as well as the probability of

occurrence based on historical data. If there is no historical data, then a qualitative

approach must again be used, but the more desired approach is again to use the

quantitative number scaling used above. Based on the failure modes and effects

criticality analysis (FMECA) standard being used for the system, the scaling will change

based on specifications put forth by the designer.
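
The tie described above, and one way a severity-first criticality ranking might break it, can be sketched as follows. This is an illustrative assumption and not the ranking prescribed by any particular FMECA standard; the failure mode names, the label-to-score mapping, and the scores are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical defuzzified severity scores on a 1-10 scale.
SEVERITY_SCORE = {"Minor": 2, "Marginal": 4, "Critical": 7, "Catastrophic": 10}

@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: str      # fuzzy severity class
    occurrence: int    # 1-10, higher means more frequent
    testability: int   # 1-10, higher means harder to detect (assumed convention)

    def rpn(self) -> int:
        return self.occurrence * SEVERITY_SCORE[self.severity] * self.testability

def rank_by_criticality(modes):
    """Order failure modes by severity class first, then by occurrence."""
    return sorted(modes, key=lambda m: (-SEVERITY_SCORE[m.severity], -m.occurrence))

modes = [
    FailureMode("seal_leak", "Minor", occurrence=10, testability=2),
    FailureMode("stem_fracture", "Catastrophic", occurrence=2, testability=2),
]

# Both modes have the same RPN, so the RPN alone cannot separate them.
print([m.rpn() for m in modes])

ranked = rank_by_criticality(modes)
print([m.name for m in ranked])
```

With the severity-first key, the catastrophic but rare failure mode is ranked ahead of the frequent but minor one, even though the two share an RPN.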

Failure mode and effects criticality analysis is a very important, but often

overlooked section of the health analysis framework. It may be partially due to the

amount of time, effort, and research that must be put into the collection and analysis of

data. Collaboration in a FMECA system is essential for it to be performed correctly,

particularly when there are varying systems that require advice from different domain

experts. Also, FMECAs should be done iteratively throughout the life of a component to

guarantee that all failure modes are identifiable and recommended actions can be

performed in the event of a real failure when the product is being used by the customer 2.

2.5 CBM Testing, Data Collection, and Data Analysis

After the potential failure modes of a system have been identified, the next step in the

health analysis framework is the design of the required instrumentation and data-

acquisition system in order to gather baseline data under real operations. One system

level approach to perform the design task is to decompose a system into six distinct parts.

This hierarchy, developed by Pennsylvania State University's Applied Research

Laboratory, allows for data acquisition to be performed on the lowest level before the

system is even constructed. The hierarchy is comprised of areas of focus that can be

examined by multiple levels of engineers and scientists 24:

Figure 4 - System decomposition for CBM testing, data collection, and data analysis.

By dividing a system into these six specific levels, the range of applicable health analysis

algorithms is broadened to support many different fields of engineering. For example, by

analyzing the material of a subsystem, non-destructive evaluation can be used in order to

determine degradation whereas at a system level it would be more difficult to see the

applicability of such techniques. Also, studies have been performed on how materials

degrade under hostile conditions [25, 26]. These previous studies can be applied to different

CBM applications in order to minimize the amount of redundant research being

performed.

Another method, developed at the University of South Carolina (USC), relies

on historical data and a relational database to tag key anomalies while maintenance is

being performed. The historical data is retrieved from Maintenance Management

Systems (MMS), which are more traditional maintenance records that hold information

on faults and the repair actions performed. These systems are used by companies and

manufacturers in order to optimize and control the maintenance of their facilities [27]. With

an abundance of fault and failure data in an MMS, a link can be built between it and a

Health and Usage Monitoring System (HUMS) in order to monitor vehicle component

parameters. The system developed by USC attempts to create this data link by extracting

metadata from the MMS textual descriptions and combining it with the statistical analysis

performed by the HUMS. The integrated service benefits greatly from large amounts of

both qualitative and quantitative historical data; however, without a common data format

for both the MMS and HUMS, USC's MMS and HUMS link is very application specific

and difficult to apply to existing structures [25]. Example implementations of USC's

method can be found in [28].

2.6 Algorithm Development - Diagnostics

Once faults have been seeded, proper sensor instrumentation selected and data obtained,

algorithms must be developed in order to detect failure modes as early as possible.

Diagnosis is a subject studied not only for machine systems, but also in other disciplines such as

medicine, science, business, and finance [29-31]. While the application and objective of

each is different, the methodology of detecting anomalous conditions using appropriate

sensor data is the same. Due to the vast number of applications for diagnosis, this

research area has been well studied during recent years. The two areas of focus in fault

diagnostics are [2]:

Fault Diagnosis: Detecting, isolating, and identifying an impending or incipient

failure condition -- the affected component (subsystem, system) is still operational

even though at a degraded mode.

Failure Diagnosis: Detecting, isolating and identifying a component (subsystem,

system) that has ceased to operate.

The overall concept consists of three essential tasks [29]:

Fault Detection: detection of the occurrence of faults in the functional units of

the process, which lead to undesired or intolerable behavior of the whole system

Fault Isolation: localization (classification) of different faults

Fault Analysis or Identification: determination of the type, magnitude and cause

of the fault

Applications require different approaches depending on the nature of the faults

and the system. The choice of method is also strongly influenced by the amount of

historical data available. If large amounts of fault data have been collected, automatic

clustering algorithms can be utilized with fuzzy logic or neural networks in order to

detect known faults in a system or component [30]. Conversely, if little fault data is

available, model-based approaches must be used in an attempt to create an accurate

physical representation of the system. Figure 5 shows the diagnostic and prognostic

framework [2]. The following sections will present an in-depth, though not exhaustive,

summary of diagnostic methods that have been applied in various health

frameworks.

Figure 5 - Diagnostic and Prognostic Flowchart.

2.6.1 Preprocessing and Feature Extraction

Diagnostic algorithms can be considered a subset of pattern recognition and machine

learning due to the objective of classifying the current state of a machine based on the

incoming sensor data. As such, the raw data provided will not always yield the highest

classification accuracy; instead, data must be analyzed in different forms and

combinations in order to extract useful information for a given fault. Inconsistencies in

data, such as process and measurement noise, must also be considered during processing

in order to provide accurate and reliable results. When considering these anomalies,

careful attention must be given to ensure a proper balance between signal integrity and

information loss. Preprocessing techniques normally involve tradeoffs, and based on the

application the engineer must be able to determine the degree of noise reduction that

must occur. In some instances a technique as simple as a low-pass filter will be

sufficient, but in other instances more advanced techniques such as Kalman filters,

wavelets, and artificial neural networks must be applied to the signal [31].
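
As a minimal sketch of the simplest option mentioned above, a first-order low-pass filter (exponential smoothing) can be written in a few lines. The function name `low_pass` and the smoothing factor `alpha` are illustrative assumptions and do not come from the cited references.

```python
def low_pass(samples, alpha):
    """First-order IIR low-pass filter (exponential smoothing).

    alpha is in (0, 1]; smaller values remove more noise but add more lag.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    filtered = []
    state = None
    for x in samples:
        # Seed the filter with the first sample to avoid a start-up transient.
        state = x if state is None else state + alpha * (x - state)
        filtered.append(state)
    return filtered
```

The choice of `alpha` is where the tradeoff between noise reduction and information loss shows up: a value near 1 passes the signal almost unchanged, while a value near 0 suppresses noise but also smears fast transients that may themselves indicate a fault.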

A classic problem with pattern recognition is a lack of information found in raw

data. Therefore, pertinent information from the sensor data must be found using feature

extraction. Often, a very difficult problem can be reduced to a few variables

by dimensionality reduction. These reductions in the size of the feature vector allow for

redundant data to be ignored while focusing the diagnostic algorithms on the information

that is relevant to the problem being solved. Several feature extraction techniques will be

discussed in the upcoming sections, but only as they pertain to the specific applications of

interest in this research. References [2, 32, 33] provide additional insight regarding feature

extraction.

2.6.2 Techniques for Diagnostics

Fault diagnostics requires a careful choice of implementation based on the objectives of

the project as well as the data provided to the CBM designer. These implementation

choices have been the subject of numerous investigations in recent decades. Reference [2]

lays out several major objectives that a CBM system must meet:

Ensure enhanced maintainability and safety while reducing operation and

support cost

Be designed as an open systems architecture

Closely control PHM weight

Meet reliability, availability, maintainability, and durability (RAM-D)

requirements

Meet monitoring, structural, cost, scalability, power, compatibility, and

environmental requirements

The technologies utilized to accomplish these objectives are split into two major

areas: model-based and data-driven. Model-based approaches involve the development

of an accurate physical model of the system under evaluation. Incoming sensor data is

then monitored and compared against the model in order to find residuals. The major

benefit of the model-based approach is its ability to detect unanticipated faults. For

mission-critical systems, the ability to detect such faults is an invaluable resource.

Conversely, the major drawback of model-based approaches is the complexity of modern

machine systems. If a system's dynamics are too complex, developing a model that is

accurate enough to find faults without false positives may prove too difficult or costly to

be a viable solution.
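
The residual idea described above can be sketched as follows. This is only an illustration under stated assumptions: the "physical model" is a hypothetical first-order lag (time constant `tau`), and the fixed `threshold` is an arbitrary choice; neither is taken from the thesis or its references.

```python
def simulate_first_order(commands, tau, dt, y0=0.0):
    """Predict the response of a hypothetical first-order system tau*dy/dt = u - y."""
    y = y0
    predicted = []
    for u in commands:
        y += dt / tau * (u - y)  # forward-Euler integration step
        predicted.append(y)
    return predicted


def residual_alarms(measured, predicted, threshold):
    """Return the sample indices where |measured - predicted| exceeds the threshold."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [
        i
        for i, (m, p) in enumerate(zip(measured, predicted))
        if abs(m - p) > threshold
    ]
```

Because the alarm depends only on disagreement with the model, a fault that was never anticipated still produces a residual. The cost is that any modeling error also produces one, which is why model accuracy drives the false-positive rate.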

In cases where developing a physical model is impractical due to the complexity of a

system, a data-driven approach is an alternative that relies on parameter estimation

from historical data to create a mathematical model of the

system. Machine learning and system identification techniques are very common

methods utilized in such circumstances. Machine learning techniques such as artificial

neural networks, support vector machines, and fuzzy-logic allow engineers to classify

faults based on sensor data without any knowledge of the underlying system. System

identification techniques such as regression, black-box and state-space models create a

mathematical representation of the system by estimating known physical parameters. It

is essential for a design engineer to understand that while the mathematical model can

accurately depict the output of a system, it contains no information about the physical

dynamics of the system, unlike the model-based approach. Moreover, both machine

learning and system identification rely on a large amount of historical data to create a

robust algorithm that allows for accurate classification of faults. This requirement is one

of the major drawbacks of these technologies because a CBM system is often

designed in parallel with the hardware of a system, when little to no operational data exists

[34]. Figure 6 shows flow charts for both techniques [2].

Figure 6 - Model-based and Data-driven diagnostic techniques.
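
To make the data-driven branch concrete, the following sketch fits a black-box first-order ARX model, y[k] = a·y[k-1] + b·u[k-1], to recorded input/output data by ordinary least squares. The model structure and the name `fit_arx1` are assumptions chosen for brevity; the fitted `a` and `b` reproduce the input/output behavior but carry no direct physical meaning.

```python
def fit_arx1(u, y):
    """Least-squares estimate of (a, b) in y[k] = a*y[k-1] + b*u[k-1]."""
    if len(u) != len(y) or len(y) < 3:
        raise ValueError("need equal-length sequences with at least 3 samples")
    # Accumulate the sums that form the 2x2 normal equations.
    syy = syu = suu = sy_t = su_t = 0.0
    for k in range(1, len(y)):
        y_prev, u_prev, target = y[k - 1], u[k - 1], y[k]
        syy += y_prev * y_prev
        syu += y_prev * u_prev
        suu += u_prev * u_prev
        sy_t += y_prev * target
        su_t += u_prev * target
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("data do not excite the system enough to identify it")
    # Solve [[syy, syu], [syu, suu]] @ [a, b] = [sy_t, su_t] by Cramer's rule.
    a = (sy_t * suu - su_t * syu) / det
    b = (su_t * syy - sy_t * syu) / det
    return a, b
```

The guard on `det` reflects the dependence on historical data noted above: if the recorded data are too few or too uniform, the parameters cannot be identified at all.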

As discussed in the previous section, there is often a lack of information in raw

sensor data. Therefore, a feature vector must be constructed that contains enough

information to determine the current operating mode. In a model-based system, the

feature vector will usually consist of the physical parameters that define the system’s

dynamics. In a data-driven method, statistical regression and system identification will

most likely be used. Once the parameters that make up the feature vector have been

found, they can be compared with a library of fault vectors to determine the current fault

state of the machinery. Often, a complex problem can be simplified to a few

parameters extracted from raw data. Once the fault has been found, advisories

can be generated based on a database of corrective maintenance for each individual fault.

The next section covers prognostic algorithms, which attempt to estimate the remaining

useful life (RUL) once a fault has occurred [2, 29].
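
One simple way to compare a feature vector with a library of fault vectors is a nearest-neighbor lookup, sketched below. The Euclidean distance metric, the function name `classify_fault`, and the library contents used in any example are assumptions for illustration only.

```python
import math


def classify_fault(feature, library):
    """Return (label, distance) of the library vector closest to `feature`.

    library maps a fault label to a reference feature vector of the same length.
    """
    if not library:
        raise ValueError("fault library is empty")
    best_label, best_distance = None, math.inf
    for label, reference in library.items():
        distance = math.dist(feature, reference)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance
```

For example, with a hypothetical library `{"nominal": (0.0, 0.0), "stiction": (1.0, 0.0), "leak": (0.0, 1.0)}`, the feature vector `(0.9, 0.1)` is matched to `"stiction"`. Returning the distance as well lets the caller reject matches that are too far from every known fault.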

The following table shows recent research in the literature and describes several

diagnostic approaches along with their applications.

Table 4 - Diagnostic algorithms from the literature.

| Authors and Paper Title | Area of Research |
| --- | --- |
| V. Puig, J. Quevedo, T. Escobet, F. Nejjari, and S. de las Heras, “Passive Robust Fault Detection of Dynamic Processes Using Interval Models,” 2008 [35] | Model-based fault detection based on interval models that generate adaptive thresholds using three schemes (simulation, prediction, and observation) |
| H. Bassily, R. Lund, and J. Wagner, “Fault Detection in Multivariate Signals With Applications to Gas Turbines,” 2009 [36] | Compares multivariate autocovariance functions of two independently sampled signals in order to create a model-based algorithm to detect faults in a gas turbine |
| C. H. Lo, Eric H. K. Fung, and Y. K. Wong, “Intelligent Automatic Fault Detection for Actuator Failures in Aircraft,” 2009 [37] | Utilizes a fuzzy-genetic algorithm to detect different types of actuator failures in a nonlinear F-16 aircraft model |
| G. Spitzlsperger, C. Schmidt, G. Ernst, H. Strasser, and M. Speil, “Fault Detection for a Via Etch Process Using Adaptive Multivariate Methods,” 2005 [38] | Uses an adaptive method to overcome false alarms in slowly degrading manufacturing processes that use Hotelling's T² and squared prediction errors |
| W. R. A. Ibrahim, M. M. Morcos, “An Adaptive Fuzzy Self-Learning Technique for Prediction of Abnormal Operation of Electrical Systems,” 2006 [39] | Details an intelligent adaptive fuzzy system with self-learning functions that monitors electrical equipment |
| S. Huang and K. K. Tan, “Fault Detection and Diagnosis Based on Modeling and Estimation Methods,” 2009 [40] | Uses multiple radial basis functions to estimate both the unknown nonlinear dynamics as well as the fault characteristics of a simulated system |
| J. Yun, K. Lee, K. Lee, S. B. Lee, J. Yoo, “Detection and Classification of Stator Turn Faults and High-Resistance Electrical Connections for Induction Machines,” 2009 [41] | Proposes a stator-winding turn-fault detection algorithm using sensorless zero-sequence voltage or negative-sequence current measurements |

The authors of [35] demonstrated a technique using interval models to detect faults.

The paper compares and contrasts several different interval models. In particular, they

show the benefits and drawbacks of simulation, observation, and prediction interval

models. They applied their fault detection algorithm to the European Research Training

Network DAMADICS servo motor. They used several different parameter

configurations for each and used an optimization criterion to create an adaptive threshold.

When the input signal crosses the threshold, a fault indicator is set to a high state. The

paper shows that the simulation method had the greatest accuracy because the model

does not depend on current inputs. The adaptive thresholds in the prediction and

observation methods follow the input sensor values too closely and produce either too many false

alarms or too many false negatives. Quantitative and qualitative analyses are given for

each method.
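
The general idea of an interval model producing an adaptive threshold can be illustrated with a deliberately simple case. The sketch below assumes a hypothetical static model y = k·u whose gain is only known to lie in [`k_low`, `k_high`], plus a bounded measurement noise; it is not the simulation, prediction, or observation scheme evaluated in [35].

```python
def interval_fault_flags(inputs, measured, k_low, k_high, noise_bound):
    """Flag samples that fall outside the envelope of an uncertain-gain model.

    The envelope widens and narrows with the input, so the threshold adapts
    to the operating point instead of being a fixed number.
    """
    flags = []
    for u, y in zip(inputs, measured):
        lower = min(k_low * u, k_high * u) - noise_bound
        upper = max(k_low * u, k_high * u) + noise_bound
        flags.append(not (lower <= y <= upper))  # True means "fault indicated"
    return flags
```

A wide parameter interval makes the detector robust to model uncertainty but can hide small faults, while a narrow one does the opposite. This mirrors the false-alarm versus false-negative tension discussed above.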

The autocovariance function for any zero mean stationary d-dimensional signal

can be used to determine if two independently sampled signals are statistically identical.

Reference [36] presents the theory behind such a claim, and goes on to provide insight on

how to use this property for diagnostics. The authors develop a statistical measure to

determine signal equality, and then apply the measure to multi-dimensional bivariate

white noise and compare it with the empirical probabilities of several simulated models

to establish the feasibility of the statistical measure. To show

applicability to machinery, the method is applied to a gas turbine at Clemson University.

Several artificially induced faults are tested including: added synthetic noise, partial

blockage, and compressor relief valve failure. The final results were compared to

standard dynamic principal component analysis and it was found that the statistical

measure was able to detect the faults earlier and with more accuracy.

The authors of [37] developed an automatic fault detection system using genetic

algorithms and fuzzy logic. The fuzzy-genetic algorithm is proposed to eliminate the

need for hardware redundancy in aircraft; the authors instead suggest that analytic redundancy

is sufficient when a robust algorithm is applied to the dynamic behavior of such a

system. The authors claim the algorithm can distinguish four conditions:

no fault, elevator failure, aileron failure, and rudder failure. It detects these

failures by first fuzzifying the residuals and then evaluating them by an inference

mechanism using if-then rules. In order to optimize the rule table referenced by the

inference mechanism, a genetic search algorithm is used. The fuzzy rule table is coded

into a chromosome and the fault models are integer numbers. These chromosomes are

set to the size of the fuzzy rule table and decoded for the fuzzy evaluation system. The

fitness value of each chromosome is compared to the optimal objective function in order

to determine the optimal fuzzy rule table for each fault based on the residuals of the

sensor data. The algorithm is applied to a simulation study of the faults in a nonlinear F-

16 aircraft model. The system is compared to a linear classifier and a neural network.

The results show that the fuzzy-genetic algorithm performs well on all faults, and is very

resistant to measurement noise in the residuals. The proposed algorithm outperforms

both the linear classifier and the neural network in all cases.
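
The fuzzify-then-infer step can be sketched as below. This is a hand-written rule set under stated assumptions: the saturating membership function, the `scale` parameter, and the min-based rule strengths are illustrative choices, and no genetic optimization of the rule table is performed, so this is not the algorithm of [37].

```python
def large(residual, scale):
    """Degree (0 to 1) to which |residual| is 'large'; saturates at `scale`."""
    return min(1.0, abs(residual) / scale)


def evaluate_rules(residuals, scale=1.0):
    """Return (label, strengths) from min-based fuzzy if-then rules.

    residuals maps a channel name (e.g. "elevator") to its residual value.
    """
    big = {name: large(r, scale) for name, r in residuals.items()}
    small = {name: 1.0 - mu for name, mu in big.items()}
    # IF every residual is small THEN no fault.
    strengths = {"no_fault": min(small.values())}
    for name in residuals:
        # IF this residual is large AND all others are small THEN this channel failed.
        others = [small[other] for other in residuals if other != name]
        strengths[name + "_failure"] = min([big[name]] + others)
    label = max(strengths, key=strengths.get)
    return label, strengths
```

Each rule yields a degree of support rather than a hard yes or no, which is the "soft classification" property that makes fuzzy evaluation tolerant of noisy residuals. In a fuzzy-genetic scheme, the rule table that is fixed here would instead be the object of the genetic search.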

In the semiconductor industry, Hotelling's T² and the squared prediction error are

gaining acceptance to monitor data provided by modern process tools. These methods

require models based on the covariance matrix of the training data set, and problems arise

from the slow drift of modern manufacturing processes. As a result, false alarms are created

during the estimation process, which effectively negates the benefits of the diagnostic

algorithms. To counteract these drawbacks, an adaptive method for multivariate models

is developed in [38]. The authors take a current adaptive method of centering and scaling and

expand it to incorporate domain knowledge as well as remodeling using a moving

window approach. In each case the Hotelling's T² chart is tuned based on the current drift

of the system. The results were not seen as promising, and it was found that engineering

knowledge applied to the individual univariate updates was more important than the

automatic updating of the proposed methods.
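
For reference, the two monitoring statistics named above have the following standard textbook definitions. The notation is an assumption introduced here for illustration and is not taken from [38]: $x$ is a new observation of $p$ process variables, $\bar{x}$ and $S$ are the mean vector and covariance matrix estimated from the training data, and $P$ is a $p \times k$ matrix whose orthonormal columns are the retained principal-component loadings.

```latex
T^2 = (x - \bar{x})^{\mathsf{T}} S^{-1} (x - \bar{x}),
\qquad
\mathrm{SPE} = \left\lVert \left( I - P P^{\mathsf{T}} \right) (x - \bar{x}) \right\rVert^2
```

Because $\bar{x}$ and $S$ are fixed when the model is trained, a slow drift in the process mean inflates both statistics even when the process is healthy, which is the source of the false alarms that adaptive centering, scaling, and moving-window remodeling try to remove.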

The presence of fuzzy logic in diagnostic environments is gaining more

acceptance due to its flexibility and soft classification of faults. One of the drawbacks of

fuzzy logic is the required domain knowledge and historical data needed in order to

create an accurate and robust fuzzy rule table. Reference [39] attempts to overcome these

requirements by creating a self-learning process that can predict failure modes in a

monitored system. The algorithm first determines the number of data points required to

find the underlying trend successfully. The next step is to determine how long a

period is required to fully define a trend. Wavelet denoising is then applied to the signal

to create a clean signal for the fuzzy logic predictor. Two fuzzy techniques are

considered by the authors. The first uses a single fuzzy system to not only learn a trend,

but indicate whether or not the trend is part of a trend previously learned by the system.

The second technique uses a fuzzy system that learns the specific data trend, and a second

general fuzzy system that compares the incoming data to the trend produced by the first

fuzzy system. Both techniques perform well on long- and short-term simulations. They

were both able to select an adequate number of data points and period for detecting fault

trends in simulated data. The authors compare the two techniques and describe the

applications for each.

Artificial neural networks (ANNs) have been accepted as a way to perform

function approximation without knowledge of the underlying system. The ability of an

ANN to adjust its weights using optimization techniques and activation functions makes

it an ideal candidate for diagnostic algorithms. The authors of [40] propose an

algorithm that uses multiple radial basis function (RBF) neural networks to not only

detect faults, but also diagnose them. The first RBF is trained on nominal system data in

order to determine the mechanics of the system. If sufficient data is provided and the

optimal weight vector found, the RBF then becomes a state observer and residuals can be

calculated based on the incoming sensor data. These residuals are compared to an

existing threshold, and if found to be outside the bounds of error, then the observer

indicates that the system is in a failure mode state. After this step, fault detection has been

performed and another RBF must be used to perform the fault classification and

diagnosis. The second RBF uses online tuning methods to diagnose the current failure

mode of the system. The RBF is initialized with its output weights set to zero in order to

force the initial state to be a “no failure” case. As the second RBF is trained using the

online data, its failure feature is compared with that of well-understood failures to

diagnose the system’s fault. The neural network was tested with simulation data of a

linear motor in order to demonstrate its feasibility. The authors leave validation of their

results on a real robotic system as future work.

Open- and short-circuit faults in the electrical circuits of motors and electrical-distribution

systems are among the leading root causes of failure in an industrial plant.

These faults must be continually monitored to guarantee reliability. Reference [41]

identifies a monitoring technique to find stator-winding turn faults using sensorless

methods. From simple current and voltage measurements, the faults can be detected by

identifying modes of zero-sequence voltage and negative-sequence current which can be

related to the turn-faults and high-resistance connections. The authors used the dynamic

model of the motor, which had been derived in references cited by the paper.

Experiments were performed on a 4P 380-V 10-hp induction motor in order to

demonstrate feasibility of the algorithm. The stator-winding turn faults and

high-resistance electrical connection faults were detected and diagnosed using the

proposed method. The results promise added benefits and flexibility to maintenance

schedules in industrial plants.
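
The zero- and negative-sequence quantities mentioned above come from the standard symmetrical-component (Fortescue) transform of three phase phasors. The sketch below shows only that transform; the function name is an assumption, and the fault-detection logic of [41] that interprets these components is not reproduced.

```python
import cmath
import math

A = cmath.exp(2j * math.pi / 3)  # unit phasor rotated by 120 degrees


def symmetrical_components(pa, pb, pc):
    """Return (zero, positive, negative) sequence phasors for phases a, b, c."""
    zero = (pa + pb + pc) / 3
    positive = (pa + A * pb + A * A * pc) / 3
    negative = (pa + A * A * pb + A * pc) / 3
    return zero, positive, negative
```

For a perfectly balanced positive-sequence supply (`pa = 1`, `pb = A * A`, `pc = A`), the zero- and negative-sequence components vanish, so any nonzero value in those components indicates an asymmetry in the supply or the machine.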

2.7 Algorithm Development - Prognostics

The ability to predict faults and failures in a machine can yield great benefits for both

manufacturers and users. Prognostics is the field of study that attempts to find solutions

to the very difficult problem of predicting the future states of systems. There are many

more challenges in predicting failures than simply identifying the current state of the

machinery. In addition, once a failure has been detected or predicted, prognostic

algorithms must also find the propagation of the fault through the rest of the system.

Similar to diagnostics, the ability to predict the future state of a system is being widely

researched in fields other than engineering and health analysis. Financial researchers

have long attempted to forecast the stock market and provide investors with insight

into how the market fluctuates [42]. Meteorologists use artificial intelligence

along with advanced radars and sensors to predict storm paths and the formation of

natural disasters such as tornadoes and hurricanes [43]. Even with years of research,

advances in sensor technologies, and developments in the mathematical models of such

systems, prediction is still based on probability, and multiple scenarios must be taken

into account to ensure the highest reliability.

Prognostics can be broken into three categories: experience-based, evolutionary or

trending models, and physical model-based.

The experience-based approach is the most general of the three in that the algorithm will be

applicable to almost every machine system. This class of algorithms usually relies on

expert analysis of engineers who have worked extensively with the system. With expert

domain knowledge, a maintenance schedule can be developed and engineers can make

decisions with assistance from statistical measures and probability functions.

The evolutionary model requires enough historical data to develop an accurate

mathematical representation of the system. When the model is implemented into the

health analysis framework, the future values are predicted based on the operating

conditions and previous sensor data inputs. From these values, the model can then

predict future outputs and when faults will occur. Since these models are based strictly

on the input data, synthetic inputs can be built to simulate different operating conditions

that the system may encounter during its operational lifetime. However, the model is built from

historical data, which makes it difficult to predict what will happen during abnormal

conditions. Therefore, any simulations run outside normal operating conditions should be

taken as more of an uncertain advisory than an actual prediction of what will happen.

The more historical data that is available to create the mathematical model, the greater

its accuracy and robustness will normally be when deployed.
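
A trending model in its simplest form fits a line to a health indicator and extrapolates it to a failure level. The sketch below makes several assumptions that are not from the thesis: the indicator grows with damage, degradation is linear, and `failure_level` is known in advance.

```python
def estimate_rul(times, values, failure_level):
    """Extrapolate a least-squares linear trend to `failure_level`.

    Returns the time remaining after the last sample, or None when the
    indicator is not trending upward toward the failure level.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same time stamp")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward degradation trend in the data
    intercept = mean_v - slope * mean_t
    time_of_failure = (failure_level - intercept) / slope
    return max(0.0, time_of_failure - times[-1])
```

The estimate is only as good as the assumption that the future resembles the recorded history, which is why extrapolation outside the conditions covered by the data should be treated as advisory.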

The final method, building physical models, is the most costly yet most accurate

approach for prognostics. Physical modeling requires a dynamic model that extracts

parameters from the system in order to predict the future state of the system. Once a

physical model has been made, different prediction technologies such as autoregressive

moving-average techniques or Kalman filters can be applied to predict future states of the

system. Physics-based models require knowledge of both past and current

conditions in order to create a dynamic model that can be applied at any point during the

lifetime of a component. One benefit of physical models is their ability to predict the

remaining useful life (RUL) of a component without any knowledge of faults that have

occurred in the system, though such information can increase the overall accuracy of the

prognostics system. Physical models require a thorough engineering knowledge of the

system to find quantitative measures of material properties and physical parameters

which represent the health of a system. These measures are then predicted based on the

current operating conditions and are accompanied by a probabilistic model which

provides an uncertainty factor [44]. Figure 7 shows the three prognostic algorithms along

with their scope of work, cost, and accuracy [2].

Figure 7 - Approaches for prognosis.

The following table shows recent research in the literature and describes several

prognostic approaches along with their applications.

Table 5 - Prognostic algorithms from the literature.

| Authors and Paper Title | Area of Research |
| --- | --- |
| F. Peysson, M. Ouladsine, R. Outbib, J.B. Leger, O. Myx, C. Allemand, “A Generic Prognostic Methodology Using Damage Trajectory Models,” 2009 [44] | Presents a prognostic framework that decomposes a system into three levels: environment, mission, and process. Decision and data fusion between the three levels is used to create predictions. |
| Z. Sun, J. Wang, D. Howe, G. Jewell, “Analytical Prediction of the Short-Circuit Current in Fault-Tolerant Permanent-Magnet Machines,” 2008 [45] | Describes an analytical technique to predict short-circuit current in a fault-tolerant permanent-magnet machine under partial-turn short-circuit fault conditions. |
| Y. Zhang, G. W. Gantt, M. J. Rychlinski, R. M. Edwards, J. J. Correia, C. E. Wolf, “Connected Vehicle Diagnostics and Prognostics, Concept, and Initial Practice,” 2009 [46] | Presents a complete end-to-end framework of diagnostics and prognostics of General Motors vehicles. Presents initial results of the implemented framework. |
| M. Baybutt, C. Minnella, A. E. Ginart, P. W. Kalgren, M. J. Roemer, “Improving Digital System Diagnostics Through Prognostics and Health Management (PHM) Technology,” 2009 [47] | Integrates prognostics and diagnostics from engineering disciplines to provide minimally invasive onboard monitoring of digital systems. |
| P. Lall, M. N. Islam, M. K. Rahim, J. C. Suhling, “Prognostics and Health Management of Electronic Packaging,” 2006 [48] | Investigates methods to determine material state in complex systems and subsystems to determine RUL. Specifically, electronic packaging is targeted as a candidate for such methods. |
| S. K. Yang, “A Condition-Based Failure-Prediction and Processing-Scheme for Preventive Maintenance,” 2003 [49] | Uses an application-specific integrated circuit (ASIC) to perform preventive maintenance using Petri nets and Kalman filter prediction. The application of the ASIC is a thermal plant. |
| A. H. Al-Badi, S. M. Ghania, E. F. EL-Saadany, “Prediction of Metallic Conductor Voltage Owing to Electromagnetic Coupling Using Neuro Fuzzy Modeling,” 2009 [50] | Presents a fuzzy algorithm that can predict the level of a metallic conductor voltage. Provides simulation results and validation for three scenarios. |

The authors of [44] present an overview of the prognostic approaches described in

the previous section. They utilize these different technologies in the design of a

prognostics system for a ship. They extend the technologies by applying not only

operating conditions and sensor readings, but also the environmental conditions under

which the system is placed during its lifetime. The paper creates a formal method for modeling

a complex system based on the mission (operating condition), environment, and process.

The process is decomposed into resources, where a resource is a piece of equipment or

a set of equipment. The mission is defined as the use of the system during a time period.

The method considers the start and end dates of the mission as well as the set of places where that

task is performed. The environment is the area where the system operates. It can be

characterized by a set of environmental variables that include air temperature, air

humidity, and wind force. The environmental variables are then fuzzified, which allows

the impact an environment has on the system to be defined. A rule base can then be

defined in order to perform fault diagnosis and prognosis. The fusion of all three

elements, mission, environment, and process, provides a damage trajectory that predicts

the degradation of resources, subsystems, and overall system. A simulation was created

where a ship was traveling on a tour of Africa. Different missions and processes were

created to test the degradation of a ship during the travel. Initial results showed that the

framework is a feasible method for predicting the degradation of a complex

system.

Fault-tolerant permanent-magnet machines are showing promise in aerospace and

automotive sectors. Fault models have been developed for such machines, but so far

only lengthy procedures have been developed for predicting failures. Reference [45]

presents an analytical approach that quantifies various parameters of the machines.

These parameters are then used to identify worst-case short-circuit scenarios in the design

stage and formulate remedial actions. The derivation of the short-circuit current is

provided as well as a validation by finite element analysis. Experimental validation was

also conducted by seeding various failure modes into the machines and checking whether the analytical

model correctly identified the short-circuit current. The results showed promise for the

short-circuit current to be a viable method of feature-extraction for fault detection and

prediction. In particular, a Kalman filter was recommended to extract the fundamental

components of the feature and predict future faults.

The authors of [46] present a methodology for diagnostics and prognostics for

vehicles, specifically those manufactured by General Motors (GM). Three key

challenges are faced by vehicle manufacturers: unexpected new faults, infrequent and

intermittent faults, and prediction of system RUL. Many vehicle manufacturers develop

maintenance schedules for consumers, but parts are often replaced before their

operational life is actually completed. To address this shortcoming of scheduled maintenance, a

concept called Connected Vehicle Diagnostics and Prognostics (CVDP) has been

developed where fault data is stored in onboard electronics and downloaded by the

manufacturer during maintenance services. The fault data is then analyzed to

determine root causes for the intermittent faults of the vehicles. If-then rules are applied

to the data of a battery management system in order to detect any failure modes. The

conditions of the rules are currently specified by domain experts, but future work will

allow for adaptive thresholds to be computed when sufficient data is acquired. A

weighting of the parameters that caused the failure is then computed based on the number

of if-then rules violated. These weights allow engineers to determine the root cause of

intermittent failures which would previously not have been detected. Preventive

maintenance can then be performed when the parameters in other cars of the same model

are seen to be degrading. The system has been deployed in a GM manufacturing plant.
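
The rule-and-weight step can be sketched as follows. The exact weighting used in [46] is not given here, so this example assumes the simplest reading: each parameter is weighted by its share of the violated rules. The function name and any thresholds used with it are hypothetical.

```python
from collections import Counter


def violation_weights(sample, rules):
    """Weight each parameter by its share of the violated if-then rules.

    rules is a list of (parameter_name, is_ok) pairs, where is_ok(value)
    returns True when the reading satisfies the rule.
    """
    violations = Counter(
        name for name, is_ok in rules if not is_ok(sample[name])
    )
    total = sum(violations.values())
    if total == 0:
        return {}  # no rule violated, so nothing to rank
    return {name: count / total for name, count in violations.items()}
```

With hypothetical battery rules such as "voltage at least 12.0" and "temperature at most 60.0", a sample reading of 11.0 V at 70 degrees violates one rule per parameter and yields equal weights of 0.5. Parameters with consistently high weights across many vehicles are the natural starting point for root-cause analysis.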

Digital systems are now present in everyday life for most consumers. Since

manufacturing techniques are not fault-proof, systems often fail before the end of their expected

lifetime. In mission-critical situations, especially in military or manufacturing sectors,

these faults can produce catastrophic events. Therefore, the authors of [47] present a

technique for the detection and prediction of faults in digital electronic systems. The

focus of the paper is on the degradation of MOSFET devices and four particular failure

modes: thermal cycling, hot carrier effects, time-dependent dielectric breakdown

(TDDB), and electromigration. The system used to test the PHM methods is an MPC7447

and faults are seeded to accelerate the degradation of the processor. Aggregate power of

the processor is tracked as the main feature of degradation in the processor. Multiple

histograms are calculated over time and compared to analyze the feature and find

different failure modes for the processor. Based on the statistical feature vector, a

percentage of life consumed is calculated based on the amount of time the processor is

operating at a specific temperature. From this percentage, RUL is calculated from a life

consumption model and fault-to-failure progression data.

The authors of [48] present a novel method of prognosis based on the damage

caused by prior stress histories of electronic packaging. The paper states that the U.S. Air

Force throws away 1000 components to remove a single unknown one that is predicted to

be in a failed state based on a theoretical model. If analysis of the post stress conditions

of such components could be performed, the cost impact of prognostic methodologies

could be immense as wasted life is recovered without increasing risk. Components were

tested as simulated thermal cycles were applied. From this data, a mathematical

relationship was developed between phase growth and time to failure. Correlations were

found between the rate of change of the phase growth parameter and existing macro

indicators of damage. It is shown that RUL can be found based on phase growth rate and

interfacial shear stress of the chip.

State estimation is becoming a leading technology for the prognosis of complex

machine systems. Kalman filters have become a particularly appealing solution because they

contain an error covariance term that provides a confidence interval for the prediction

through time. Reference [49] combines such methods with Petri nets to find and predict

failures in a thermal plant. Petri nets are a graphical representation of relationships

between conditions and events and allow the root causes of failures to be found and

preventive maintenance to be performed only on those components which are failing.

Kalman filters are then applied to the current state of the system in order to predict the

following state. N-Step state predictions can be performed as well, but the confidence of

each step decreases as the error in the covariance matrix of the filter increases. These

methods were implemented on an application specific integrated circuit and used in a

thermal power plant to validate the framework. Initial results of the proposed scheme

were seen to be very promising.

The authors of [50] discuss how interference can be transferred from one circuit

conductor to another without any physical connection between them. A fuzzy

model was conceived as a method to predict the interference caused by overhead

transmission lines. The feature vector was calculated using linear correlation analysis,

nonparametric correlation analysis, and partial correlation analysis. If-then rules were

applied using training data obtained during the project. Fault current, soil resistivity,

separation distance, and mitigation systems were the four fuzzified inputs and total

pipeline maximum voltage was the defuzzified output. The membership functions used in the

fuzzy model are found in the paper and the effect of interference based on nearby

metallic structures was analyzed. Excellent agreement between test and validation data

was obtained for three different scenarios.

2.8 Reliability Centered Maintenance

RCM is defined as an analytical process used to determine appropriate failure

management strategies to ensure safe and cost-effective operations of a physical asset in a

specific operating environment. It relies heavily on prior knowledge of the system and

subsystems under evaluation. It was developed after it was found that most systems were

being replaced before the end of their useful life. It compares the requirements of the

component from a user perspective and the design reliability of the component. When

employed, it is used in conjunction with the FMECA as references to the CBM and PHM

portions of the health analysis framework to guarantee that the following seven questions

are answered during a failure 251:

1. What is the item supposed to do, and what are its associated performance standards?

2. In what ways can it fail to provide the required functions?

3. What are the events that cause each failure?

4. What happens when each failure occurs?

5. In what way does each failure matter?

6. What systematic task can be performed proactively to prevent, or to diminish to a

satisfactory degree, the consequences of the failure?

7. What must be done if a suitable preventive task cannot be found?

2.9 System Identification Techniques

System identification is the process by which mathematical models of dynamical systems

are built based on observed data of the system. These methods can save the cost and time

of having an engineer develop physical models of a system. The methods usually require

a large, well-annotated database of historical system data in order to build a robust

mathematical model. There are three entities involved in creating these mathematical

models [34]:

A data set

A set of candidate models

A rule by which candidate models can be assessed

Figure 8 shows the general system identification loop.

Figure 8 - The system identification loop.

2.9.1 Autoregressive Models

The most basic of system identification techniques is a linear difference equation between

the input and output of a system. While there are continuous time models in system

identification, discrete time models are used most often in practice. These difference

equations are known as autoregressive models and are written as:

$$y(t) + a_1 y(t-1) + \dots + a_n y(t-n) = b_1 u(t-1) + \dots + b_m u(t-m) \tag{2.3}$$

This notation may be altered to solve for the next output value given the previous

observations:

$$y(t) = -a_1 y(t-1) - \dots - a_n y(t-n) + b_1 u(t-1) + \dots + b_m u(t-m) \tag{2.4}$$

To account for measurement and process noise, a zero-mean white-noise term e(t) can be

added, with additional coefficients that weight past noise values as a moving average:

$$y(t) = -a_1 y(t-1) - \dots - a_n y(t-n) + b_1 u(t-1) + \dots + b_m u(t-m) + e(t) + c_1 e(t-1) + \dots + c_{n_c} e(t-n_c) \tag{2.5}$$

To correctly model a system mathematically, the coefficients of Eq. 2.3 must be calculated.

There are various methods that can calculate the coefficients based on recorded inputs

and outputs over a time interval. Two of the most popular are the Levinson-Durbin

recursive algorithm and least squares method [34].
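As an illustration of the least squares route, the following minimal sketch estimates the coefficients of Eq. 2.3 from recorded input and output samples. The function name `fit_arx` and its arguments are hypothetical and are not taken from the thesis software; NumPy is assumed to be available.

```python
import numpy as np


def fit_arx(y, u, n, m):
    """Estimate a_1..a_n and b_1..b_m of Eq. 2.3 by ordinary least squares.

    y, u : recorded output and input sequences of equal length
    n, m : number of output and input lags (hypothetical argument names)
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(n, m)
    rows = []
    for t in range(start, len(y)):
        # Regressor follows Eq. 2.4: [-y(t-1) ... -y(t-n), u(t-1) ... u(t-m)]
        past_outputs = [-y[t - i] for i in range(1, n + 1)]
        past_inputs = [u[t - j] for j in range(1, m + 1)]
        rows.append(past_outputs + past_inputs)
    phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(phi, y[start:], rcond=None)
    return theta[:n], theta[n:]
```

With noise-free data generated by y(t) = 0.5 y(t-1) + 2 u(t-1), the sketch should recover a_1 = -0.5 and b_1 = 2, because Eq. 2.3 places the output terms on the left-hand side.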

2.9.2 Kalman Filters

State-space models are developed to form a relationship between the input, noise, and

output signals using an auxiliary state vector. These models incorporate physical

mechanisms of the system. One type of state-space model is the Kalman filter, which

was developed in the 1960s [34]. The discrete Kalman filter is defined in two steps, a time

update and a measurement update. The time update equations are defined by:

$$x_{k+1} = A_k x_k + B_k u_k + G_k w_k \tag{2.6}$$

$$P_k^- = A_k P_{k-1} A_k^T + Q \tag{2.7}$$

where $A_k$ and $B_k$ are matrices of parameters that correspond to unknown values of

physical coefficients, material constants, etc., $G_k$ is a matrix of parameters describing the

process noise in the system, $x_{k+1}$ is the prediction of the state vector at the next time step, $x_k$ is the

internal state vector, $w_k$ is the process noise of the system, $u_k$ is the control input to the

system, $P_k$ is the a posteriori estimate error covariance ($P_k^-$ denotes its a priori prediction), and $Q$ is the process noise

covariance. The measurement equations are defined by:

$$K_k = P_k^- H^T \left( H P_k^- H^T + R \right)^{-1} \tag{2.8}$$

$$x_k = x_k^- + K_k \left( z_k - H x_k^- \right) \tag{2.9}$$

$$P_k = \left( I - K_k H \right) P_k^- \tag{2.10}$$

The first step in the measurement update is to compute the Kalman gain, K k.

Then the process or sensor is actually measured and placed into zk. This is used to

generate an a posteriori state estimate by incorporating the new measurement data. The final

step is to obtain an a posteriori error covariance estimate as in Eq. 2.10. The goal of the

Kalman filter is to minimize the a posteriori error covariance. The equations are recursive,

which makes them appealing for practical applications. In the field of prognosis, the

Kalman filter can perform multiple time updates without a measurement update to predict

health variables in the future [2].
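The two update steps, and the repeated time update used for prognosis, can be sketched as follows. This is a minimal illustration under the assumption of time-invariant matrices; the function names are hypothetical, and the process noise term of Eq. 2.6 is omitted from the prediction because its expected value is zero.

```python
import numpy as np


def time_update(x, P, A, B, u, Q):
    """Time update: project the state and error covariance ahead (Eq. 2.6 and 2.7)."""
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred


def measurement_update(x_pred, P_pred, z, H, R):
    """Measurement update: gain, corrected state, corrected covariance (Eq. 2.8 to 2.10)."""
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P


def predict_n_steps(x, P, A, B, u, Q, n):
    """Repeat the time update n times with no measurement update.

    The covariance grows at each step when Q is positive definite,
    which reflects the decreasing confidence of longer predictions.
    """
    states, covariances = [], []
    for _ in range(n):
        x, P = time_update(x, P, A, B, u, Q)
        states.append(x)
        covariances.append(P)
    return states, covariances
```

Inspecting the diagonal of each returned covariance gives a rough confidence band around the predicted health variable at each future step.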

CHAPTER 3: APPROACH

This thesis attempts to build a framework for an intelligent valve module for ISHM. This

framework is based on the health analysis framework discussed in the previous section.

This section will focus on the specific approach taken to fulfill the objectives of each

segment in the framework. Most of the work presented is for the general support of

valves in mission-critical situations, but some is specific to the NASA-SSC

test stand environment. The particular valve that will be analyzed is the large linear

actuator valve (LLAV) which is responsible for the distribution of cryogenic fluids to the

test stand and test articles. Figure 9 shows the regions of interest of the LLAV.

Figure 9 - LLAV with regions of interest labeled (Bonnet, Bonnet Packing, Stem, Valve Plug Seating, Body).

3.1 Failure Modes

Valves are a critical component for the day-to-day operations at NASA-SSC. The valves

must be precisely machined to meet the strict specifications set forth by the test stand

operators. These specifications raise the price of the valve, which can be tens of

thousands of dollars. Though manufacturing of the valves is meticulous, physical

degradation still occurs because of the strenuous environment where the valves operate.

In particular, the LLAV must transport cryogenic and non-cryogenic fluids at high

pressures to test articles on the test stands at NASA-SSC. Therefore, a FMECA must be

performed in order to classify and rank the important failure modes for the LLAV. The

analysis was performed in the early stages of the project in order to guarantee that the

algorithms developed could detect the failure modes of the valves. Since the valves have

already been developed and the sensors have been chosen, the goal of this FMECA is

to identify the critical faults and attempt to find solutions with the current capabilities at

NASA-SSC.

The LLAV FMECA was performed in conjunction with Scott Jensen, a NASA-

SSC test operations engineer and domain expert in the valves on the test stands. Scott

was able to provide valuable insight into the operational characteristics of the

valves in the E-Complex test stand. These characteristics include the role the LLAVs

fulfill, descriptions of the different components in a LLAV, the signs of degradation in

the LLAV, and the common failure modes that have been identified by the NASA-SSC

test operations engineers. The information was compiled and risk priority numbers were

calculated to prioritize the failure modes identified during the study. The algorithms

and framework were then designed around the specific task of collecting data

that could identify and eventually predict these failure modes. Table 6 and Figure 10

show the results of the FMECA:

Table 6 - Failure modes and effects for LLAV.

| Function | Failure Mode | Effects |
| --- | --- | --- |
| Controller for cryogenic fluid tank | Seat wear causes leaking fluid | Fluid can enter the system during a test, causing catastrophic failure |
| Monitor the feedback of the valve and downstream pressure | Faulty pressure sensor falsely indicates valve failure | Incorrect valve maintenance may be performed |
| Packing at the top of the valve prevents leaks and allows for balanced pressure | When frozen, the packing can crack and break apart, degrading the performance of the valve | Valve may not function properly or be able to maintain needed pressure for test |
| Actuator must transition from fully open to fully closed in a consistent amount of time | If the valve does not open or close at consistent timings, valve maintenance must be performed | Emergency shutdown procedures may not be performed properly |
| The controller of the valve sends the valve to full close | If the PID controller is unstable or telling the valve to get to a value it cannot reach, the actuator may "bounce" on the seat, causing degradation in the soft metal | Seat wear (described above) can occur more quickly, resulting in delays and increased maintenance costs |
| The valve feedback must respond to the control signal in an appropriate time for effective test operations | Excessive "deadtimes" create poor timing in test operations and can cause pressure or flow mixture errors | If the mixture is not precise in certain test articles, undesired results can occur |

Figure 10 - Prioritization of LLAV failure modes (see Equations 2.1 and 2.2 for y-axis calculation). Plot of risk priority number (y-axis, 0 to 2200) against criticality (x-axis, 0 to 11) for seat wear, frost point, sensor failures, extended deadtimes, transition times, and seat bouncing.

3.2 Intelligent Valve Framework

Once the failure modes were found, the framework could be constructed based on the

requirements set forth by NASA. Figure 11 shows the system level flow chart of the

framework. Figure 12 shows the detailed health analysis framework for the intelligent

valve.

Figure 11 - System level flowchart of the Intelligent Valve framework.

Figure 12 - Health analysis framework for the Intelligent Valve.

3.2.1 Data Acquisition

The E-Complex operations center utilizes both the User Datagram Protocol

(UDP) and Dynamic Data Exchange (DDE) protocol to transmit data between the

networked computers in their test stands. On the advice of NASA test operators, it

was decided that the best method of acquiring data into the plug-in was via the DDE

pipeline. The selection of DDE over UDP provided several benefits, as well as certain

drawbacks that must be accounted for in the development process. Some of the benefits

were:

The data could be acquired by simple strings rather than parsing the UDP packet’s

binary file.

The data would already be formatted into engineering units based on the

calibration sheets used in the UDP format.

WonderWare and Labview, both used for test operations, have built-in support for

Network DDE (NDDE).

Since the developers will not be at Stennis for the tests, application setup is easier

with DDE because of the prior knowledge the test engineers possess.

The framework can request just the specific data it requires for its algorithms

reducing its network footprint.

The drawbacks that must be overcome are:

The maximum DDE transfer rates are much lower than UDP.

The DDE data packet does not include an accurate time stamp for data annotation.

While WonderWare still includes DDE with its applications, the developer of the

protocol, Microsoft, has not updated or supported it for over a decade.

The most crucial drawback in the selection of the DDE protocol is the absent time

stamp in the data packet. Fortunately, NASA has a link to the Inter-range

instrumentation group (IRIG) system which provides highly accurate timestamps to the

networked computers in the test stands. In the software, this IRIG timer is used to

timestamp all the data as soon as it is acquired from the system. While there are still

some delays from the data acquisition software, the resulting accuracy is sufficient to

compare data for algorithm development purposes. If the framework ever became a

“mission-critical” component the accuracy issue would have to be addressed more

strictly.

3.2.2 Preprocessing

To validate incoming data, threshold checks are performed at the acquisition of each data

point. The thermocouple data are subjected to the following test:

$$T_{min} \le T \le T_{max}$$

where Tmin and Tmax are the minimum and maximum temperatures of the thermocouple

type. The following table gives the type and temperature range for some commonly used

thermocouples:

Table 7 - Thermocouple types and ranges.

| Thermocouple Type | Minimum Temperature (°C) | Maximum Temperature (°C) |
| --- | --- | --- |
| J | 0 | 750 |
| K | -200 | 1250 |
| E | -200 | 900 |
| T | -250 | 350 |

The valves are also subjected to the threshold test:

$$-5 \le V \le 100$$

where V is the feedback or control signal of the valve. While it does not seem intuitive

that the valve state can be below zero, the operators at NASA use this method to

guarantee that a tight seal is being created between the actuator and the soft metal at the

bottom of the valve.
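The two threshold tests can be sketched as simple range checks applied to each incoming data point. The limits below are copied from Table 7 and from the valve threshold above; the function and constant names are hypothetical and do not come from the thesis software.

```python
# Temperature limits in degrees Celsius, taken from Table 7.
THERMOCOUPLE_RANGES_C = {
    "J": (0.0, 750.0),
    "K": (-200.0, 1250.0),
    "E": (-200.0, 900.0),
    "T": (-250.0, 350.0),
}

# Valve feedback or control signal limits from the threshold test above.
VALVE_RANGE = (-5.0, 100.0)


def thermocouple_in_range(tc_type, temperature_c):
    """Return True when the reading lies inside the range for its thermocouple type."""
    t_min, t_max = THERMOCOUPLE_RANGES_C[tc_type]
    return t_min <= temperature_c <= t_max


def valve_signal_in_range(value):
    """Return True when the valve feedback or control value lies inside the allowed range."""
    return VALVE_RANGE[0] <= value <= VALVE_RANGE[1]
```

A reading that fails either check would be flagged as invalid before it reaches the diagnostic algorithms. Note that a slightly negative valve value still passes, in line with the tight-seal practice described above.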

The data acquisition systems used in the E-Complex test stand perform

preprocessing techniques themselves in an attempt to deliver noiseless signals to the test

stand computers. Therefore, there is no need for advanced preprocessing techniques in

the intelligent valve module. Moreover, this allows the module to classify any noise

detected in the signal as an anomaly instead of process and measurement noise.

3.2.3 Failure Mode Detection and Diagnosis

The failure modes were investigated based on the FMECA with priority given to those

with a high RPN.

3.2.4 Valve Operational Statistics

Seat wear is one of the most severe and costly failure modes that can occur in the LLAV.

Not only is it expensive to replace the valve seat and insert, but it also forces excessive

delays in projects. It is very difficult to obtain a direct quantitative measurement of the

seal without the use of additional sensors. There are studies into detecting seat wear and

recession, but all use external instrumentation such as x-ray machines that are not

available for this research. Even though a direct measurement is not possible, combining

the valve's operational statistics and test operator's expert knowledge can provide

information and advisories for maintenance teams. After consulting with the test

operations team at NASA-SSC, seven statistics were selected for observation. They are

as follows:

Transitions - The number of times the valve has traveled from completely open

to completely closed with non-cryogenic fluid flow.

Cryogenic transitions - The number of times the valve has traveled from

completely open to completely closed with cryogenic fluid flow.

Distance traveled - The linear distance the valve has traveled in inches.

Last transition time - The time it took for the valve to go from completely open to

completely closed.

Average transition time - The average of the last ten transition times from completely

open to completely closed.

Direction changes - The number of times the valve has changed motion from

either opening to closing or closing to opening.

Number of closings - The number of times the valve has come to a completely

closed state.

These statistics can be used to measure how the valve is performing under certain

operating conditions. To detect the events, an algorithm, seen in Figure 13, was

developed based on the changing state of the valve and definitions presented previously.

Figure 13 - Valve statistics algorithm.
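The flowchart itself is not reproduced here, so the following is only a sketch of how the seven statistics could be accumulated from a stream of position samples. It assumes that position is reported in percent of stroke, that 100 means completely open and 0 or below means completely closed, and that the full stroke length in inches is known. The class name, field names, and thresholds are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ValveStatistics:
    stroke_inches: float        # assumed full stroke length of the actuator
    open_level: float = 100.0   # assumed position (%) treated as completely open
    closed_level: float = 0.0   # assumed position (%) treated as completely closed
    transitions: int = 0
    cryogenic_transitions: int = 0
    distance_traveled: float = 0.0
    last_transition_time: Optional[float] = None
    direction_changes: int = 0
    closings: int = 0
    recent_times: deque = field(default_factory=lambda: deque(maxlen=10))
    _prev_pos: Optional[float] = None
    _prev_time: Optional[float] = None
    _direction: int = 0
    _left_open_at: Optional[float] = None

    @property
    def average_transition_time(self) -> Optional[float]:
        """Average of the last ten open-to-closed transition times."""
        if not self.recent_times:
            return None
        return sum(self.recent_times) / len(self.recent_times)

    def update(self, time_s: float, position: float, cryogenic: bool) -> None:
        """Fold one (time, position, cryogenic flag) sample into the statistics."""
        if self._prev_pos is not None:
            delta = position - self._prev_pos
            self.distance_traveled += abs(delta) / 100.0 * self.stroke_inches
            direction = (delta > 0) - (delta < 0)
            if direction != 0:
                if self._direction != 0 and direction != self._direction:
                    self.direction_changes += 1
                self._direction = direction
            # The valve has just left the completely open state: start timing.
            if self._prev_pos >= self.open_level and position < self.open_level:
                self._left_open_at = self._prev_time
            if position <= self.closed_level < self._prev_pos:
                self.closings += 1
                if self._left_open_at is not None:
                    elapsed = time_s - self._left_open_at
                    self.last_transition_time = elapsed
                    self.recent_times.append(elapsed)
                    if cryogenic:
                        self.cryogenic_transitions += 1
                    else:
                        self.transitions += 1
                    self._left_open_at = None
            elif position >= self.open_level:
                # Returned to completely open without closing: discard the timer.
                self._left_open_at = None
        self._prev_pos = position
        self._prev_time = time_s
```

In this sketch a closing that does not start from the completely open state increments only the number of closings, which keeps that counter usable for the bounce comparison discussed in the text.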

In the specific application of detecting seat recession, the statistics of relevance

are transitions, cryogenic transitions, and number of closings. When under cryogenic

conditions the metal packing hardens, reducing the amount of degradation on the seat.

Conversely, non-cryogenic closings create a deeper impact and reduce the operational

life of the seat. As stated in Table 6, seat bouncing can also adversely affect the seat if

not detected. The number of closings can be observed between tests and compared to the

number of closings the controller relayed to the valve during the test. If there is a large

disparity between the two, it is an indication of bouncing either due to a valve fault or

controller instability. In either case, seat wear can be accelerated when there is a constant

changing of force on the seat.
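The comparison between observed and commanded closings can be sketched as a single check. The text does not quantify what counts as a large disparity, so the `tolerance` value here is a placeholder, and the function name is hypothetical.

```python
def bounce_suspected(observed_closings, commanded_closings, tolerance=2):
    """Return True when the valve closed noticeably more often than it was commanded to.

    tolerance is a hypothetical allowance; a suitable value would have to be
    chosen with the test operations team.
    """
    return (observed_closings - commanded_closings) > tolerance
```

A positive result does not separate a valve fault from controller instability; it only indicates that one of the two should be investigated.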

3.2.5 Auto-associative Neural Networks for Sensor Validation

In order to provide a test article with the correct mixture of propellants, pressure and fluid

flow must be kept at very specific rates. This requires accurate sensors that can relay the

current readings back to test operations. The readings during failure modes of the sensors

can be unpredictable and can cause misclassified faults in a valve. For example, if a

downstream pressure sensor has a near zero reading after a valve is opened, it can appear

as though the valve did not open properly. When this happens, weeks or months of

unnecessary valve repairs may be performed instead of the day it takes to replace a

sensor.

There are two main approaches to this type of fault, physical and analytic

redundancy. Physical redundancy requires the use of multiple, similar sensors in the

same spatial location. Often three sensors are used and a majority-rules

weighting system determines the actual reading. Analytic redundancy exploits

functional relationships between components in the systems. The functions are normally

isolated into closely related subsystems to reduce their complexity. While physical

redundancy is a more robust solution than analytic redundancy, it is not feasible in all

situations. At the E-Complex test stands there is a limited number of sensors that can be

attached to the data acquisition system for any given test. Also, running additional

connections through the complex test stand is very costly and safety protocols apply

stringent rules to where and how wires can be run.

Analytic redundancy can be applied to a system using either a complex model

comprised of physical properties and equations or a mathematical model that

approximates the functional relationship based on previous data. The physical model

results in a very detailed understanding of the system, but is applicable only to the

current system setup. Artificial neural networks (ANN) have been used extensively in

function approximation and pattern recognition. Specifically, auto-associative neural

networks (AANNs) have been used in sensor validation because of their ability to

perform nonlinear principal component analysis which allows for the extraction of key

features in a high dimensional, nonlinear dataset [52][53]. Reference [53] presents a training

method for sensor validation and AANN where two training runs are performed. The

first training run presents accurate training data to both the input and output in order to

learn the functional relationships between the two. The second training run presents

faulty data to the input, but accurate data to the output. This method allows the AANN to

become "insensitive" to faulty data and extract only the proper features from the dataset.

Figure 14 shows the two training methods.

Linear principal component analysis (PCA) can be beneficial in reducing high

dimensional datasets into their principal components. To accomplish this task,

eigenvalues of the covariance matrix are used to maximize the variance of the dataset in a

lower dimension, i.e.,

$$YP = T \tag{3.3}$$

where $Y$ is the sample set, $T$ is the transformed data, and $P$ is the matrix of eigenvectors of the

covariance matrix. Nonlinear PCA extends the capabilities of linear PCA by using

nonlinear functions instead of eigenvectors. In some cases, this can increase the variance

of the selected dataset and result in less information loss than linear PCA during the

dimensionality reduction. The following equations describe nonlinear PCA:

$$T_i = G_i(Y) \tag{3.4}$$

where $T_i$ is the transformed data and $G_i(Y)$ is a vector of nonlinear functions. In order to

restore data in nonlinear PCA, another nonlinear function is needed:

$$Y_j' = H_j(T) \tag{3.5}$$

where $Y_j'$ is the restored data and $H_j(T)$ is a vector of nonlinear functions. A difficulty in

nonlinear PCA is the determination of the nonlinear functions G and H . However, it has

been shown in previous work that functions of the following form are capable of fitting

any nonlinear function to arbitrary precision:

$$v_k = \sum_{j=1}^{N_2} w_{jk} \, \sigma\left( \sum_{i=1}^{N_1} w_{ij} u_i + \theta_j \right) \tag{3.6}$$

where $v$ is the desired nonlinear function, $w$ are the weights of the sigmoid functions, and

$\sigma(x)$ is a function that approaches 1 as $x$ approaches $\infty$ and 0 as $x$ approaches $-\infty$. A

sigmoid satisfies this criterion:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{3.7}$$

Sigmoids are commonly used as transfer functions in artificial neural networks. In

order to perform the dimensionality reduction, a bottleneck layer is used in the hidden

layer nodes of a multilayer perceptron. This allows for the common backpropagation

training technique to be used for sensor validation in the autoassociative neural network.
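For comparison with the bottleneck layer, the linear projection T = YP described earlier can be sketched with an eigendecomposition of the covariance matrix. This covers the linear case only, not the AANN. The function name is hypothetical, and the data are mean-centered before projection, which is common practice but is an assumption not stated in the text.

```python
import numpy as np


def pca_transform(Y, n_components):
    """Project samples (rows of Y) onto the leading eigenvectors of the covariance matrix."""
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    centered = Y - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    P = eigvecs[:, order]                        # columns are the principal directions
    T = centered @ P
    return T, P, mean
```

The data can be restored approximately with `T @ P.T + mean`. The restoration is exact only when the discarded directions carry no variance, which is the information loss that nonlinear PCA tries to reduce.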

Figure 14 - Training method for auto-associative neural networks for sensor validation.

Another benefit of AANNs is their ability to predict the values of faulty sensors in

the output. If utilized in a mission critical situation, this can provide the information

needed to continue a test even when a fault is detected. Also, this data can provide other

fault diagnosis algorithms with accurate data that can narrow down the exact cause of

faults in a system.

3.2.6 Thermal Modeling

While cryogenic fluid can cause less wear on the seal at the bottom of the valve, there is

another packing at the top that allows the valve to offset pressures in order to operate

properly. If this packing freezes there is the potential that it will crack and cause pressure

equalization problems with the valve. This cracking in the packing is one of the reasons

that the stem of the valve is so long. Since the machining of the valves is so precise,

added inches in the stem can increase costs by tens of thousands of dollars. NASA-SSC

performed a series of tests under simulated conditions in August 2006 in

an attempt to establish a formula for valve frost points. From the tests, they discovered

that complex thermodynamic equations were unnecessary to estimate the frost line, but

instead a simple fin model gave accuracy up to 95%, which is sufficient for this

application. The equation estimates the base temperature of the body by tracking the

amount of time cryogenic fluid has been flowing through an open valve. This value can

be projected up the valve based on a thermal fin equation provided by NASA-SSC

engineers [54]:

$$T_{tc} = (T_{amb} - T_{fluid}) \, e^{-t_{open}/m} + T_{fluid} \tag{3.1}$$

where $T_{amb}$ is the ambient temperature, $T_{fluid}$ is the boiling temperature of the flowing

cryogen, $t_{open}$ is the amount of time the valve has been open, $m$ is the amount of time

it takes for the valve to reach its steady state, and $T_{tc}$ is the estimated base

temperature of the body.

$$T_{est} = \frac{\cosh\left( m_t (L_{valve} - L_{TC}) \right)}{\cosh\left( m_t L_{valve} \right)} (T_{tc} - T_{amb}) + T_{amb} \tag{3.2}$$

where $L_{valve}$ is the length of the stem of the valve, $L_{TC}$ is the distance of the thermocouple

from the base, $m_t$ is a material constant found experimentally for the valve, and $T_{est}$ is the

estimated temperature of the thermocouple located at $L_{TC}$. This formula can be

manipulated in order to solve for the frost line of the valve by setting $T_{est}$ to 32 °F and

solving for $L_{TC}$, i.e.,

$$L_{TC} = L_{valve} - \frac{\cosh^{-1}\left( \dfrac{32 - T_{amb}}{T_{tc} - T_{amb}} \cosh\left( m_t L_{valve} \right) \right)}{m_t} \tag{3.3}$$
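The thermal model can be sketched numerically as follows. The frost-line function inverts Eq. 3.2 for L_TC, which gives the stem length minus the inverse hyperbolic cosine term. Temperatures are assumed to be in degrees Fahrenheit and lengths in inches; the function names and the handling of the edge cases are hypothetical.

```python
import math


def base_temperature(t_amb, t_fluid, t_open, m):
    """Estimated base temperature of the valve body after t_open of cryogenic flow (Eq. 3.1)."""
    return (t_amb - t_fluid) * math.exp(-t_open / m) + t_fluid


def stem_temperature(t_tc, t_amb, l_valve, l_tc, m_t):
    """Estimated temperature at distance l_tc from the base of the stem (Eq. 3.2)."""
    ratio = math.cosh(m_t * (l_valve - l_tc)) / math.cosh(m_t * l_valve)
    return ratio * (t_tc - t_amb) + t_amb


def frost_line(t_tc, t_amb, l_valve, m_t, t_frost=32.0):
    """Distance from the base at which the stem reaches t_frost, from inverting Eq. 3.2.

    Assumes t_amb is above t_frost. Returns 0.0 when the base itself is above
    freezing and l_valve when the whole stem is below freezing.
    """
    if t_amb <= t_frost:
        raise ValueError("ambient temperature must be above the frost temperature")
    if t_tc >= t_frost:
        return 0.0
    arg = (t_frost - t_amb) / (t_tc - t_amb) * math.cosh(m_t * l_valve)
    if arg < 1.0:
        return l_valve
    return l_valve - math.acosh(arg) / m_t
```

Evaluating `stem_temperature` at the distance returned by `frost_line` should give back the frost temperature, which is a convenient consistency check for any experimentally fitted value of the material constant.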

This thermal model will be utilized in order to continually monitor the frost line

of the valve both during tests and when the test stand is idle. The monitoring of the frost

line provides two key benefits to NASA test operations. The first benefit is the ability to

monitor how many times and for how long the seal at the top of the valve has been

exposed to freezing temperatures. Knowledge of this statistic can assist the operator to

diagnose any anomalies or faults found in the valve data. The second benefit is the

ability to monitor frost lines for future valve production. If a study can present

conclusive evidence that the valves being used in the test stand are much longer than

needed, tens of thousands of dollars can be saved when the existing valves need to be

replaced.

3.2.7 Adaptive Thresholding

When preparing for a test, control algorithms are set to autonomously operate the valves.

The timings are very specific and the valve's behavior must remain consistent in order to

guarantee proper test firings. There are various faults that can prevent the valve from

operating correctly, but one of the most important details is how the valve reacts to the

control input, independent of the operating conditions. Therefore, simulations of the

valve's output based on the input can be run to estimate valve stroke timings and

behavior. The model used is a bank of autoregressive moving average (ARMA) filters

with an optimization constraint to specify an adaptive threshold. This was first proposed

in [35]. Figure 15 shows the algorithm for the design and choice of ARMA models for the

adaptive thresholding with a description following.

Figure 15 - Adaptive threshold algorithm for designing and choosing ARMA models.

The adaptive threshold is chosen based on two optimization functions:

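$$\underline{y}(k) = \min_{\theta \in [\underline{\theta}, \overline{\theta}]} \left( G_u(q, \theta) \, u(k) \right) \tag{3.4}$$

$$\overline{y}(k) = \max_{\theta \in [\underline{\theta}, \overline{\theta}]} \left( G_u(q, \theta) \, u(k) \right) \tag{3.5}$$

where $\underline{y}(k)$ and $\overline{y}(k)$ are the minimum and maximum values of the simulated ARMA

models, respectively, $G_u(q, \theta)$ is the transfer function of the ARMA model with

coefficients $\theta$ and order $q$, and $u(k)$ is the control signal input at time $k$.

$$Fit = 100 \left( 1 - \frac{\lVert y_h - y \rVert}{\lVert y - \mathrm{mean}(y) \rVert} \right) \tag{3.6}$$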

where Fit is the percentage of the output variation that is explained by the model, yh is

the estimated output, and y is the measured output [17].

The historic data is assumed to be all nominal data in order to design a set of

models that can represent the entire set. During the training process, a fit equation is

calculated in order to guarantee that the models are not too accurate and not too lax.

Therefore, the fit value is required to be above 70%. This threshold

was found experimentally and may need to be refined based on the application. Once the

models have been selected, they are run through testing data that is from a similar dataset.

If any faults are found in this dataset, it can be concluded that there are not enough

models to completely describe the data properly. Additional models are then created with

different numbers of coefficients in order to create a complete representation of the

dataset. Once a sufficient number of models has been created, the control algorithm can

be run through the simulation and compared with the actual feedback from the valve

during the test. The adaptive threshold can mark faults during the test which can alert

test operations to anomalous behavior. The simulation of the control algorithm and the

feedback can be seen in Figure 16.
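The fit measure and the envelope formed by the bank of models can be sketched as follows. The sketch assumes that each model has already been simulated on the control signal, so that the minimum and maximum over the bank approximate the two optimization functions. The function names are hypothetical.

```python
import numpy as np


def fit_percent(y, y_hat):
    """Percentage of the output variation explained by the model (the Fit equation)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 100.0 * (1.0 - np.linalg.norm(y_hat - y) / np.linalg.norm(y - y.mean()))


def envelope(simulated_outputs):
    """Lower and upper adaptive thresholds from a bank of simulated model outputs.

    simulated_outputs has one row per model and one column per time step.
    """
    sims = np.asarray(simulated_outputs, dtype=float)
    return sims.min(axis=0), sims.max(axis=0)


def flag_faults(measured, lower, upper):
    """Boolean mask marking samples of the valve feedback that leave the envelope."""
    measured = np.asarray(measured, dtype=float)
    return (measured < lower) | (measured > upper)
```

A model that reproduces the measured output exactly has a fit of 100, and a model that only predicts the mean of the output has a fit of 0, which is why a minimum fit value can be used to reject models that are too lax.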

Figure 16 - Adaptive threshold algorithm simulation on real-time data.

3.3 Prognostic Survey

The ultimate goal of the intelligent valve framework is the ability to determine the

remaining useful life of the LLAV. At this time, however, the prognostics portion of the

framework is outside the scope of this research. Therefore, several prognostic techniques

will be investigated based on simple linear predictors as well as a state-space model. The

linear predictors will consist of the autoregressive and autoregressive moving-average

filter. The state-space model implemented will be the Kalman filter. These techniques

will be used in conjunction with a neural network to determine their feasibility for future

development in the Intelligent Valve Framework.

3.4 Diagnostic Process

Creating a software framework that can be expanded in the future requires careful

planning and structuring. Therefore, object-oriented programming (OOP) techniques

were utilized to construct a backend acquisition and configuration protocol. An MS-SQL

database schema was designed to store configuration information throughout tests in order

to create a persistent environment. The schema for the MS-SQL database can be seen in

Figure 17.


Figure 17 - Intelligent Valve database schema.

This database was designed in such a way that it meets the requirements of third

normal form (3NF). Database normalization enforces guidelines for efficiently

organizing data within a database. The database defines the necessary attributes required to

access data from the DDE servers at NASA-SSC. Valves contain several sensor

measurements that must be monitored for the diagnostic process to work correctly. These


values are stored in the ValveDetails table where the DDE tags and servers can be

specified as well as the length of the valve for the thermal models described previously.

In order to store the operating history of a valve, the ValveStatistics table contains a

column for all of the statistics described earlier. Each valve can contain several

thermocouples that are attached to its stem in order to validate the thermal model. The

Thermocouples table holds a foreign key to the ValveDetails table to correlate

thermocouples with their valves. This table also holds the current position of the

thermocouple on the stem of the valve and the high and low thresholds used to set the

flagged state of the thermocouples. The FluidDetails table holds the information used in

the thermal model of several fluids and their boiling point. The final table, DDE,

contains the connection strings for the DDE servers in order to access all the sensor data.

Each thermocouple, valve feedback, and valve control is required to have a foreign key to

one of the DDE servers.
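The relationships described above can be sketched as a small schema. The example uses SQLite from Python's standard library rather than MS-SQL, and every column name is a hypothetical stand-in; only the table names and the foreign-key relationships follow the description in the text.

```python
import sqlite3

# Hypothetical column names; only the table names and the foreign keys follow the text.
SCHEMA = """
CREATE TABLE DDE (
    DdeId INTEGER PRIMARY KEY,
    ConnectionString TEXT NOT NULL
);
CREATE TABLE ValveDetails (
    ValveId INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    StemLengthInches REAL,
    FeedbackTag TEXT,
    FeedbackDdeId INTEGER NOT NULL REFERENCES DDE(DdeId),
    ControlTag TEXT,
    ControlDdeId INTEGER NOT NULL REFERENCES DDE(DdeId)
);
CREATE TABLE Thermocouples (
    ThermocoupleId INTEGER PRIMARY KEY,
    ValveId INTEGER NOT NULL REFERENCES ValveDetails(ValveId),
    DdeId INTEGER NOT NULL REFERENCES DDE(DdeId),
    Tag TEXT,
    StemPositionInches REAL,
    LowThreshold REAL,
    HighThreshold REAL
);
"""


def build_demo_database():
    """Create an in-memory database with one server, one valve, one thermocouple."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO DDE VALUES (1, 'demo-server')")
    conn.execute(
        "INSERT INTO ValveDetails VALUES (1, 'LLAV-demo', 15.0, 'fb', 1, 'ctl', 1)")
    conn.execute(
        "INSERT INTO Thermocouples VALUES (1, 1, 1, 'tc-top', 3.0, -330.0, 90.0)")
    conn.commit()
    return conn


def thermocouples_for_valve(conn, valve_id):
    """Return (tag, stem position, server connection string) for each thermocouple."""
    rows = conn.execute(
        "SELECT t.Tag, t.StemPositionInches, d.ConnectionString "
        "FROM Thermocouples t JOIN DDE d ON d.DdeId = t.DdeId "
        "WHERE t.ValveId = ? ORDER BY t.ThermocoupleId", (valve_id,))
    return rows.fetchall()
```

With foreign keys enabled, inserting a thermocouple that points at a missing valve or DDE server is rejected, which is the integrity guarantee the required foreign keys are meant to provide.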

The software framework was written in C# in order to simplify the development

process. Also, since the more computationally intensive algorithms are performed

offline, the speed benefits of C++ would have been minimal for this application. A class

structure was defined that allows several user controls to share the same data in an

efficient manner.

Class interfaces and structures have been defined to allow extensibility to the

Intelligent Valve framework. The first interface defines how a sensor receives values


from the data servers. It includes a single function with parameters for the name of the

item and the value captured by the data client. The reason for including the name of

the value is that certain sensors, such as a valve, must keep track of multiple values like its

control and process variable. Currently, only a thermocouple and valve class have been

developed that implement this interface. The purpose of this interface, however, is to

allow other sensors, such as pressure and strain, to be included in a single collection in

the intelligent valve data handler.
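A minimal sketch of this sensor interface follows, written in Python for illustration rather than the C# of the actual framework. The names `Sensor`, `receive_value`, `dispatch`, and the tag strings are assumptions made for this example and do not come from the original code.

```python
from abc import ABC, abstractmethod


class Sensor(ABC):
    """Hypothetical analogue of the sensor interface: one method, two parameters."""

    @abstractmethod
    def receive_value(self, item_name: str, value: str) -> None:
        """Called whenever the data client captures a value for a named item."""


class Thermocouple(Sensor):
    def __init__(self, tag):
        self.tag = tag
        self.temperature = None

    def receive_value(self, item_name, value):
        if item_name == self.tag:
            self.temperature = float(value)


class Valve(Sensor):
    """Tracks two named values, which is why the item name is passed along."""

    def __init__(self, control_tag, feedback_tag):
        self.control_tag = control_tag
        self.feedback_tag = feedback_tag
        self.control = None
        self.feedback = None

    def receive_value(self, item_name, value):
        if item_name == self.control_tag:
            self.control = float(value)
        elif item_name == self.feedback_tag:
            self.feedback = float(value)


def dispatch(sensors, item_name, value):
    """Forward one captured value to every sensor in a mixed collection."""
    for sensor in sensors:
        sensor.receive_value(item_name, value)
```

Because both classes implement the same method, a thermocouple and a valve can sit in one list, and a pressure or strain sensor could be added later without changing `dispatch`.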

The next interface defines the functionality a data server is required to have to be

included in the IV framework. The interface defines a number of function templates that

allow the data handler to either sample incoming data at a set rate, or subscribe to data.

Since most data servers require drastically different implementations, this interface

allows various data servers to be integrated in a way that is

transparent to the IV data handler. The functions for the data server interface are listed in

Table 8.

Table 8 - Data server class interface.

Method Name | Parameters | Description
RequestDelegate | None | Allows the data handler to subscribe to any new data that is sampled.
StartRequest | String ItemName | Commands the data server to begin sampling the item when commanded by the data handler.
StopRequest | String ItemName | Commands the data server to stop sampling the item.
PerformRequests | Double elapsedTime | Commands the data server to sample all requested data. The elapsed time parameter is tracked by the IV data handler and represents the amount of time since the server was last sampled.
StartAdvise | String ItemName | Commands the data server to begin sampling the item whenever a new value is available.
StopAdvise | String ItemName | Commands the data server to stop sampling the item whenever a new value is available.
Disconnect | None | Disconnects the client from the server and stops all sampling and advise loops.
Stop | None | Stops all sampling and advise loops, but does not disconnect from the server.
Resume | None | Resumes all sampling and advise loops.

All data passed around the Intelligent Valve framework is a simple structure that

has three fields: String Item, String Value, and String TimeStamp. Each client of the

value is responsible for transforming the data into its own desired format. A static type

for the value parameter increases the predictability of the values the client will receive

and therefore reduces the amount of type and error checking needed to be performed by

future developers.
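The three-field data structure and the polling part of the data server interface might look like the sketch below. It is written in Python for illustration and covers only a subset of the methods in Table 8; `DataRecord`, `FakeServer`, and `subscribe` (standing in for the role of RequestDelegate) are hypothetical names, and the in-memory dictionary replaces a real DDE connection.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    """All three fields are strings; each client converts the value itself."""
    item: str
    value: str
    timestamp: str


class DataServer(ABC):
    """Subset of the data server interface (polling methods only)."""

    @abstractmethod
    def start_request(self, item_name: str) -> None: ...

    @abstractmethod
    def stop_request(self, item_name: str) -> None: ...

    @abstractmethod
    def perform_requests(self, elapsed_time: float) -> None: ...


class FakeServer(DataServer):
    """In-memory stand-in; a real client would query the server instead."""

    def __init__(self, values):
        self._values = values          # item name -> current value (as a string)
        self._requested = []           # items currently being sampled
        self._subscribers = []         # callbacks that receive DataRecord objects
        self._clock = 0.0              # accumulated elapsed time in seconds

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def start_request(self, item_name):
        if item_name not in self._requested:
            self._requested.append(item_name)

    def stop_request(self, item_name):
        if item_name in self._requested:
            self._requested.remove(item_name)

    def perform_requests(self, elapsed_time):
        self._clock += elapsed_time
        for name in self._requested:
            record = DataRecord(name, self._values[name], f"{self._clock:.3f}")
            for callback in self._subscribers:
                callback(record)
```

A data handler written against `DataServer` never sees how a particular server obtains its values, which is the transparency the interface is intended to provide.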

The data handler encapsulates the entire backend of the Intelligent Valve

framework. All controls in the framework receive a reference to this data handler and

can subscribe to updates of the different sensor temperatures as well as the update timer

when the data servers are commanded to sample. The data handler is also responsible for

logging the sensor data into an MS-SQL database for offline diagnostic tools. Figure 18

shows the entire class structure of the project.

Figure 18 - Software framework for the Intelligent Valve framework.


CHAPTER 4: RESULTS

Stennis Space Center test operators oversee the testing and validation of rockets for both

NASA and private companies. While few accidents have occurred at Stennis, it is still

important for test engineers to have a better understanding of the behavior of the valves

on the test stands. To further their comprehension, the diagnostic algorithms mentioned

above have been tested and validated against canonical data and simulated and injected

faults in test stand data.

4.1 Diagnostic Validation Data

Several datasets were used to validate the diagnostic algorithm and process discussed in

the previous section. The following sections outline the procedures by which this

data was collected and how faults were injected into the data.

4.1.1 Thermal Model Data

In order to verify the thermocouple models, a test apparatus was constructed by the test

operations group. The setup was simple, but provided the ability to capture isolated

anomalies to see how the thermocouple reacts under different operating conditions. The

test was completed with the following protocol:

1. A simulated valve was programmed into the WonderWare simulation

environment.


2. When the simulated valve opened, liquid nitrogen (LN) was poured into

the box containing the valve.

3. During the next several hours, the liquid nitrogen was kept at a constant

level in order to simulate the passing of fluid through an open valve.

4. The temperature and frost line were monitored after the body reached a

steady state temperature of -322°F (boiling point of LN).

5. There was a thermocouple at the base of the valve and a thermocouple

about 20 inches up the stem of the valve; both were monitored and stored

in a data file.

During the test protocol, anomalies would be inserted periodically in order to

simulate and capture failure modes commonly seen at the test stands. Some anomalies

include the disconnecting of the top thermocouple, decrease in power supply voltage and

current, connection of resistor potentiometer to amplified input and output, thermocouple

debonding, and the effect of ice insulation.

As stated previously, the thermocouples used to measure the frost line

calculations have an error rate based on their type and measurement range. This

measurement error, as well as the error associated with the thermal model, provides a

threshold value that helps guarantee accurate data from the instrumentation. In order to

more accurately determine the experimentally calculated values, mt, an optimization

algorithm was utilized based on a curve fitting method and least squares constraints.


4.1.2 Sensor Validation Data

In March 2006, NASA initiated the Methane Thruster Testbed Project (MTTP) as a

platform for the research of plume diagnostics and ISHM. Historical data from live tests

was used to train and test the AANN for sensor validation. Hard and soft faults were

artificially injected into the test runs and simple thresholding was used to determine when

faults had occurred. These artificial faults were characterized during the thermal model

tests in order to create realistic faults in the data. The MTTP trailer can be seen in Figure

19.

Figure 19 - MTTP Trailer used for validating sensor faults.


4.1.3 Adaptive Threshold Data

In order to validate the adaptive threshold model, extensive failure data would be needed

that tracks a valve from nominal conditions to abnormal and eventually complete failure.

This data is difficult to acquire since valves normally are not left until they fail.

Therefore, a simulated control system was needed that provided a method to show

degradation in a valve’s response based on adaptable parameters. A common transfer

function used to simulate valves is given in Equation 3.2, with a description of each parameter to follow.

$$V_{process} = \frac{g\, e^{-T_s s}}{s^2 + 2 \zeta T_w s + T_w^2} \qquad (3.2)$$

where $g$ is the gain, $T_s$ is the unit delay, $T_w$ is the natural frequency, $\zeta$ is the damping

ratio, and $V_{process}$ is the output of the transfer function modeling a valve's response to a

PID controller.
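To illustrate how the parameters shape the response, the sketch below integrates the equivalent differential equation, y'' + 2ζT_w y' + T_w² y = g u(t − T_s), for a unit step input. This is an open-loop illustration only: the simulation in this work closes the loop with a PID controller in Simulink, which is not reproduced here, and the function name, step size, and duration are assumptions.

```python
def simulate_step_response(g, t_s, t_w, zeta, duration=40.0, dt=0.001):
    """Unit-step response of g*exp(-t_s*s) / (s^2 + 2*zeta*t_w*s + t_w^2).

    Integrates y'' + 2*zeta*t_w*y' + t_w**2*y = g*u(t - t_s) with a fixed step.
    Returns (times, outputs) as two lists of equal length.
    """
    steps = int(round(duration / dt))
    y, dy = 0.0, 0.0
    times, outputs = [], []
    for k in range(steps + 1):
        t = k * dt
        times.append(t)
        outputs.append(y)
        u_delayed = 1.0 if t >= t_s else 0.0   # the delay shifts the step in time
        ddy = g * u_delayed - 2.0 * zeta * t_w * dy - t_w ** 2 * y
        dy += ddy * dt                          # semi-implicit Euler: rate first,
        y += dy * dt                            # then position with the new rate
    return times, outputs


# Nominal parameters versus a degraded (low) gain.
_, nominal = simulate_step_response(g=1.0, t_s=2.0, t_w=1.0, zeta=1.0)
_, degraded = simulate_step_response(g=0.8, t_s=2.0, t_w=1.0, zeta=1.0)
```

The output stays at zero until the delay has elapsed and then settles toward g / T_w², so a drop in gain shows up directly as a lower steady-state value.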

As the parameters are changed, the valve’s feedback should change accordingly,

and as the valve’s performance degrades, the algorithm’s adaptive threshold detects these

changes and labels faults in the system. In order to model NASA-SSC as closely as

possible, the control system uses a PID controller simulated in MATLAB’s Simulink.

Parameters for the PID were selected from common values used during live test firings at

NASA-SSC. The proportional constant was set to 1 and the integral constant to 0.1.

The parameters were modified based on the following intervals:


Table 9 - Adaptive threshold simulation parameters.

Parameter | Nominal | Low Abnormal | High Abnormal
Gain | 0.98 ≤ g ≤ 1.01 | 0.8 ≤ g < 0.98 | 1.01 < g ≤ 1.2
Natural Frequency | 0.9 ≤ T_w ≤ 1.1 | 0.8 ≤ T_w < 0.9 | 1.1 < T_w ≤ 1.2
Damping Ratio | 0.9 ≤ ζ ≤ 1.1 | 0.8 ≤ ζ < 0.9 | 1.1 < ζ ≤ 1.2
Delay | 2 ≤ T_s ≤ 3 | 0 ≤ T_s < 2 | N/A
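The intervals in Table 9 can be expressed as a small lookup, which is convenient when labeling simulated runs. The sketch below is illustrative; the dictionary keys, function name, and the "outside table" label are assumptions, while the numeric bounds are transcribed from the table.

```python
# Bounds from Table 9: (low abnormal minimum, nominal minimum,
#                       nominal maximum, high abnormal maximum).
RANGES = {
    "gain": (0.8, 0.98, 1.01, 1.2),
    "natural_frequency": (0.8, 0.9, 1.1, 1.2),
    "damping_ratio": (0.8, 0.9, 1.1, 1.2),
    "delay": (0.0, 2.0, 3.0, None),  # Table 9 lists no high abnormal range for delay
}


def classify(parameter, value):
    """Return the Table 9 band that a parameter value falls into."""
    low_min, nom_min, nom_max, high_max = RANGES[parameter]
    if nom_min <= value <= nom_max:
        return "nominal"
    if low_min <= value < nom_min:
        return "low abnormal"
    if high_max is not None and nom_max < value <= high_max:
        return "high abnormal"
    return "outside table"
```

For example, `classify("gain", 0.9)` returns `"low abnormal"`, and a delay above 3 is reported as `"outside table"` because no high abnormal interval is defined for it.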

4.2 Thermal Model Validation

In order to validate the thermal model, experiments were performed with 10 faults

injected in a thermocouple that was bonded three inches up the stem of a fifteen-inch

valve. The thermocouple data was compared to the thermal model and a simple

threshold of 22°F was used to determine when a fault had occurred. This threshold was

derived from the 95% accuracy of the thermal model. The overall range of temperatures

is from -322°F to 80°F, or approximately 400°F, and 5% of that is 22°F.
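The simple thresholding step can be sketched as a sample-by-sample comparison between the measured and simulated temperatures. The function and variable names below are illustrative assumptions; only the 22°F threshold comes from the text.

```python
FAULT_THRESHOLD_F = 22.0  # threshold quoted in the text, in degrees Fahrenheit


def flag_faults(measured, simulated, threshold=FAULT_THRESHOLD_F):
    """Return 1 where |measured - simulated| exceeds the threshold, else 0."""
    if len(measured) != len(simulated):
        raise ValueError("series must have the same length")
    return [1 if abs(m - s) > threshold else 0
            for m, s in zip(measured, simulated)]


# Made-up readings: the third sample mimics a disconnected thermocouple.
measured = [-300.0, -310.0, -500.0, -320.0]
simulated = [-305.0, -312.0, -315.0, -321.0]
flags = flag_faults(measured, simulated)
```

Each sample needs only one subtraction and one comparison, so the check can run alongside the thermal model on every incoming measurement.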

4.2.1 Thermal Modeling

The first test performed at NASA-SSC was a base run to identify the valve’s physical

parameters. The least squares optimization curve fitting method described in the

approach section was used to determine the parameters for the remaining tests. Table 10

shows the values that were found based on the optimization algorithm and shows the

simulation results using the parameters.


Table 10 - Physical parameters obtained from least squares optimization curve fit of base run.

Mt – Chill Down | Mt – Warm Up | mt – Chill Down | mt – Warm Up
659.80 | 4672 | 0.36 | 0.32

[Plot: Base Run. Axes: Time (s) vs. Degrees (F). Series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]

Figure 20 - Simulation data using thermal modeling for base run.

Once the physical parameters were determined, they could be used to validate the

thermal model’s ability to detect anomalies by injecting faults during similar test runs.

Disconnections were made at various locations during the test runs to see how the system

responded. Figure 21 shows the test setup, and the numbered locations will be

referenced in parentheses, e.g. (13), throughout the following results. The bottom

simulation throughout the tests provides inaccurate results because of the dependency on

the ambient temperature. These tests were run for several hours, and sometimes

overnight, with only a single ambient temperature being recorded. Therefore, the measured

bottom thermocouple is used for the top simulation except in the presence of a fault, in

which case the simulation was used.

Figure 21 - Data acquisition setup for thermal modeling fault detection.

The first fault simulates a faulty connection before the amplifier (13) and after the

patch panel (12). The faulty connection was simulated by connecting a potentiometer to

the referenced locations and increasing it quickly at 8230 and 8990 seconds. The fault

detection was able to detect both faults accurately using the thermal modeling in the top

thermocouple. However, there are some false positives reported in the chill down phase

of the test. While no fault was documented, abnormal behavior can be seen in the top


thermocouple as it rises slightly as the temperature reaches its minimum. In determining

the performance metrics, only documented faults were considered to be true positives

even if the measurements show unexpected results.

[Plot: Faulty Connection in Amplifier Input. Axes: Time (s) vs. Degrees (F).]


Figure 22 – Simulation data using thermal modeling for faulty connections in Tustin amplifier input.

[Plots: Top Thermocouple Fault Detection and Bottom Thermocouple Fault Detection. Axes: Time (s) vs. Fault Classification.]

Figure 23 - Fault classification using thermal modeling for faulty connections in Tustin amplifier input.

The next fault simulates a faulty amplifier (6, 13) as well as disconnects in the

Tustin patch panel (7, 14). The power downs of the amplifier were performed at 5563

and 5910 seconds, with six input disconnections occurring at 7381, 7457, 7592, 7641, 9336,

and 9363 seconds. Again, simple thresholding combined with the thermal equations was able

to detect all faults accurately in both the top and bottom thermocouple. This test revealed

no false positives in the top thermocouple, which is the desired metric for these tests.

[Plot: Amplifier Power Down and Tustin Input Disconnect. Axes: Time (s) vs. Degrees (F).]


Figure 24 - Simulation data using thermal modeling for amplifier power downs and Tustin input disconnections.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 25 - Fault detection using thermal modeling for amplifier power down and Tustin input disconnection.

In order to simulate a fault in the digitizer input, a potentiometer was connected in

between (13) and (14). Instead of a hard fault, the resistance was slowly increased at

6693 seconds to simulate a drifting connection. A hard fault was injected at 7750

seconds by quickly increasing the resistance. While both faults were detected, several

false negatives were reported because the simulation was predicting values lower than the

measured value. Therefore, since the fault was slowly injected, there was a delay before

it reached the threshold values indicating a fault.

[Plot: Faulty Input Connection in Digitizer. Axes: Time (s) vs. Degrees (F).]


Figure 26 - Simulation data using thermal modeling for faulty input connections in the digitizer.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 27 - Fault detection using thermal modeling for faulty input connections in the digitizer.

During a humid day, the moisture in the air can change into ice as it comes into

contact with the surface of the valve. When surrounding a thermocouple, the frost may

act as an insulator and cause incorrect readings. In order to simulate this, water was

applied to the valve stem as the test was occurring. The water then froze when that part

of the valve reached freezing point. In this test, there were no identifiable effects from

the frost insulation. However, the top simulation estimates a steeper drop in temperature

during chill down, which is recorded as a false positive. If the frost insulation occurred

on the bonnet of the valve

[Plot: Frost Insulation #1. Horizontal axis: Time (s).]


Figure 28 - Simulation data using thermal modeling for simulated frost insulation test 1.

[Plots: Top Thermocouple Fault Detection and a second fault classification panel. Axes: Time (s) vs. Fault Classification.]


Figure 29 - Fault detection using thermal modeling for frost insulation test 1.

In the next test, frost insulation was again added, but this time the thermocouple

was not in direct contact with the valve. This induced fault checks how frost can affect a

loose thermocouple. Based on the top thermocouple's data in Figure 30, it can be seen

that the top thermocouple lowered in temperature, but was well above the actual

temperature of the valve based on the top simulation data. The simulation threshold

method was again able to detect this fault with 100% accuracy.

[Plot: Frost Insulation Test #2. Axes: Time (s) vs. Degrees (F).]


Figure 30 - Simulation data using thermal modeling for simulated frost insulation test 2.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 31 - Fault detection using thermal modeling for frost insulation test 2.

Figure 32 - Data acquisition modified setup for thermal modeling fault detection.


A junction reference error can cause incorrect thermocouple readings. In this

particular test, the junction was placed into ice water to simulate a reference error.

During the beginning of the test the top thermocouple does not reach the expected

temperature, but the more noticeable fault occurs when the junction was lifted out of the

water around 11832 seconds. A sharp decrease in the temperature resulted from this

induced fault. The fault detection algorithm was able to detect both faults with

reasonable accuracy.

[Plot: Temperature Junction Reference Error. Axes: Time (s) vs. Degrees (F).]


Figure 33 - Simulation data using thermal modeling for temperature junction reference errors.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 34 - Fault detection using thermal modeling for temperature junction reference errors.

The next test simulated a series of disconnects and shorts in both the top and

bottom thermocouple. During warm up, the top thermocouple was repeatedly connected

and disconnected to simulate a connection that was just starting to become faulty. The

faults were detected with high precision, but several false positives and false

negatives were found during the repetitive disconnect due to the voltage not having

enough time to reach its minimum value.

[Plot: Thermocouple and Power Disconnection. Axes: Time (s) vs. Degrees (F).]


Figure 35 - Simulation data using thermal modeling for thermocouple and power disconnections.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 36 - Fault detection using thermal modeling for thermocouple and power disconnections.

This test was again simply a disconnect and short of the thermocouple;

however, the ambient temperature was recorded during the test, which provided a more

accurate simulation model. The symptoms were similar to those of previous tests: a disconnect

produced a level shift to the channel's minimum value, and a short produced a level shift to the

channel's highest value. Even with the ambient temperature,

however, several false positives can be seen during cool down. This again is probably

due to the body freezing much faster than expected due to the testing procedures.

[Plot: Thermocouple Disconnection and Short. Axes: Time (s) vs. Degrees (F).]


Figure 37 - Simulation data using thermal modeling for thermocouple disconnections and shorts.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 38 - Fault detection using thermal modeling for thermocouple disconnections and shorts.

This test demonstrated a drifting fault that was simulated by decreasing the

voltage on the transmitter's power supply over a two-minute span. Since the fault's effect

was slower and the threshold value is so high, a number of false negatives were

reported. This same test was performed several times over the course of an hour with

similar results. Near the end of the test the power supply for both transmitters was

dropped, which caused a fault in both the bottom and top thermocouple. The fault

detection in the lower thermocouple allowed for the top thermocouple to retain a value


closer to the actual temperature of the valve which resulted in proper fault detection of

the transmitter's low power output.

[Plot: Transmitter Power Failures. Axes: Time (s) vs. Degrees (F).]


Figure 39 - Simulation data using thermal modeling for transmitter power failures.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 40 - Fault detection using thermal modeling for transmitter power failures.

A new setup (Figure 32) was used for this test, in which a thermocouple junction was

added (18, 19) and placed in an ice bath. At warm up, it was removed from the ice bath

and a heat gun was blown on it. When the junction reference was in the ice bath, the

thermocouple's temperature was higher than expected, and the heat gun caused the

data to be lower than expected. Both of these induced faults were detected accurately.

[Plot: Unaccounted Thermocouple Junction. Axes: Time (s) vs. Degrees (F).]


Figure 41 - Simulation data using thermal modeling for unaccounted thermocouple junctions.

[Plots: two fault classification panels. Axes: Time (s) vs. Fault Classification.]


Figure 42 - Fault detection using thermal modeling for unaccounted thermocouple junctions.

Figure 43 shows a comparison of the thermal model's prediction of a frost line

against an actual thermocouple that was bonded to the stem of the valve three inches

from the body.

[Plots: Frost line model comparison with actual thermocouple data. Frost line panel: Frost Line (inches) vs. Elapsed Time (s), with a marked point at X: 417.2, Y: 3.006. Temperature panel: Temperature (F) vs. Elapsed Time (s), with a marked point at X: 359.8, Y: 31.97. Legend: Thermocouple reading at 3 inches, Actual time of frost point at 3 inches, Predicted time of frost point at 3 inches, Predicted frost line.]

Figure 43 - Comparison of predicted and actual frost line.

It can be seen that the difference between the predicted and actual frost line at

three inches is approximately only one minute. Since the heat dissipation of the valves is

exponential, the steady state time of larger valves can be upwards of twenty-two hours.

Therefore, a minute is well within the accepted error for this application. These results

further validate the study performed in [14] and expand the work to detect faults in

thermocouples. The model being incorporated in the intelligent valve framework will


allow for the continuous monitoring of the frost line in the LLAV. If it can be found that

the frost line of the valve never reaches the packing at the top of the valve, the stem

length can be reduced saving thousands of dollars in the manufacturing of the valve.

4.2.2 Simulation Metrics

Table 11 - Performance metrics for faulty connection in amplifier input.

 | Positive | Negative
Positive | 5529 | 1217
Negative | 1411 | 255460

Sensitivity: 79.67%
Specificity: 99.53%
Positive Predictive Value: 81.96%
Negative Predictive Value: 99.45%
F-Measure: 89.89%
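The first four metrics in Table 11 follow from the confusion matrix counts using the standard definitions, as the sketch below shows. Reading 1217 as the false positive count and 1411 as the false negative count is an inference from the reported percentages rather than something stated in the table. The F-Measure is not computed here because its exact definition is not given in this section.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard ratios derived from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Counts from Table 11 (faulty connection in amplifier input).
metrics = confusion_metrics(tp=5529, fp=1217, fn=1411, tn=255460)
percentages = {name: round(100.0 * value, 2) for name, value in metrics.items()}
```

With these counts, `percentages` reproduces the sensitivity, specificity, and predictive values listed in Table 11.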

Table 12 - Performance metrics for amplifier power down and Tustin input disconnect.


Sensitivity: 99.19%
Specificity: 99.99%
Positive Predictive Value: 96.61%
Negative Predictive Value: 100%
F-Measure: 98.27%

Table 13 - Performance metrics for input disconnection on the digitizer.


Sensitivity: 95.94%
Specificity: 99.16%
Positive Predictive Value: 68.66%
Negative Predictive Value: 99.92%
F-Measure: 81.14%

Table 14 - Performance metrics for frost insulation test 1.


Sensitivity: 25.38%
Specificity: 100%
Positive Predictive Value: 100%
Negative Predictive Value: 96.42%
F-Measure: 100%

Table 15 - Performance metrics for frost insulation test 2.


Sensitivity: 79.77%
Specificity: 100%
Positive Predictive Value: 100%
Negative Predictive Value: 97.69%
F-Measure: 100%

Table 16 - Performance metrics for temperature junction reference error.




Table 17 - Performance metrics for thermocouple and power disconnection.


Sensitivity: 96.22%
Specificity: 100%
Positive Predictive Value: 99.20%
Negative Predictive Value: 99.98%
F-Measure: 99.60%

Table 18 - Performance metrics for thermocouple disconnections and shorts.



Table 19 - Performance metrics for transmitter power and failure.



Table 20 - Performance metrics for unaccounted thermocouple junction.


Sensitivity: 62.51%
Specificity: 100%
Positive Predictive Value: 100%
Negative Predictive Value: 51.34%
F-Measure: 100%

Table 21 – Average performance metrics for all thermocouple fault tests.


The metrics validate the feasibility of using thermal models for calculation

of frost line and thermocouple sensor validation. With a greater population size and more

controlled test configuration, the results can be validated further, but the initial test size

shows promising results for the use of this algorithm in the Intelligent Valve

framework. The faults that caused drastic changes in temperature during disconnects and

shorts in the thermocouple were always detected within a measurement sample of the

induced fault occurring. Other faults, such as a slowly degrading transmitter power

supply, caused a slowly growing discrepancy between the thermocouple’s measured data and its

simulated temperature. This type of fault was detected, but it took several

minutes into the fault for the measured value to cross the threshold value.

The computational efficiency of this approach is also very appealing for use in a

mission critical situation where processing power is limited and must be reserved for


operational algorithms. Therefore, the calculation of the two thermal equations can be

performed on a sample-by-sample basis on numerous thermocouples during live test fires

giving real-time results.

4.3 Sensor Validation

NASA-SSC provided valve data with a downstream pressure sensor for validation of the

diagnostic algorithms. This data provided canonical datasets for the development of the

AANN sensor validation. Five total datasets were provided with three sets used for

training and two for testing. All the data provided was nominal, so artificial soft and hard

faults were injected into the data. A hard fault is defined as a level shift in the data where

the measurement values drastically change to a certain value and remain at that value for

an extended period of time. This is typical behavior of a sensor that is completely

disconnected. A soft fault is defined as a slow deviation of the sensor value from the

physical value. This is characteristic of a sensor that slowly begins to degrade in

performance from either a slow bonding disconnect or insulation disconnect. A hard and

soft fault can be seen in Figure 44 and Figure 45, respectively.
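The two fault types can be injected into nominal data along the lines of the sketch below. The function names and the linear drift profile are assumptions made for illustration; the actual injected faults were characterized from the thermal model tests, and the hard fault here simply persists to the end of the record.

```python
def inject_hard_fault(signal, start, level):
    """Level shift: from index `start` onward the reading sticks at `level`."""
    return [level if i >= start else x for i, x in enumerate(signal)]


def inject_soft_fault(signal, start, drift_per_sample):
    """Slow drift: a bias that grows linearly from zero beginning at `start`."""
    return [x + drift_per_sample * max(0, i - start)
            for i, x in enumerate(signal)]


nominal = [100.0] * 6
hard = inject_hard_fault(nominal, start=3, level=0.0)
soft = inject_soft_fault(nominal, start=2, drift_per_sample=0.5)
```

Here `hard` drops to zero from the fourth sample onward, while `soft` departs from the nominal value by an additional 0.5 per sample after the drift begins.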


Figure 44 - Example of a hard fault.


Figure 45 - Example of a soft fault.

An example dataset of the valve can be seen in Figure 46. This dataset has a very

simple correlation between the pressure sensor and the valve’s position. It is nearly a

step function between the valve and pressure reading. This testing, while simple, will

provide validation for the AANN method, which will be expanded to a more complex

system later. As previously mentioned, hard and soft faults were artificially injected

and an AANN was trained based on the method described in the background section.

Figure 47 and Figure 48 show the fault conditions and the AANN output.


Figure 46 - Example dataset from LLAV and downstream pressure sensor.


Figure 47 - Hard fault detection using AANN.


Figure 48 - Soft fault detection by AANN.

In order to further validate the AANN algorithm, the MTTP data discussed above

was also used to create a more extensive subsystem that could be tested. Again, artificial

faults were injected into different sensors at different times during the test, but were

characteristic of actual faults found during the thermal modeling tests. Figure 49 shows a

hard fault in a pressure sensor, Figure 50 demonstrates the AANN's ability to track a soft

fault in a separate pressure sensor, and Figure 51 shows the robustness of the AANN in

the case of large disturbances, which are a known symptom of a faulty connection.

[Plots: Simulated data for sensor validation (Pressure (PSIG) vs. Elapsed Time (s); series: Measured Data, Simulated Fault Data, Estimated AANN Data) and Fault region detection (Fault Detected vs. Elapsed Time (s)).]

Figure 49 - Fault detection of a simulated hard fault in a pressure sensor.

[Plots: Simulated data for sensor validation (Pressure (PSIG) vs. Elapsed Time (s)) and a fault detection panel (Fault Detected vs. Elapsed Time (s)).]


Figure 50 - Fault detection of a soft fault in a pressure sensor.

[Plots: Simulated data for sensor validation (Pressure (PSIG) vs. Elapsed Time (s)) and a fault detection panel (Fault Detected vs. Elapsed Time (s)).]


Figure 51 - Detection of a simulated disconnect in a pressure transducer.

The AANN was able to detect the faults in the pressure sensor as well as predict the values of the pressure sensor to a reasonable degree. For the hard and soft faults (Figure 49 and Figure 50), the AANN produced no false positives or false negatives. In the simulated disconnect, the fault data occasionally approached the AANN's estimate, causing false negatives. Depending on the application, this may be remedied by setting an alarm only when a predefined number of fault classifications occurs, and conversely disabling the alarm when a defined number of positive classifications occurs.
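The alarm logic described above can be sketched as a simple counter with hysteresis. This is only an illustration: the function name, the default counts, and the requirement that the classifications be consecutive are assumptions, and "positive classifications" is taken here to mean samples classified as healthy.

```python
def debounce_alarm(fault_flags, n_set=3, n_clear=3):
    """Latch an alarm after n_set consecutive fault flags and
    release it after n_clear consecutive healthy flags (assumed rule)."""
    alarm = False
    run = 0  # length of the current run of samples that disagree with the alarm state
    states = []
    for flag in fault_flags:
        if bool(flag) != alarm:
            run += 1
            needed = n_clear if alarm else n_set
            if run >= needed:
                alarm = not alarm
                run = 0
        else:
            run = 0
        states.append(alarm)
    return states


flags = [0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
states = debounce_alarm(flags)
```

With these settings, an isolated fault classification (or an isolated healthy classification during a fault) does not change the alarm state.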

To verify this algorithm further, the GOX subsystem of the MTTP was also

tested. Similar artificial faults were injected into the test data as well as multiple sensor

faults at concurrent times. The same metrics that were used in the thermocouple

algorithm were also calculated for the sensor validation with the addition of mean

squared error. Mean squared error was not used in the thermocouples due to the lack of a

"true" signal being present. The first test did not contain any faults to ensure that the

AANN had correctly learned the correlations in the system.
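For reference, the metrics named above can be computed from per-sample classifications as in the following sketch. It uses the standard confusion-matrix definitions with "fault" treated as the positive class; that choice, the function names, and the dictionary keys are assumptions, and the thesis may use a different convention. A ratio whose denominator is zero (for example, when a test contains no samples of one class) is returned as NaN.

```python
import math


def safe_div(num, den):
    """Return num/den, or NaN when the ratio is undefined."""
    return num / den if den else math.nan


def detection_metrics(truth, predicted):
    """Confusion-matrix metrics; True (or 1) marks a fault sample."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    sens = safe_div(tp, tp + fn)
    ppv = safe_div(tp, tp + fp)
    return {
        "sensitivity": sens,
        "specificity": safe_div(tn, tn + fp),
        "ppv": ppv,
        "npv": safe_div(tn, tn + fn),
        # F-measure as the harmonic mean of PPV and sensitivity
        "f_measure": safe_div(2 * ppv * sens, ppv + sens),
    }


def mean_squared_error(measured, estimated):
    """Average squared difference between two equally long series."""
    return sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured)


metrics = detection_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
no_fault = detection_metrics([0, 0, 0], [0, 0, 0])
```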

Figure 52 - Legend for AANN estimations: (a) top estimation plots and (b) bottom error plots.

[Plot: for each sensor, an "AANN estimation" panel and an "Error signal and threshold" panel, pressure (PSIG) versus elapsed time (ms).]

Figure 53 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors under normal operating conditions.

[Plot: for each sensor, an "AANN estimation" panel and an "Error signal and threshold" panel, pressure (PSIG) versus elapsed time (ms).]

Figure 54 - AANN Estimation for PE-1143-GO and PC1 pressure sensors under normal operating conditions.

[Plot: for each sensor, an "AANN estimation" panel and an "Error signal and threshold" panel, percent open (%) versus elapsed time (ms).]

Figure 55 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors under normal operating conditions.

Table 22 – Performance metrics for fault detection using AANN under normal operating conditions.

Metric                       Value
Sensitivity                  100%
Specificity                  NaN
Positive Predictive Value    NaN
Negative Predictive Value    100%
F-Measure                    NaN
Average MSE                  14.56

Sensor          MSE
PE-1134-GO      48.3133
PE-1140-GO      0.8716
PE-1143-GO      37.4732
PC1             0.1882
VPV-1139-FB     0.5223
VPV-1139-CMD    0.0176

As can be seen in Figure 53 - Figure 55, the AANN was able to find the correct correlations based on the training data and then estimate the test data set while operating under normal conditions. The VPV-1139-FB channel had significant noise in all of the datasets, which seemed to be caused by either a bad power supply or a bad connection. In order to create relatively nominal data, a moving average window was applied to the training dataset as well as all the test datasets.
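A moving average of this kind can be sketched as follows. The window length and the use of a trailing (causal) window are assumptions made for illustration; the text does not state which were used.

```python
def moving_average(samples, window=5):
    """Trailing moving average. The first window-1 outputs average
    only the samples seen so far, so the output has the same length."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= window:
            running -= samples[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

A longer window suppresses more channel noise but also delays and flattens genuine transitions, so the window would need to be chosen with the valve's transition speed in mind.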

In the next test, a hard fault was injected into the PE-1134-GO pressure sensor

during the startup phase. The hard fault was a level shift to zero for the first 200 samples

in the sequence. Figure 56 - Figure 58 show the results of the six monitored sensors and

Table 23 shows the respective performance metrics.
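The fault injection described above (a level shift to zero for the first 200 samples) can be sketched as below. The function name, its arguments, and the returned label list are hypothetical and are only meant to show how an artificial fault and its ground-truth labels could be produced together.

```python
def inject_level_fault(signal, start=0, length=200, level=0.0):
    """Return a copy of signal held at `level` for `length` samples
    starting at `start`, plus per-sample fault labels."""
    faulty = list(signal)
    labels = [False] * len(signal)
    for i in range(start, min(start + length, len(signal))):
        faulty[i] = level
        labels[i] = True
    return faulty, labels


nominal = [100.0] * 500  # placeholder pressure trace, not thesis data
faulty, labels = inject_level_fault(nominal)
```

The same helper could hold a channel at its maximum value for the whole run by passing `length=len(signal)` and a suitable `level`.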

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 56 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with hard fault in PE-1143.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 57 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with hard fault in PE-1143.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, percent open (%) versus elapsed time (ms).]

Figure 58 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with hard fault in PE-1143.

Table 23 – Performance metrics for fault detection using AANN with injected hard fault in PE-1143-GO.

Metric                       Value
Sensitivity                  100%
Specificity                  100%
Positive Predictive Value    100%
Negative Predictive Value    100%
F-Measure                    100%
Average MSE                  18.1059

Sensor          MSE
PC1             1.0074
VPV-1139-FB     0.7697
VPV-1139-CMD    0.0240

This test shows the robustness of the AANN with a hard fault in a pressure sensor.

The AANN was able to detect all of the faults in the pressure sensor as well as maintain

proper values for the rest of the sensors. Since the training data contains windows of

zeroed out sensors, it makes sense that this test would perform well. The MSE was

slightly higher in certain sensors, especially in the faulty sensor PE-1134-GO. However,

the values produced by the AANN were close enough to be used in lieu of the faulty data,

which is the goal of this algorithm.

It was seen in the thermocouple tests that a shorted sensor connection can result in

a level shift to the maximum value of the sensor. The next test simulates a similar short

in the PE-1143-GO sensor. The value was held for the entirety of the test to make sure

that the AANN could detect the fault through all transitions and not just the initial state.

Figure 59-Figure 61 show the result of the test and Table 24 shows the respective

performance metrics.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 59 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with level shift in PE-1143-GO.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 60 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with level shift in PE-1143-GO.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, percent open (%) versus elapsed time (ms).]

Figure 61 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with level shift in PE-1143-GO.

Table 24 – Performance metrics for fault detection using AANN with injected level shift fault in PE-1143-GO.

Metric                       Value
Sensitivity                  100%
Specificity                  99.42%
Positive Predictive Value    97.20%
Negative Predictive Value    100%
F-Measure                    98.30%
Average MSE                  415.51

Sensor          MSE
PC1             0.9
VPV-1139-FB     1.80
VPV-1139-CMD    0.137

The simulated short produced fault classification results similar to those of the hard fault, but the prediction error of the sensor data was much higher. This is to be expected, as a number of runs in the training dataset contained values of PE-1143-GO similar to those reported by the fault. The correlations found by the neural network's bottleneck layer would therefore have been caught between two different states of the training data. Even though the prediction accuracy decreased, the fault detection would still be sufficient to pass the data on to a fault diagnosis algorithm, which could identify the faulty pressure sensor.

There are times on the test stand when, due to weather and wind conditions, a sensor's insulation can become loose, causing a faulty connection in the pressure sensor. This disconnect can cause considerable noise in the channel's measurements. Such faults can be particularly difficult to detect at an early stage because only small variations in the measurement data can be seen. The first test, shown in Figure 62 - Figure 64, is a simulation of a more drastic disconnect in which the values of the PC1 pressure sensor deviate more severely from the actual measured value.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 62 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with noise in PC1.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 63 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in PC1.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, percent open (%) versus elapsed time (ms).]

Figure 64 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with noise in PC1.

Table 25 – Performance metrics for fault detection using AANN with injected noise in PC1.

Metric                       Value
Sensitivity                  99.64%
Specificity                  86.37%
Positive Predictive Value    96.65%
Negative Predictive Value    99.26%
F-Measure                    98.22%
Average MSE                  89.43

Sensor          MSE
PE-1134-GO      52.02
PE-1140-GO      4.40
PE-1143-GO      58.11
PC1             0.87
VPV-1139-FB     1.62
VPV-1139-CMD    0.085

This test again verified the AANN's robustness even in the presence of noise. The

training method using random biases in the second training set optimized the weights to

enhance its understanding of the complex system.

The next test uses the only "real world" fault that was available in the data. As stated previously, the VPV-1139-FB sensor had noise in its channel during every test run. The preprocessing of the data used a moving average to create normal operating data that was sufficient for training the AANN. This test uses the original dataset to validate the AANN's performance with actual fault data.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 65 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with noise in VPV-1139-FB.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 66 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in VPV-1139-FB.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, percent open (%) versus elapsed time (ms).]

Figure 67 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with noise in VPV-1139-FB.

Table 26 – Performance metrics for fault detection using AANN with noise in VPV-1139-FB.

Metric                       Value
Sensitivity                  99.93%
Specificity                  13.23%
Positive Predictive Value    97.72%
Negative Predictive Value    85.20%
F-Measure                    98.82%
Average MSE                  23.22

Sensor          MSE
PE-1134-GO      78.77
PE-1140-GO      11.24
PE-1143-GO      45.17
PC1             0.98
VPV-1139-FB     3.18
VPV-1139-CMD    0.03

While the AANN was still able to hold a low MSE in this case, the lack of a fault region detection algorithm produces a very low specificity rating, which could result in an undetected fault in the sensor. Detection of spikes in data makes it difficult to determine and diagnose the source of a fault, and therefore the sensitivity is usually defined based on the application. Fault region detection algorithms can use fault windows with a majority rule decision to determine the overall health of a sensor over a period of time and so assist the fault diagnosis algorithm.
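A majority rule over fault windows could look like the following sketch. The non-overlapping windows, the window length, and the strict "more than half" rule are assumptions chosen for illustration; they are not taken from the thesis.

```python
def fault_regions(flags, window=10):
    """Split per-sample fault flags into non-overlapping windows and mark
    a window faulty when more than half of its samples are flagged."""
    regions = []
    for start in range(0, len(flags), window):
        chunk = flags[start:start + window]
        votes = sum(1 for f in chunk if f)
        regions.append(votes * 2 > len(chunk))
    return regions


flags = [0, 0, 1, 0, 0,  1, 1, 1, 0, 1,  1, 0]
regions = fault_regions(flags, window=5)
```

Isolated spikes inside an otherwise healthy window are outvoted, which is the behavior the paragraph above asks of a fault region detector.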

The last test determines whether the AANN can detect simultaneous faults in

sensors. A disconnect was injected into PE-1143-GO and a short was injected into PC1

for the entirety of the test.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 68 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with simultaneous faults in PE-1143-GO and PC1.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, pressure (PSIG) versus elapsed time (ms).]

Figure 69 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with simultaneous faults in PE-1143-GO and PC1.

[Plot: for each sensor, an AANN estimation panel and an error signal and threshold panel, percent open (%) versus elapsed time (ms).]

Figure 70 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with simultaneous faults in PE-1143-GO and PC1.

Table 27 – Performance metrics for fault detection using AANN with simultaneous faults in PE-1143-GO and PC1.

Metric                       Value
Sensitivity                  60.76%
Specificity                  19.80%
Positive Predictive Value    20.15%
Negative Predictive Value    60.24%
F-Measure                    30.26%
Average MSE                  14561

Sensor          MSE
PE-1134-GO      20464
PE-1140-GO      10226
PE-1143-GO      56154
PC1             361
VPV-1139-FB     50
VPV-1139-CMD    110

This result indicates that the AANN is only useful in the presence of a single sensor fault. The data from the output of the AANN would not be useful for the fault diagnosis algorithm, as there are too many misclassifications of faults in sensors that were still operating properly. A possible solution is to replace the motor valves with a feedback sensor to get a more accurate understanding of the operating state of the system. The binary input from these valves was tested in the AANN to see if it could improve performance, but it showed no real benefit in similar tests. Therefore, to reduce computational complexity and training time, these inputs were removed from the tests.

Overall the use of a bottlenecked neural network with mapping and demapping

layers proved to be an effective tool for the detection of single sensor faults in a complex

system. The AANN was able to find correlations in the data without any knowledge of

the physical dynamics of the MTTP dataset. This provides a generic algorithm that will

work for most complex systems if enough training data is provided. The estimated data

provided by the AANN can also assist in the decision made by NASA test operators to

continue an expensive rocket engine test in the event that a fault is detected.


There are also several drawbacks to this method. First, a large amount of data that encompasses all of the system's operating conditions must exist in order to train the AANN properly. If insufficient data is provided, an online training algorithm may have to be implemented to continually update the weights of the neural network. Also, multiple sensor faults could not be detected with the amount of data that was provided to the system. Lastly, certain sensors may have no correlation to each other, for example valve position and strain. Therefore, while pressure and valve sensors work in this context, detection of a fault in a strain sensor would not, and including it may even throw off the detection of faults in the other sensors. It can be concluded, then, that the AANN's sensor selection and training data must be determined carefully in order to guarantee the successful detection of faults in the sensors that it is monitoring.
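One rough way to screen candidate sensors before training is to check how strongly each channel correlates with the others. The sketch below uses the Pearson coefficient and a hypothetical cutoff of 0.3; it is not the selection procedure used in the thesis, and a linear measure can miss the nonlinear relationships an AANN is able to learn, so it should be read as a first-pass filter only. The channel names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant channel carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def weakly_correlated(channels, min_abs_r=0.3):
    """Names of channels whose strongest |r| with any other channel
    falls below min_abs_r (an assumed cutoff)."""
    weak = []
    for name, series in channels.items():
        best = max((abs(pearson(series, other))
                    for other_name, other in channels.items()
                    if other_name != name), default=0.0)
        if best < min_abs_r:
            weak.append(name)
    return weak


channels = {
    "pressure": [1, 2, 3, 4, 5, 6],
    "valve": [2, 4, 6, 8, 10, 12],
    "strain": [1, -1, -1, 1, 1, -1],
}
weak = weakly_correlated(channels)
```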

4.4 Adaptive Threshold

To validate the adaptive threshold method, six different set point transitions were used to

determine the robustness of the algorithm with different transition ranges and speeds.

These transitions can be seen in Figure 71:

[Plot: six panels, one per set point transition (the first titled "Setpoint transition 1"), open percentage (%) versus samples.]

Figure 71 - Set point transitions for adaptive thresholding testing.

The parameters of the transfer function were modified as mentioned above in

order to validate the adaptive thresholding algorithm. Several example plots of the

algorithm working are presented below, with a description of the results following.

[Plot: (a) parameters "G: 0.98 Tw: 0.98 : 0.98 Ts: 2", with the lower panel titled "Fault identification: 0 faults"; (b) parameters "G: 0.9 Tw: 0.85 : 0.85 Ts: 4". Upper panels show percent open (%) versus sample with measured values, upper threshold, and lower threshold; lower panels show fault detection versus sample.]

Figure 72 - Set point transition #1 with fault detection while operating in: (a) normal OS and (b) faulty OS.

In the first set point transition, the effects of a lower natural frequency and

damping ratio can be seen. The valve reacts normally as it ramps up to the setpoint, but

has difficulty reaching its steady state value. The algorithm was able to detect this using

the adaptive threshold until it reached a steady state point that was reasonably close to the

set point.
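The effect of changing gain, natural frequency, damping ratio, and input delay can be illustrated with a generic second-order-plus-delay model. This is a sketch only: the thesis defines its own transfer function in an earlier chapter, and the equation, the integration scheme, the time step, and all parameter values below are assumptions.

```python
def simulate_valve(setpoint, gain=1.0, wn=1.0, zeta=1.0, delay=2, dt=0.05):
    """Simulate y'' + 2*zeta*wn*y' + wn**2*y = gain*wn**2*u(t - delay).

    setpoint holds the commanded percent open for each sample and
    delay is the input delay expressed in samples.
    """
    y, v = 0.0, 0.0
    response = []
    for k in range(len(setpoint)):
        u = setpoint[k - delay] if k >= delay else setpoint[0]
        accel = wn * wn * (gain * u - y) - 2.0 * zeta * wn * v
        v += accel * dt  # semi-implicit Euler: update velocity first
        y += v * dt
        response.append(y)
    return response


command = [0.0] * 10 + [100.0] * 1990  # step from closed to fully open
nominal = simulate_valve(command)
degraded = simulate_valve(command, gain=0.9, zeta=0.85, delay=5)
```

In this model a reduced gain leaves the valve short of its set point, a lower damping ratio produces overshoot and ringing, and a longer delay shifts the whole response later, which matches the qualitative behavior discussed for the degraded cases.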

[Plot: (a) parameters "G: 0.98 Tw: 0.98 : 0.98 Ts: 2"; (b) parameters "G: 0.9 Tw: 0.85 : 0.85 Ts: 4". Upper panels show percent open (%) versus sample; lower panels show fault detection versus sample.]

Figure 73 - Set point transition #2 with fault detection while operating in: (a) normal OS and (b) faulty OS.

The second set point transition in Figure 73 again shows the effects of a degrading valve that cannot reach its set point quickly enough and, when it does, overshoots the value. This test shows fewer faults, but the faults are localized around the transitional points of the test. This information could be vital to a test engineer by providing knowledge of not only how, but also at what points in the test the valve is failing.

[Figure 74 plot: (a) parameters "G: 0.98 Tw: 0.98 : 1.08 Ts: 2"; (b) parameters "G: 0.9 Tw: 0.9 : 0.95 Ts: 5". Upper panels show percent open (%) versus sample; lower panels show fault detection versus sample.]

Although every valve has an input delay between the time it receives a signal and the time it actually moves, this delay can increase as the health of the valve decreases and cause undesirable behavior. In Figure 74 (a), it can be seen that the valve and threshold react in the same reasonable time frame; in Figure 74 (b), however, the valve reacts more slowly and causes a fault to be detected in the valve.

[Figure 75 plot: (a) parameters "G: 0.98 Tw: 0.98 : 1.03 Ts: 3"; (b) parameters "G: 0.9 Tw: 1.15 : 0.85 Ts: 5". Upper panels show percent open (%) versus sample; lower panels show fault detection versus sample.]

Figure 75 shows how the algorithm would need to be tuned based on a fit parameter in order to get perfect accuracy. This set point time series was very fast, with little steady state time between the transitional periods. This is not a common operating procedure on the test stands at NASA-SSC, but it is still useful to see how a valve will operate during an emergency shutdown. Also, as the gain parameter lowers to 0.9, the valve is unable to reach fully open. This can be caused by excessive wear or transition friction or a power failure in the control systems.

[Figure 76 plot: (a) parameters "G: 0.98 Tw: 0.98 : 1.08 Ts: 2"; (b) parameters "G: 1.1 Tw: 0.9 : 0.85 Ts: 4". Upper panels show percent open (%) versus sample; lower panels show fault detection versus sample.]

In Figure 76, an issue with the algorithm can be seen: initialization effects can cause false positives in the valve. This would not normally be a problem, as the algorithm would be continually running, but if the framework were to be turned on in the middle of a test, this could cause some false positives in the valve's health analysis. Also, in Figure 76 (b), the effects of a large gain parameter coupled with a low damping ratio can be seen as large oscillations at the top of the transitional period. These effects are continually seen as the valve is suddenly closed and not given time to reach its steady state value. This effect continues much longer than in the previous tests due to the continually changing control variable.

[Plot: (a) parameters "G: 0.98 Tw: 0.98 : 0.98 Ts: 2"; (b) parameters "G: 0.9 Tw: 0.9 : 0.9 Ts: 4". Upper panels show percent open (%) versus sample; lower panels show fault detection versus sample.]

Figure 77 - Set point transition #6 with fault detection while operating in: (a) normal OS and (b) faulty OS.

Figure 77 shows a type of transition where the valve's degradation cannot be seen due to the long and steady rise of the valve's control variable. Because most of the valve's problems are exposed during fast transition times, the algorithm is unable to detect the difference between a normally operating valve and a faulty operating valve. While the algorithm is not detecting the failing health of the valve, it is also not producing false positives that would cause a valve to be fixed unnecessarily.

[Plot: four panels titled "Gain", "Natural Frequency", "Damping Coefficient", and "Input Delay", each showing average number of faults versus parameter value.]

Figure 78 – Average fault values for different parameters of the ARMA model thresholding method over all tests.

It can be seen in Figure 78 that as the parameters of the transfer function are

modified, the average number of faults increases as the valve's physical parameters

degrade. While there are faults in the nominal region, there is an increasing trend in all

the variables as the boundaries are exceeded.

To further validate the fault detection algorithm in the scope of the Intelligent Valve framework, data was taken from the E-Complex test stand's simulation lab using PLCs from NASA-SSC. A similar transfer function was used to model the valve's response; however, the PID controller was controlled by an Allen-Bradley PLC, which is used in the E-Complex test stand. By reducing the gain parameter, a simulated obstruction or power failure can be injected into the feedback signal of the valve. The same control signal was used for the test and training data to show how the valve changes based on its parameters. Figure 79 shows the adaptive threshold on the training data and Figure 80 shows the results of the simulated obstruction fault.

[Plot: "Adaptive Threshold Training Data", percentage closed (%) versus time (s), showing upper values, lower values, and actual values.]

Figure 79 - Training data with final threshold fit.

[Plot: "Valve Feedback with Simulated Obstruction", percentage closed (%) versus elapsed time (s), showing upper threshold, lower threshold, and measured valve feedback; panels "Upper Faults" and "Lower Faults", fault detected versus elapsed time (s).]

Figure 80 - Fault detection of simulated obstruction fault using adaptive thresholding.

The adaptive threshold was able to detect when the valve was unable to match the control signal. Since all of the faults were detected by the lower threshold, the diagnosis can be narrowed to such things as obstruction or power faults. If both lower and higher faults were found, then the data would need to be analyzed further by domain experts to determine the correct maintenance for the system. Also, there are several false positives detected between t = 111 s and t = 115 s; however, these faults only exist for one time step, which can be accounted for by the error in the ARMA models.
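The separation into upper and lower faults, and the treatment of one-sample detections, can be sketched as follows. The labels (+1, -1, 0), the function names, and the minimum run length of two samples are assumptions for illustration; the thresholds themselves would come from the ARMA models and are represented here by made-up constant bands.

```python
def threshold_faults(measured, lower, upper):
    """Label each sample: +1 above the upper threshold,
    -1 below the lower threshold, 0 inside the band."""
    return [1 if m > u else -1 if m < l else 0
            for m, l, u in zip(measured, lower, upper)]


def drop_isolated(labels, min_run=2):
    """Clear fault runs shorter than min_run samples, which are
    treated here as model error rather than real faults."""
    cleaned = list(labels)
    i = 0
    while i < len(labels):
        if labels[i] == 0:
            i += 1
            continue
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        if j - i < min_run:
            for k in range(i, j):
                cleaned[k] = 0
        i = j
    return cleaned


measured = [50, 40, 50, 50, 30, 30, 30, 70]
raw = threshold_faults(measured, [45] * 8, [60] * 8)
cleaned = drop_isolated(raw)
```

After cleaning, only the sustained run of lower-threshold faults remains, which is the pattern the text associates with an obstruction or power fault.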


The adaptive threshold algorithm has been validated using a forward analytical

model to detect degradation in the LLAV as well as actual data from hardware and

simulations from NASA-SSC's E-complex test stand. Since no fault data from the LLAV

was available, the parameters of the transfer function were used to model how the valve

would react to its given input. The algorithm provides a fit parameter which can be used

to develop a range of values that are considered nominal to the test engineer, but maintain

the quality of performance required in such a critical environment. The algorithm

showed that it could detect faults amongst various transitions that would be commonly

seen in NASA-SSC test stands. There is a large difference in the number of faults

detected between the nominal parameters and fault parameters. If the faults detected by

the algorithm are trended between tests, the trend lines will show when a valve begins to

become faulty.
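Trending the fault counts between tests could be as simple as fitting a straight line to the number of faults detected per test. The sketch below uses an ordinary least-squares fit; the function name and the fault counts are hypothetical, and the thesis does not specify a trending method.

```python
def linear_trend(counts):
    """Least-squares slope and intercept of counts versus test index."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two tests to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (c - mean_y) for x, c in enumerate(counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical fault counts from four consecutive tests of one valve
slope, intercept = linear_trend([10, 12, 14, 16])
```

A persistently positive slope would suggest that the valve is drifting away from its nominal behavior and should be scheduled for inspection.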

A drawback of this method is that it is data-driven and therefore requires previous data from the valve to develop the ARMA models required to create the adaptive threshold. The advantage of this method is that the data required to calculate the coefficients is all normal functioning data rather than faulty data. Another drawback is the lack of an optimization parameter for the fit equation. If one could be developed, the number of ARMA models needed could be optimized to reduce the computational load of the algorithm.


4.5 Valve Statistics

The valve operating statistics are used to advise the fault diagnosis after a failure mode has been detected by the previous algorithms. The operating statistics captured from historical data for several test runs of two LLAVs are shown in Table 28. These statistics are presented to the operators so that they can investigate negative trends in the system's behavior and make more accurate maintenance decisions with an understanding of the valve's operating history.

Table 28 - Operating Statistics for LLAV

Name   Transitions  Cryogenic Transitions  Distance Traveled  Transition Time  Average Transition Time  Direction Changes  Number of Closings
10A23  25           33                     15                 14.5             13.78                    13                 12
10A24  17           35                     14                 13.2             15                       20                 25
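Some of the statistics in Table 28 could be derived from a recorded valve position trace as sketched below. The definitions used here (a deadband for direction changes and a percent-open limit for counting a closing) are assumptions, since the thesis does not state how each statistic is computed, and the example trace is made up.

```python
def valve_statistics(position, closed_below=1.0, deadband=0.5):
    """Accumulate simple operating statistics from percent-open samples."""
    distance = 0.0
    direction_changes = 0
    closings = 0
    last_direction = 0
    for prev, cur in zip(position, position[1:]):
        step = cur - prev
        distance += abs(step)
        if abs(step) > deadband:  # ignore jitter smaller than the deadband
            direction = 1 if step > 0 else -1
            if last_direction and direction != last_direction:
                direction_changes += 1
            last_direction = direction
        if prev >= closed_below and cur < closed_below:
            closings += 1  # crossed from open into the closed region
    return {
        "distance_traveled": distance,
        "direction_changes": direction_changes,
        "closings": closings,
    }


stats = valve_statistics([0, 50, 100, 100, 50, 0, 50])
```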

4.6 Health Visualizations

The data is visualized using a 3D model to show the different operating conditions of the LLAV. Utilizing drafted design documents, each of the valve components was modeled in Autodesk 3D Studio Max. The valves were designed and animated to allow for an exploded view or a cross sectional view during operation. When operating, the visualization would display the direction of flow with a series of arrows. Green indicated that the valve was open, while red indicated that it was closed. The frost point was visualized through the use of a shader program that would display the frost height through an icy bitmap texture, which would slowly replace the normal metal appearance as the frost continued to migrate up the valve. Each of these visualizations can be seen in Figure 81, Figure 82, and Figure 83.

Figure 81 - Frost line visualization of LLAV.

Figure 82 - Cross sectional and exploded view with flow and position visualizations.

Figure 83 - Frost line visualization of LLAV with thermocouple values.

4.7 Prognostics

While no prognostics were performed to determine the remaining useful life of the

LLAV, several techniques were investigated for future consideration of this task.

4.8 Prognostics Data

In order to test the feasibility of using these techniques in prognostics, the following data was used to determine their performance under different environments.

4.8.1 Canonical Data

To perform simple validation of the AR, ARMA and Kalman filter techniques, canonical time series data was used. A linear equation using mean and variance was used to produce multiple test series. Additive white Gaussian noise was then added in order to see how well each technique could perform under harsh environmental conditions. An example of this time series can be seen in Figure 84 and Figure 85.

[Plot: "Time series with 0 mean and 1 variance", amplitude versus time (s).]

Figure 84 - Linear equation with 0 mean and 1 variance.

[Plot: "Time series with 0 mean and 10 variance", amplitude versus time (s).]

Figure 85 - Linear time series with 0 mean and 10 variance.

4.8.2 LLAV Data

To test the prognostic methods under an actual test, the data from the LLAV was used.

This presented a simple approach with an input and output that could determine how well

the techniques could predict into the future of a time series. This data was presented

earlier in the sensor validation section and Figure 46.

4.9 Prognostic Performance

The first time series was based on a 0 mean and 1 variance time signal. The models were

tested with the prediction steps ranging from 1 to 25 steps and the SNR of the AWGN

from -5 to 25 dB. The mean squared error (MSE) was measured and plotted to gauge performance. The results

can be seen in the following figures:

[Plot: "Time series with 0 mean and 1 variance"; x-axis: Time (s); y-axis: Amplitude]

Figure 86 - Original model time series.

[Plot: "AR prediction at 1 prediction step and SNR = 25dB"; x-axis: Time (s); y-axis: Amplitude; legend: AR Prediction, Actual Signal]

Figure 87 - AR prediction of first time signal at 1 prediction step and SNR = 25dB.

[Plot: "AR prediction at 5 prediction steps and SNR = 25dB"; x-axis: Time (s); y-axis: Amplitude]

Figure 88 - AR prediction of first time signal at 5 prediction steps and SNR = 25dB.

[Plot: "AR prediction at 5 prediction steps and SNR = -5dB"; x-axis: Time (s); y-axis: Amplitude]

Figure 89 - AR prediction of first time signal at 5 prediction steps and SNR = -5dB.

[Surface plot: "Prediction performance of AR model" for the 0 mean, 1 variance signal; axes: SNR (dB), Prediction Steps, MSE]

Figure 90 - AR MSE performance on 0 mean, 1 variance signal.

[Plot: "ARMA prediction at 1 prediction step and SNR = 25dB"; x-axis: Time (s); y-axis: Amplitude; legend: ARMA Prediction, Actual Signal]

Figure 91 - ARMA prediction of first time signal at 1 prediction step and SNR = 25dB.

[Plot: "ARMA prediction at 1 prediction step and SNR = -5dB"; x-axis: Time (s); y-axis: Amplitude]

Figure 92 - ARMA prediction of first time signal at 1 prediction step and SNR = -5dB.

[Plot: "ARMA prediction at 5 prediction steps and SNR = -5dB"; x-axis: Time (s); y-axis: Amplitude]

Figure 93 - ARMA prediction of first time signal at 5 prediction steps and SNR = -5dB.

[Surface plot: "Prediction performance of ARMA model" for the 0 mean, 1 variance signal; axes: SNR (dB), Prediction Steps, MSE]

Figure 94 - ARMA MSE performance on 0 mean, 1 variance signal.

[Plot: "Kalman prediction at 1 prediction step and SNR = 25dB"; x-axis: Time (s); y-axis: Amplitude; legend: Kalman Prediction, Actual Signal]

Figure 95 - Kalman filter prediction of first time signal at 1 prediction step and SNR = 25dB.

[Plot: "Kalman prediction at 5 prediction steps and SNR = 25dB"; x-axis: Time (s); y-axis: Amplitude]

Figure 96 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = 25dB.

[Plot: "Kalman prediction at 5 prediction steps and SNR = -5dB"; x-axis: Time (s); y-axis: Amplitude]

Figure 97 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = -5dB.

[Surface plot: "Prediction performance of Kalman filter model" for the 0 mean, 1 variance signal; axes: SNR (dB), Prediction Steps, MSE]

Figure 98 – Kalman filter MSE performance on 0 mean, 1 variance signal.

As can be seen in the figures, as the prediction steps increase, the accuracy of the

model decreases. This is true of both the AR and ARMA models, which is to be expected as they both use

the same general approach to predicting future time series values. However, under

significant noise, the ARMA model performs much better. This is due to the

additional coefficients that calculate a moving average of the white noise. The signal was

changed by increasing the variance to 10 with the following results:

[Plot: "Time series with 0 mean and 10 variance"; x-axis: Time (s); y-axis: Amplitude]

Figure 99 - Original time series model #2.

[Surface plot: "Prediction performance of AR model" for the 0 mean, 10 variance signal; axes: SNR (dB), Prediction Steps, MSE]

Figure 100 – AR MSE performance on 0 mean, 10 variance signal.

[Surface plot: "Prediction performance of ARMA model" for the 0 mean, 10 variance signal; axes: SNR (dB), Prediction Steps, MSE]

Figure 101 - ARMA MSE performance on 0 mean, 10 variance signal.

[Surface plot: "Prediction performance of Kalman filter" for the 0 mean, 10 variance signal; axes: SNR (dB), Prediction Steps, MSE]

Figure 102 - Kalman filter performance on 0 mean, 10 variance signal.

The results in this test are similar to those of the previous test, with some

changes in the performance of the ARMA model. The two linear regression models had

an error base much higher than the first test due to the high amount of variance in the

signal; however, the Kalman filter stayed relatively constant through both tests. The

ARMA model performs the worst, which is due to its cancellation of noise through the

extra coefficient. The ARMA is actually smoothing the estimation too much because the

high variance was treated as noise. The Kalman filter performed more consistently in

this test than the previous one, and was still the best of the three.
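
The sweep over prediction steps and SNR used in these canonical tests can be sketched as follows for the AR case. This is only an illustration under stated assumptions: the model order (2), the ordinary least-squares fit, the noiseless ramp used as the reference signal, and the helper names solve, fit_ar, predict_ahead and mse_at are all hypothetical choices and are not taken from this work. The ARMA and Kalman filter variants are not shown.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ar(y, order):
    """Least-squares coefficients a[i] for y[t] = sum_i a[i] * y[t-1-i]."""
    rows = [[y[t - 1 - i] for i in range(order)] for t in range(order, len(y))]
    targets = y[order:]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    atb = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(order)]
    return solve(ata, atb)

def predict_ahead(history, coeffs, steps):
    """Iterate the AR recursion `steps` times, feeding each prediction back in."""
    buf = list(history[-len(coeffs):])
    for _ in range(steps):
        buf.append(sum(a * buf[-1 - i] for i, a in enumerate(coeffs)))
    return buf[-1]

def mse_at(y_noisy, y_true, coeffs, steps):
    """MSE between `steps`-ahead predictions from noisy data and the reference signal."""
    order = len(coeffs)
    errs = []
    for t in range(order, len(y_noisy) - steps + 1):
        pred = predict_ahead(y_noisy[:t], coeffs, steps)
        errs.append((pred - y_true[t + steps - 1]) ** 2)
    return sum(errs) / len(errs)

rng = random.Random(0)
clean = [float(t) for t in range(100)]            # assumed reference ramp
power = sum(v * v for v in clean) / len(clean)
for snr_db in (-5, 25):
    sigma = math.sqrt(power / 10 ** (snr_db / 10))
    noisy = [v + rng.gauss(0.0, sigma) for v in clean]
    coeffs = fit_ar(noisy, order=2)
    for steps in (1, 5, 25):
        print(snr_db, steps, mse_at(noisy, clean, coeffs, steps))
```

Extending the two loops to every step count from 1 to 25 and every SNR from -5 to 25 dB would give the grid of values needed for a surface of the kind plotted in the MSE figures.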

The real benefit of prognostics can be seen when a process or output variable of a

system can be predicted based on the measurable input variables of the system. One

instance of this is using a valve's control variable to predict its output variable. If the process

variable can be predicted several time steps into the future, a disaster can be averted by

performing health analysis algorithms on the future state of the valve. In order to

perform this type of calculation, the techniques mentioned above must be extended to

account for an input variable. In the AR and ARMA models, an external input is added (yielding the ARX and ARMAX models)

with another vector of coefficients that must be calculated. The Kalman filter adds

a control input term to the time update equation to account for this type of prediction.

These three methods were applied to LLAV data provided by NASA-SSC. Similarly to

previous tests, AWGN was added to the signal to test the robustness of the techniques in

harsh conditions. The results can be seen in Figure 103.
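
The idea of adding an external input to an autoregressive model can be sketched as follows. This is a minimal first-order example under stated assumptions, not the model identified in this work: the order, the closed-form least-squares fit, the synthetic valve response, and the names fit_arx1 and predict_arx1 are hypothetical, and the LLAV data itself is not reproduced here.

```python
def fit_arx1(y, u):
    """Least-squares fit of y[t] = a * y[t-1] + b * u[t-1] (first-order ARX)."""
    idx = range(1, len(y))
    syy = sum(y[t - 1] * y[t - 1] for t in idx)
    suu = sum(u[t - 1] * u[t - 1] for t in idx)
    syu = sum(y[t - 1] * u[t - 1] for t in idx)
    sy1 = sum(y[t] * y[t - 1] for t in idx)
    su1 = sum(y[t] * u[t - 1] for t in idx)
    det = syy * suu - syu * syu          # determinant of the 2x2 normal equations
    a = (sy1 * suu - su1 * syu) / det
    b = (su1 * syy - sy1 * syu) / det
    return a, b

def predict_arx1(y_last, future_u, a, b):
    """Roll the model forward using the commanded input for each future step."""
    preds, y = [], y_last
    for u in future_u:
        y = a * y + b * u
        preds.append(y)
    return preds

# Synthetic stand-in for valve data: commanded position u (% open) and a lagged response y.
u = [0.0] * 10 + [100.0] * 30 + [40.0] * 30
y = [0.0]
for t in range(1, len(u)):
    y.append(0.8 * y[t - 1] + 0.2 * u[t - 1])

a, b = fit_arx1(y, u)
forecast = predict_arx1(y[-1], [100.0] * 30, a, b)   # 30-step prediction
```

Because the future command is supplied to the predictor, the forecast follows the commanded position rather than simply extrapolating the recent output.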

[Plot: "30 step prediction of process variable using ARX model"; x-axis: Time (s); y-axis: Percentage Open (%); legend: Previous Process, Predicted Process, Actual Process, Control Input]

Figure 103 - ARX prediction of the LLAV data to 30 time steps.

[Surface plot: "Prediction performance of ARX Model for LLAV signal"; axes: SNR, Prediction Steps, RMSE]

Figure 104 - Performance for ARX model based on LLAV data.

[Plot: "30 step prediction of process variable using ARMAX model"; x-axis: Time (s); y-axis: Percentage Open (%)]

Figure 105 - ARMAX prediction of the LLAV data to 30 time steps.

[Surface plot: "Prediction performance of ARMAX Model for LLAV signal"; axes: SNR, Prediction Steps, RMSE]

Figure 106 - Performance for ARMAX model based on LLAV data.

[Plot: "30 step prediction of process variable using ARX model" (title as printed); x-axis: Time (s); y-axis: Percentage Open (%)]

Figure 107 – Kalman prediction of the LLAV data to 30 time steps.

[Surface plot: "Prediction performance of ARMAX Model for LLAV signal" (title as printed); axes: SNR, Prediction Steps, RMSE (x 10^4)]

Figure 108 - Performance for Kalman filter based on LLAV data.

All three algorithms were able to predict the output of the process variable

reasonably well, even out to 30 time steps. The results are similar to the canonical results

where the MSE of the techniques is directly related to the SNR and prediction steps

used to simulate the signal. This is due to the presence of an input control variable that

allows the predictors to gain better context of how the valve will respond in future states.

The Kalman filter was the least consistent as the process and measurement noise are both

modeled by constant vectors with previous knowledge of the noise covariance. This

prognostic process, used in conjunction with the adaptive threshold method developed

above, could provide valuable seconds to the test engineers at NASA-SSC to make

determinations of test operations in the E-complex test stand.

The ARX model is the simplest of all the techniques and performs well under

systems with relatively low noise. Its simplicity makes it the lowest in both

computational and memory costs and can save valuable resources on mission critical

devices if a large number of valves are being monitored. The ARMAX model provides a

way to estimate the measurement and process noise through an additional coefficient that

calculates the error of the system as a moving average white noise. The ARMAX and

ARX model coefficients are both data driven in that they require historical data to

calculate their coefficients. In the test performed in this research, the determination of

the coefficients was done quickly and with low amounts of training data and the ARX

and ARMAX models were still able to perform well in the prognosis tests. A significant

drawback to these models is their inability to incorporate physical mechanisms into their

equations. They are both mathematical models with no relation to the real world. The

Kalman filter, as well as other state-space models, provides the ability for real world

processes to be described by an internal state vector. This state vector is continually

updated throughout the prognosis process to minimize the state error covariance through

measurement and time update equations. A drawback of the Kalman filter is that the

parameters must be tuned, which requires knowledge of the physics of the system. Also,

initial values are needed at the start of the algorithm to ensure optimal results. The

Kalman filter is the most computationally intensive, but least memory intensive as it only

relies on the current sensor data point which can be discarded after the measurement

updates have been performed.
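
The time update and measurement update cycle described above can be sketched for a single state with a control input. This is a minimal illustration under stated assumptions: the scalar model, the gains a and b, the noise variances q and r, the initial values, and the class name ScalarKalman are hypothetical and are not the tuned parameters used in this work.

```python
class ScalarKalman:
    """One-state Kalman filter with a control input (illustrative only)."""

    def __init__(self, a, b, q, r, x0=0.0, p0=1.0):
        self.a, self.b = a, b        # state transition and control gains
        self.q, self.r = q, r        # process and measurement noise variances
        self.x, self.p = x0, p0      # state estimate and its error variance

    def time_update(self, u):
        """Project the state and error variance forward using the control input u."""
        self.x = self.a * self.x + self.b * u
        self.p = self.a * self.p * self.a + self.q

    def measurement_update(self, z):
        """Correct the projected state with the current measurement z."""
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)
        self.p *= 1.0 - gain
        return gain

    def predict_ahead(self, future_u):
        """Run only the state projection for each future control input."""
        x, out = self.x, []
        for u in future_u:
            x = self.a * x + self.b * u
            out.append(x)
        return out

# Deliberately wrong initial state (50) to show the estimate converging.
kf = ScalarKalman(a=0.8, b=0.2, q=0.01, r=1.0, x0=50.0)
true_x = 0.0
for _ in range(50):
    kf.time_update(100.0)
    true_x = 0.8 * true_x + 0.2 * 100.0
    kf.measurement_update(true_x)    # noiseless measurement, for the demo only
forecast = kf.predict_ahead([100.0] * 30)
```

Only the current estimate and its variance are kept between steps, which reflects the low memory requirement noted above; each measurement can be discarded once measurement_update has run.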

4.10 Diagnostic Process

In order to make the framework practical for use by NASA-SSC engineers, the health

data must be displayed efficiently in the control computers. The software used by the

control engineers, WonderWare InTouch, allows for developers to expand the

functionality of their software through the use of Microsoft’s ActiveX modules and .NET

controls. The control computers each have four monitors that give the control engineers

vast screen real-estate to monitor the test stands during test article firings. Through the

software framework described in the approach section, a process was designed and

implemented with the design constraints in mind that provides the data necessary to

assess and visualize the health of the LLAV.

The .NET module accomplished the tasks mentioned above by creating a tabbed

control that provides test operations with information required to make intelligent

maintenance decisions. A tabbed control was selected because it lowers the footprint the

module will have on control computer screens, while still allowing extensibility in the

future. The first tab contained the historical context of the valve by displaying crucial

operating statistics which are continuously monitored by the module. These values are

stored in a MS-SQLCE database in order to create a persistent record of the events of the

valve. The statistics tab can be seen in Figure 109.

Figure 109 - Intelligent Valve statistics tab.

The second tab, Figure 110, demonstrates the ability to track the frost line of the

valve. The method used to track the frost line of a valve will be discussed in the thermal

modeling portion of the Prototype Diagnostics section of this report. This tab allows a

test operator to quickly see all the thermocouples that are attached to a valve, as well as

their current health status. The flagged attribute of the thermocouple is determined by

either a percentage or absolute threshold designated by the test operator in the setup tab.

It also shows the current position, control, and open time of the valve. Each valve can be

selected from a drop down menu to see the current status of the valve. A 2D view is

provided so when the user clicks on a thermocouple in the list view, the position is shown

by a red box. This gives context to the position of the thermocouple in relation to the

total length of the valve.
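
The flagging rule for a thermocouple can be sketched as follows. This is a hedged illustration only: the text states that the flag is set by either a percentage or an absolute threshold chosen by the test operator, but the quantity being compared is an assumption here (a measured value against an expected value, for example one from the thermal model). The function name is_flagged and its parameters are hypothetical.

```python
def is_flagged(measured, expected, threshold, mode="percent"):
    """Return True when the reading deviates from the expected value by more than the threshold.

    mode="percent":  threshold is a percentage of the expected value.
    mode="absolute": threshold is in the same units as the reading.
    """
    deviation = abs(measured - expected)
    if mode == "percent":
        if expected == 0:
            # A percentage of zero is undefined; treat any deviation as a flag.
            return deviation > 0
        return 100.0 * deviation / abs(expected) > threshold
    if mode == "absolute":
        return deviation > threshold
    raise ValueError("mode must be 'percent' or 'absolute'")

# A reading of 110 against an expected 100 exceeds a 5 percent threshold.
print(is_flagged(110.0, 100.0, 5.0, mode="percent"))
# A deviation of 3 does not exceed an absolute threshold of 5.
print(is_flagged(103.0, 100.0, 5.0, mode="absolute"))
```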

Figure 110 - Intelligent Valve thermocouple tab.

The final functional tab allows the test operator to add, modify, reset, and delete

valves. It also provides functionality for adding, modifying, and deleting thermocouples

from the valve, and finally the ability to add and delete DDE data servers. This feature

enables test operators to change between setups, while still keeping persistent tracking of

the valve statistics. Also, the test operator can specify a data folder where the raw

measurement data is stored in another MS-SQLCE database. Future health analysis

algorithms can be developed, tested and validated on this data. Figure 111 shows the

setup tab.

Figure 111 - Intelligent Valve setup tab.

CHAPTER 5: CONCLUSIONS

ISHM capabilities can provide significant benefits for ground-based spacecraft

monitoring and control and ultimately can be adapted to provide on-board support for

spacecraft. Progressive development and demonstration of key ISHM architectural

elements requires that key propulsion components be adequately modeled and supported

with high-performance anomaly detection algorithms. It is also important that the

integration of the model within an ISHM framework be supported with useful user

interfaces that maximize the selectivity and utility of the ISHM output in order to obtain

the intended benefits.

5.1 Summary of Accomplishments

The objectives of this thesis are revisited below, and the solutions proposed to address

each of the problems identified in this research work are summarized.

1. To design a framework for the detection of faults and failure modes in the large

linear actuated valves that are used on the rocket engine test stands at NASA-SSC.

An Intelligent Valve framework was designed using domain expert knowledge to

identify the key faults and failure modes in the LLAV. An FMECA was performed, as

seen in Section 3.1, to focus efforts on the most critical problems with the valves. Once

this knowledge had been acquired, a diagnostic process and algorithms could be

developed to detect these faults and failure modes.

2. To develop a diagnostic process that –

a. Receives and stores incoming sensor data;

b. Performs calculation of operating statistics;

c. Compares with existing analytical models; and,

d. Visualizes faults, failures, and operating conditions in a 3D GUI environment.

The diagnostic process was developed with an interface that could be easily

expanded in the future. The DDE protocol and a SQL database (Section 3.4) were used to

receive and store incoming sensor data in an efficient manner that could be easily

annotated by the diagnostic algorithms. In order to give maintenance personnel historic

context of the valve's operation, an algorithm was developed (Section 3.2.4) to capture

key operating statistics used throughout the valve's lifespan. A thermal analytic model

(Section 3.2.6) was developed by NASA engineers and implemented into the Intelligent

Valve framework. A 3D environment was developed using advanced visualization

techniques to show faults, failures, and operating statistics, as described

in Section 4.6.

3. To develop a suite of diagnostic algorithms that can detect anomalous behavior in

the valve and other system components of the rocket engine test stand.

A suite of diagnostic algorithms was developed that detects various anomalous

behaviors in the LLAV and other system components. The suite comprises a sensor validation

algorithm using auto-associative neural networks (Section 4.3), an adaptive thresholding

method to detect degradation in valve parameters (Section 4.4), and a thermocouple fault

detection algorithm using the thermal analytical model developed by NASA engineers (Section

4.2.1). These fault detection algorithms, coupled with the contextual information from

the operating statistics, can help advise maintenance personnel in their decisions to repair

the valves.

4. To expand the capability of the diagnostic algorithm to perform prognosis in

specific context.

The diagnostic algorithms have been expanded with prediction in specific context.

Particularly, AR, ARMA and Kalman filters were used to gauge the ability to predict the

process variable of a valve. These values can be used by the adaptive thresholding

method to determine faults in a valve seconds before they occur. If accurate enough,

these seconds could be the difference between an emergency shutdown, and a

catastrophe.

In this thesis, we have shown that a judicious combination of technologies,

namely, the DDE data transfer protocol, auto-associative neural networks, empirical and

physical models and virtual reality environments can be used to develop a diagnostic

procedure for assessing the integrity of rocket engine test stand components. We have

specifically focused on valves, because they are critical to the cryogen transport

mechanisms that are vital to test operations. This project is in the area of an identified

core competency at John C. Stennis Space Center; specifically in the technology focus

area of ISHM user interfaces. The project addressed the development of an effective

interface between the ISHM and its users in order to reduce information overload in the

typically crowded environments of complex system control rooms. We have designed,

developed and validated a user-interface that presents information related to the system

health and supports the user’s navigation through diagnostic scenarios with the ability to

extract and visualize the required system details.

5.2 Recommendations for Future Work

The state of the ISHM functional art is hampered by a number of factors; a major

constraint is the unavailability of intelligent process models that can provide the reasoned

determination of element condition based on the available data sources that feed the

ISHM architecture. One of the significant challenges is to develop realistic models for the

most common and problem-prone elements. Surprisingly, there are major gaps in our

understanding of how even fundamental elements (such as valves in a rocket engine test

stand) degrade and—more importantly—how to determine the remaining operational life

available from a valve or any other similar component. And, if an anomaly is detected,

what are the best means for providing a user with efficient tools to explore the nature of

the anomaly and its possible effects on the element as well as its relationship to overall

system state?

This thesis has addressed a part of the problem, by providing a framework for

diagnosing the integrity of a specific test-stand component – the large linear actuator

valve. The next steps in expanding this research work will involve the design,

development and validation of prognosis algorithms that can predict potential anomalies

in a reasonable time frame before they actually occur. This recognizes the fact that in a

test-stand environment, by the time a fault is diagnosed, it is usually too late to remedy

the problem. The subsequent addition of a prognosis module to the intelligent valve

model will allow test operations personnel to initiate “what if?” queries and enhance

the ability to perform a comprehensive risk analysis of every test procedure. The

combination of the analysis and prognosis algorithms can be used to arrive at a model

that can predict the remaining useful life of a test-stand component such as a valve – making

such predictions provides a significant capability enhancement to ISHM platforms.

The research work presented in this thesis expands upon a prior ISHM framework

that utilizes smart sensors by developing diagnostic tools that can track changing health

conditions in dynamic systems. This work has the potential to advance sensor data fusion

and integration to the degree required to achieve the benefits that are necessary to support

next-generation space exploration missions.

