130604 reliability data analytics, adams
Reliability & Data Analytics-Viewpoints, Tips, Examples-
Tim C. Adams, NASA Kennedy Space Center
Career Notes
Tim Adams is a Senior Engineer in the Systems Engineering and Integration Division with the NASA John F. Kennedy Space Center’s (KSC) Engineering and Technology Directorate. Tim serves as a technical lead and resource in Reliability Engineering, the Technical Editor of the “KSC Reliability” web page, and the Manager of KSC’s Integrated Design and Assurance System.
At NASA, Tim has 19 years of “hands-on” experience in Reliability Engineering and Technical Risk Analysis involving both flight systems and ground systems for the Space Shuttle, International Space Station, and Constellation Programs. Prior to KSC, he was the Lead of the Office of Safety, Reliability, and Quality Assurance’s Analysis and Assessment Methodology Group at NASA Johnson Space Center.
Selected work products in Reliability Engineering and Technical Risk include reliability goals, trending problem history and measuring relative risk, predicting reliability (or availability) for a variety of components and subsystems, and developing a Center-wide resource for “learning and doing” engineering-assurance analyses.
Selected special assignments and roles in Reliability Engineering and Technical Risk include Reliability and Maintainability Technical Discipline Team Lead for the Agency for the NASA Safety Center, Reliability Engineering Consultant for the Centers for Disease Control and Prevention (CDC), and Team Lead and Principal Technical Risk Engineer for a multi-NASA Center effort that resolved the debate to disposition a degraded critical system.
Prior to NASA, Tim was a Product Manager/Market Analyst for an international manufacturer of oil tools and an Application/Industrial Engineer for an international manufacturer of computerized-numerical-controlled (CNC) machine tools. In addition, Tim was a Director/Operations Manager for a municipality and utilities company that included roles as an Emergency Response Officer and Project Manager for the community’s first 911 public-emergency system.
Tim is a Certified Reliability Engineer (CRE) with continuous re-certifications since 1994 and a senior member with the American Society for Quality (ASQ). His formal education is in Mathematics, Education, and Management.
2
Table of Contents
Topic Page
Presentation Objectives 4
The Management Process 5
Reliability is a System Operating Outcome 6
Reliability is an Enabling Function 7
Reliability Definition and Related Concepts 8
Reliability Efforts and Methods 9 & 10
Basic Math for Probabilistic Reliability 11
Thinking Analytically: A Process 12
Involving Management with Analytics 13
Example 1 – Making a Risk Analysis 14 - 16
Reacting to Reactive Situations 17
Example 2 – Providing Reliability to Operations 18 - 21
Example 3 – Integrated and Accessible Analytical Tools 22 - 28
Example 4 – Providing Reliability to Design Engineering 29 - 31
Example 5 – Sometimes Statistics will not do the Job 32 - 37
Uncertainty – Describing the “Goodness” of a Point Estimate 38
Summary 39
NASA, Tim C Adams 3
Objectives
In the area of quantitative Reliability Engineering
and Risk Assessment, share:
Viewpoints
Tips, Techniques, and Tools
Examples from various NASA programs
Show how the Reliability discipline relates to and
has synergy with other disciplines such as
Management, Systems Engineering, Safety, Quality,
and Risk.
Note: The live presentation of this content adds storytelling (case
method) to encourage discussion and embrace expertise from
participants.
NASA, Tim C Adams 4
Viewpoint The management process
NASA, Tim C Adams 5
Tip Trace your organization’s Return on Net Assets (RONA) to your Reliability effort. Is it
more than maintenance? Reliability excellence uses all portions of an organization.
~~ Reliability Management, An Overview, EQE International, 2000
[Figure: the management process. Elements: Vision; Mission; Core Values; Goals; Objectives; Project Plans or Work Processes; Actions; Measures; Evaluations; Feedback. Annotations: “Say what we do”; “Do what we say.”]
Viewpoint Reliability is an operating outcome
A system operating outcome is a non-physical
characteristic (e.g., safe, dependable, low cost) and not a
physical characteristic (e.g., size, weight, flow rate).
Operating outcomes are inferred. An inference goes
beyond the known data.
Operating outcomes are the essence of Mission Assurance.
Though abstract (e.g., not observable on an engineering
drawing), the Reliability function needs to have and use its
own management process as well as be part of the
organization’s management process.
For more on mission assurance and operating outcomes,
see http://kscsma.ksc.nasa.gov/Reliability/Documents/Mission_Assurance.pdf
NASA, Tim C Adams 6
Viewpoint Reliability is an enabling function
Three types of jobs (NASA examples), namely
Doers (Astronaut, Flight Design Engineer, Subsystem Engineer, Flight
Controller, Vehicle and Ground Processing Specialist)
Enablers (Systems Engineer, Reliability Engineer, Safety Engineer, IT
Engineer, Contracting Officer, Attorney)
Managers (Project Manager, Director, Supervisor)
Reliability, Maintainability, Availability (RMA) Engineers are
enablers for Design Engineering and Production/Operations.
Why RMA (or just R&M) and not RAM? RMA is the sequence
used to design a system with this type of system operating
outcome. Thus, A is a function of R and M; denoted A = f(R,M).
RMA Engineers partner with systems engineers and safety
engineers.
NASA, Tim C Adams 7
Viewpoint Reliability and related concepts
Classic definition: Reliability is the probability (likelihood) an item will
perform its intended or required functions (mission) with no downtime
(e.g., maintenance and repair activities) during a given period of time
(mission time) under specified operating conditions (environment).
Emerging definition: Reliability is the ability of an item to perform…etc.
The focus is on achievement rather than probability. (Ref. Reliability Physics)
Reliability + Unreliability = $R + U = \mathrm{Prob}_{\text{success}} + \mathrm{Prob}_{\text{failure}} = 1$.
Thus, Likelihood in Risk = 1 – Reliability measure.
It is Availability and not Reliability that addresses and characterizes
quantitatively both uptime and downtime.
For more on Availability, see
http://kscsma.ksc.nasa.gov/Reliability/Availability.html
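As a minimal sketch of how Availability combines uptime and downtime, the example below assumes constant failure and repair rates, in which case the steady-state (inherent) availability is MTBF / (MTBF + MTTR). The function name and the example numbers are illustrative assumptions, not values from the presentation.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: mean uptime / (mean uptime + mean downtime)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical item: fails on average every 500 h and takes 5 h to restore.
availability = inherent_availability(500.0, 5.0)
print(f"A = {availability:.4f}")
```

Improving either input raises the result, which is one way to read A = f(R, M): a longer time between failures (Reliability) or a shorter time to restore (Maintainability) both increase Availability.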
If Quality is the degree of fulfillment of customer expectations, then
Reliability is Quality as a function of time.
NASA, Tim C Adams 8
Viewpoint Reliability efforts and methods
In a lead role (strong enabler) the Reliability effort is called
Reliability Engineering; partners with Design, Systems, and
Safety Engineering; and uses both
Qualitative (Big Q) techniques
Quantitative (Little q) techniques.
In a non-lead role (weak enabler), Reliability is subordinate
to Maintenance and Safety Engineering and uses selected
Big Q and Little q techniques.
For more on Reliability techniques, see http://kscsma.ksc.nasa.gov/Reliability/Documents/Reliability_Discipline_Overview_of_Methods.pdf
NASA, Tim C Adams 9
Viewpoint Crossroads in reliability math
Probabilism is the doctrine that probability is an adequate basis for belief and action, since certainty in knowledge cannot be attained.
Determinism is the doctrine that every event, act, and decision is the inevitable consequence of antecedents (past events) that are independent of the human will.
Classical Reliability uses probabilistics; Reliability Physics (or Physics of Failure) uses deterministics.
The reliability measure from Reliability Physics is from materials, design, and environment and not from the statistical treatment of items that failed or did not fail during test, operation, or both.
NASA, Tim C Adams 10
Viewpoint Basic math for probabilistic reliability
For occurrences with a constant rate: Use the Cumulative Binomial distribution
(demand or event based) or the Cumulative Poisson distribution (time based).
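The two constant-rate cases above can be sketched with the standard library alone. The helper names and the numeric inputs are illustrative assumptions. The formulas are the textbook cumulative Binomial and cumulative Poisson distributions.

```python
import math


def binomial_cdf(max_failures: int, demands: int, p_fail: float) -> float:
    """P(at most max_failures failures in `demands` independent demands)."""
    return sum(
        math.comb(demands, i) * p_fail**i * (1.0 - p_fail) ** (demands - i)
        for i in range(max_failures + 1)
    )


def poisson_cdf(max_events: int, rate_per_hour: float, hours: float) -> float:
    """P(at most max_events events in `hours` at a constant rate)."""
    mean_events = rate_per_hour * hours
    return sum(
        math.exp(-mean_events) * mean_events**i / math.factorial(i)
        for i in range(max_events + 1)
    )


# Hypothetical demand-based case: no failures in 10 demands, p = 0.01 per demand.
print(binomial_cdf(0, 10, 0.01))
# Hypothetical time-based case: no events in 100 h at 0.001 events per hour.
print(poisson_cdf(0, 0.001, 100.0))
```

With `max_failures = 0` each function returns the probability of zero occurrences, which is the reliability for that number of demands or that mission time.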
For non-repairable items that fail: Use the Weibull distribution for each failure
mode. This distribution models data sets (containing both failure and non-failure
data) having a decreasing, constant, or increasing failure (hazard) rate over time.
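A small sketch of the two-parameter Weibull model shows how the shape parameter selects a decreasing, constant, or increasing hazard rate. The parameter values are hypothetical, and fitting the parameters to failure and non-failure (censored) data is not shown.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t / eta) ** beta), with shape beta and scale eta."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


ETA_HOURS = 1000.0  # hypothetical scale (characteristic life)
for beta in (0.5, 1.0, 2.0):  # hypothetical shapes
    early = weibull_hazard(100.0, beta, ETA_HOURS)
    late = weibull_hazard(900.0, beta, ETA_HOURS)
    if math.isclose(early, late):
        trend = "constant"
    elif late < early:
        trend = "decreasing"
    else:
        trend = "increasing"
    print(f"beta={beta}: hazard is {trend}")
```

A shape below 1 gives a decreasing hazard, a shape of exactly 1 reduces to the constant-rate (exponential) case, and a shape above 1 gives an increasing hazard.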
For repairable items that are down: Ideally, first, use the Laplace Test to measure
with statistical confidence the trend of failures over time. Second, if the Laplace
Test score is around zero, repairs can be assumed to be good as new. Thus,
reordering the data to fit a Weibull distribution is mathematically acceptable.
For more on the above, see the “Tools Sections” at the “KSC Reliability” web page.
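The Laplace Test score for a repairable item can be sketched as follows. This assumes the common time-truncated form of the test, in which events are observed over a fixed window from 0 to T; the function name and the event times are illustrative assumptions. Under no trend the score is approximately standard normal.

```python
import math


def laplace_score(event_times: list[float], observation_end: float) -> float:
    """Laplace trend statistic for events observed on (0, observation_end]."""
    n = len(event_times)
    if n == 0:
        raise ValueError("at least one event time is required")
    mean_time = sum(event_times) / n
    # Under no trend, event times are uniform on the window: mean T/2, variance T^2/12.
    return (mean_time - observation_end / 2.0) / (
        observation_end * math.sqrt(1.0 / (12.0 * n))
    )


# Hypothetical downtime events (hours since start) over a 100 h window.
print(laplace_score([10.0, 30.0, 50.0, 70.0, 90.0], 100.0))  # evenly spread
print(laplace_score([60.0, 70.0, 80.0, 90.0, 95.0], 100.0))  # clustered late
```

A score near zero suggests no trend, a clearly positive score suggests events arriving faster over time (deterioration), and a clearly negative score suggests improvement.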
Given stress (load) and strength (capacity) data: Use Stress-Strength Interference.
In this case, Reliability is the probability that the item survives the application of the load.
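A minimal sketch of Stress-Strength Interference follows. It assumes that stress and strength are independent and normally distributed, which the presentation does not state; the means and standard deviations are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_reliability(
    mean_strength: float, sd_strength: float, mean_stress: float, sd_stress: float
) -> float:
    """P(strength > stress) for independent normal stress and strength."""
    z = (mean_strength - mean_stress) / math.sqrt(sd_strength**2 + sd_stress**2)
    return normal_cdf(z)


# Hypothetical capacity 500 +/- 30 and load 350 +/- 40 (same units).
reliability = interference_reliability(500.0, 30.0, 350.0, 40.0)
print(f"R = {reliability:.5f}")
```

The result depends on the overlap of the two distributions: the reliability rises when the mean margin grows or when either spread shrinks.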
To quantify the dimensions (axes) of a risk scenario: In a risk matrix, a risk
scenario’s likelihood (probability) of occurrence axis can be one minus Reliability,
and the consequence (impact) axis can be (for example) a scaled Hazard Analysis.
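One way to place a scenario on the likelihood axis is to convert a reliability value into a probability of failure and then into a matrix row. The bin edges below are hypothetical; the presentation does not give a scale, and a real program would use its own risk matrix definitions.

```python
import bisect

# Hypothetical upper bounds separating likelihood levels 1 (lowest) to 5 (highest).
LIKELIHOOD_EDGES = [0.001, 0.01, 0.1, 0.5]


def likelihood_level(reliability: float) -> int:
    """Map a reliability value to a 1-5 likelihood level using 1 - R."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    probability_of_failure = 1.0 - reliability
    return bisect.bisect_right(LIKELIHOOD_EDGES, probability_of_failure) + 1


def risk_cell(reliability: float, consequence_level: int) -> tuple[int, int]:
    """Return the (likelihood, consequence) cell for a risk scenario."""
    return (likelihood_level(reliability), consequence_level)


# Hypothetical scenario: R = 0.95 with a consequence level of 4 from a scaled Hazard Analysis.
print(risk_cell(0.95, 4))
```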
NASA, Tim C Adams 11
Viewpoint “Think about your thinking”
My analytical process uses COP (or POC), i.e., C→O→P.
Concept: think, aim, purpose (before)
Operation: do, fire, action (during)
Product: produce, confirm, outcome (after)
Why COP? It helps me to police myself by asking core
questions, divide and conquer, and focus. When answers
to the COP questions are “yes” or “yes, but…” as opposed
to “yes” or “no,” options and possibilities remain available.
“Wisdom begins in wonder.”
~~ Socrates
“People do not resist change; they resist being changed.”
~~ Peter M. Senge
NASA, Tim C Adams 12
Tip Practice COP with the decision maker
Why? With analytical work, there are advantages when you
(are able to) communicate with the decision maker early in
the analytical work’s process.
Why? Action-oriented managers do not like the lengthy
formal decision-making process. Use techniques to yield
immediate results; later use more sophisticated methods.
~~ R.E.D. Woolsey & H. Swanson, Operations Research for Immediate Application: A Quick and Dirty Manual
Also, with regard to management support and user
involvement, “…only 40% of projects suggested by
quantitative analysts were ever implemented. But 70% of
the quantitative projects initiated by users, and fully 98% of
projects suggested by top managers, were implemented.”
~~ Barry Render & Ralph M. Stair Jr., Quantitative Analysis for Management, 6th edition
NASA, Tim C Adams 13
Example 1 System (Asset) Level
For ISS Crew Health Care Maintenance System
“GOAL or SCOPE: Construct a [quantitative/
analytical] model of the various exercise devices
needed for crew health maintenance onboard ISS
[International Space Station] and use this model to
characterize the relative risk to the crew and ISS
mission for durations (mission times) of 200, 180,
and on down to 60 or less days. Provide a means of
assessing the risk of no on-orbit spare parts for our
equipment and its potential impact on the mission.”
NASA, Tim C Adams 14
Example 1 Making a Risk Analysis-Actual process
1. Obtain customer requirements (what, when, and “fictional or notional” product: “initial” P of COP).
2. Clarify jargon and “possible” strategy (how) for applying the Reliability discipline (C of COP for team members and customer).
3. Identify, sequence, and assign tasks to make a “draft” project plan (O of the COP).
4. Obtain system knowledge (e.g., requirements, behavior, structure, and parametrics).
5. Obtain and organize uptime and downtime data for each subsystem. Provide a template and sample, if needed.
6. Provide a “quick” product. In this example, for each subsystem and at the system level, the dates for downtime events were converted to a bar chart and Laplace Test score. Identify poke outs (significant problem areas), if any.
7. Identify scenarios or “central questions.” A scenario asks what is the likelihood (probability) a particular configuration of the system (and an associated consequence) will occur. Thus, a scenario becomes the rule to assign portions of the uptime and downtime data to a particular system model. (Critical C and “final” P of COP-- not firm until pdf page 57 of the 83-page report)
NASA, Tim C Adams 15
Example 1 Making a Risk Analysis-Actual process (cont.)
8. Build the model for crew health benefit (i.e., the complement of the consequence axis in a
risk matrix) by building Reliability Block Diagrams (RBDs). Each RBD contains subsystems
that satisfy physiological objectives for flight crews (astronauts aboard the International
Space Station). Parallel paths in the RBD mean more than one subsystem (with a relatively
“high” health benefit as determined by flight medical experts) can satisfy a physiological
objective. (First critical O of COP)
9. Build the model for likelihood of success (i.e., the complement of the probability of failure
axis in a risk matrix) by: (1) Assigning applicable uptime and downtime data to each scenario
by subsystem and (2) Determining the reliability and maintainability probability density
functions and parameters (math models) for each scenario by subsystem. (Second critical O
of COP)
10. Use (run) the mentioned models to calculate reliability, conditional reliability, and
instantaneous availability for each subsystem and for each physiological objective at various
mission times.
11. Build risk matrices at various mission times. The likelihood axis in success space is either
reliability (no repairs allowed) or availability (repairs allowed). The consequence axis in
success space is a weighted average of all physiological objectives.
12. Organize and release findings. Tip: Provide the source for each finding to aid in a self check
and with an external audit.
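The Reliability Block Diagram arithmetic behind step 8 can be sketched as follows, assuming independent blocks. The subsystem reliabilities are hypothetical and are not values from the ISS analysis.

```python
from math import prod


def series(reliabilities: list[float]) -> float:
    """All blocks must work: the product of the block reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities: list[float]) -> float:
    """At least one block must work: 1 minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical physiological objective that either of two exercise devices can satisfy.
objective_a = parallel([0.90, 0.80])
# Hypothetical objective that only one device can satisfy.
objective_b = 0.95
# Hypothetical scenario that needs both objectives met.
print(series([objective_a, objective_b]))
```

A parallel path raises the reliability of an objective above that of its best single device, which is why the RBD structure matters when assessing the risk of having no on-orbit spares.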
NASA, Tim C Adams 16
Tip Reacting to reactive situations*
“Don’t bring me a perfect answer after launch.”
~~ A NASA Johnson Space Center Manager’s directive to his safety and mission assurance engineers
“Success comes in cans, not in cannots!”
~~ Joel H. Weldon, motivational speaker
“Plans are nothing; planning is everything.”
~~ Dwight D. Eisenhower
“If you torture the data long enough, it will confess to you.”
~~ Ronald H. Coase, a British-born, America-based economist
~~ Update: Mark Hulbert’s Sept 26, 2006 Market Watch stated, “If you torture the data long enough, you can get it to say just about anything.”
“Somebody is going to have to suffer, either the reader or the writer.”
~~ Tom Murawski, writing consultant
*Analyst self talk or mantras. A mantra is “A sacred Hindu formula believed to…possess magical power.”
NASA, Tim C Adams 17
Example 2 Program Level: Space Shuttle Program
[Figure: two strategies that start from Data. “Bottoms-Up Strategy” (The Triage Process): 1st - Trend Analyses; 2nd - Quick Look Analyses; 3rd - Reliability & Maintainability (R&M) Assessments. “Top-Down Strategy”: Probabilistic Risk Assessment (PRA).]
Notes:
• Data is the driver, the starting point, in the bottoms-up strategy.
• The system is the driver in the top-down strategy.
• Both strategies are used to minimize risk when only one of the two strategies is used.
• Quick Look Analyses can use the data pulled for the Trend Analysis.
• The next slide gives details on the Triage Process.
NASA, Tim C Adams 18
Analysis levels:
1 TREND ANALYSIS - an analysis that identifies candidates for review and possibly further analysis based on activity, trend, and/or a risk measure
2 QUICK LOOK - an analysis that provides the density of problem attributes relative to the element’s total and various subtotals with high-level explanations
3 R&M ASSESSMENT - a detailed analysis and assessment aimed at determining problem root cause and recommending corrective action

REQUIREMENT AREA

1.0 Element* Under Study
  1 Trend Analysis: Total system. All elements at various levels are scanned.
  2 Quick Look: One or more elements (broken down by part number and/or serial number).
  3 R&M Assessment: One or more elements (broken down by part number and/or serial number).

2.0 Data (“the raw materials”)
  1 Trend Analysis: Detect date, failure criticality, operation purpose, and recurrence control data from Problem Reporting And Corrective Action (PRACA).
  2 Quick Look: All coded data and element ID data from PRACA. Typically, this data is pulled the same time data is pulled to make a Trend Analysis.
  3 R&M Assessment: Any data applicable to analyzing and assessing the reliability and maintainability (R&M) of the current design.

3.0 Data Sets/Scenarios (“cutting the data”)
  1 Trend Analysis: One-dimensional: each element is a row and uses 8 or 9 columns to describe problem type and problem disposition.
  2 Quick Look: Two-dimensional: an analyzed element uses columns from the Trend Analysis and rows of 4 types to isolate valid problems on flight hardware.
  3 R&M Assessment: Multi-dimensional: an analyzed element uses a variety of data to describe and evaluate conditions, element configuration, flight rules, performance, and management goals.

4.0 Quantitative Analysis (“using Probability and Statistics”)
  1 Trend Analysis: Count, trend, criticality, and risk. Various types of graphical treatments. Double Pareto at the part group level. Comparison to previous reports.
  2 Quick Look: A “frequency table” is made for sub-elements and for each PRACA coded field.
  3 R&M Assessment: Standard or custom made mathematical models measure and evaluate the R&M of the element under study.

5.0 Quantitative Findings (a.k.a. “little q”)
  1 Trend Analysis: Decision rules based on problem count, trend, criticality, and risk identify “significant findings.”
  2 Quick Look: A relatively large number of occurrences in a scenario (i.e., a cell or combination of cells) identifies an item that needs to be explained.
  3 R&M Assessment: “Fact-based” findings traceable to data and a quantitative analysis methodology. Use as metrics for related management goals.

6.0 Qualitative Analysis (“using Science and Engineering”)
  1 Trend Analysis: None
  2 Quick Look: High-level explanations from the subsystem engineer, especially on cells or combinations of cells that contain a large number of occurrences.
  3 R&M Assessment: A multidisciplinary team uses selected analysis techniques to identify the underlying reason or root cause for removals, inefficiency, failures, etc.

7.0 Qualitative Findings (a.k.a. “big Q”)
  1 Trend Analysis: None
  2 Quick Look: High-density scenarios are noted as understood or not understood and under control or not under control.
  3 R&M Assessment: “Fact-based” findings traceable to accepted principles and to a qualitative analysis methodology.

8.0 Conclusions (“using Logic”)
  1 Trend Analysis: None
  2 Quick Look: None
  3 R&M Assessment: Conclusions based on and traceable to the findings in a logical manner.

9.0 Recommendations (“proposed actions to make actual = plan”)
  1 Trend Analysis: None
  2 Quick Look: None
  3 R&M Assessment: Recommendations based on and traceable to the conclusions.

10.0 Benefit Analysis (“testing the recommendations”)
  1 Trend Analysis: None
  2 Quick Look: None
  3 R&M Assessment: An analysis that uses adjusted scenario(s) to measure the value of proposed recommendation(s).

11.0 Report Format & Packaging
  1 Trend Analysis: Formal report containing spreadsheets, graphs, methodology, and summary of significant findings.
  2 Quick Look: Informal comments hand written or typed below each frequency table.
  3 R&M Assessment: Formal report on the above 10 areas, a special executive summary, and visuals suitable for viewgraph use.

*Note: An element is one of the following: function, system, or line replaceable unit.
NASA, Tim C Adams 19
Example 2 The Triage Process in action at Level 3
The problem:
During launch, a system failed on the Space Shuttle Orbiter that caused major embarrassment as well as much expense. Should this system be replaced with new technology or upgraded?
If upgraded, identify the system elements (e.g., components) causing the problem and the required reliability.
NASA, Tim C Adams 20
Example 2 Problem details and resolution
A Fuel Cell on a Space Shuttle Orbiter caused
a minimum duration flight (MDF) during STS-83.
In addition to the MDF, a previous launch delay and
numerous maintenance actions during “vehicle
turnaround” made this system a serious candidate
for improvement.
A detailed reliability and maintainability (R&M)
analysis and assessment report on all fuel cell line
replaceable units (LRUs) from the STS-26 to STS-85
time period was completed.
This R&M assessment was instrumental in the
decision to change regulator material in all LRUs
for $12M instead of replacing with a new design
estimated at $50M.
NASA, Tim C Adams 21
Example 3 Center Level: An Enabling System
Around year 2006, NASA Kennedy Space Center’s (KSC)
Office of Chief Engineer led the effort called Integrated
Design and Assurance System (IDAS).
IDAS’s goal, which remains today, is to provide and support a set
of integrated commercial off-the-shelf (COTS) modules for any KSC
employee at any time (24/7) to “learn and do” engineering assurance
analyses (e.g., safety, reliability, maintainability) over the life
cycle of a system. In addition, IDAS had the purpose to
demonstrate data file transfer with other tools.
The next six slides provide an overview of IDAS and
viewpoints on classifying and deploying various analytical
methods.
NASA, Tim C Adams 22
KSC’s Integrated Design and Assurance System (IDAS)
Tim Adams
NASA Kennedy Space Center
Engineering & Technology Directorate
Systems Engineering & Integration, NE-D1
February 2012
23
KSC’s Integrated Design and Assurance System (IDAS)
External Website for IDAS: http://kscsma.ksc.nasa.gov/Reliability/IDAS.html
• Origin and Purpose: About six years ago, KSC’s Office of Chief Engineer led the effort called IDAS. IDAS’s goal, which remains today, is to provide and support a set of integrated COTS modules for any KSC employee at any time (24/7) to “learn and do” engineering assurance analyses (e.g., reliability) over the life cycle of a system. In addition, IDAS had the purpose to demonstrate data file transfer with other COTS tools.
• Selected Features: IDAS allows an equipment list (bill of materials) to be imported or inputted and then used to populate various analysis modules. Some modules can be linked.
• IDAS is located on the KSC network and is totally electronic (e.g., provides access via NAMS, provides online training as well as by other methods, allows a user or group of users to build, run, and view analyses, and uploads to KSC’s Product Data Management system).
• Use: IDAS is used by all KSC programs. For example with the Constellation Program, KSC Ground Systems’ Systems Engineers in conjunction with KSC Design Engineers used IDAS to identify and make over 100 design changes prior to build. This work is described in an AIAA paper that was selected as one of the top 30 papers.
• Summary: In today’s Model-Centric Engineering terms, IDAS is an integrated and highly supported suite of “model-based engineering” modules that perform various types of technical risk analyses, one important part of systems engineering, and is a step in the direction of closed-loop-model-based-systems engineering (MBSE).
NASA, Tim C Adams 24
IDAS Modules by Vendor & File Transfer Capability
[Figure: IDAS software modules grouped by vendor, showing Module User, Module Purpose, Module Name, and file-transfer links between tools. The recoverable content is listed below.]

PTC’s Windchill Quality Solutions (formerly Relex) s/w, Enterprise Edition-All Modules (Many NASA Centers and contractors have this software in some form)
Module Name: Module Purpose
Prediction: Defines System Elements (BOM)1 & Element Failure Rates
RBD: Defines System Structure & Calculates Reliability & Availability2
FMEA3: Analyzes Element Failure Modes (Bottoms Up)
Event Tree, Fault Tree: Identifies Conditions and Factors that Cause an Undesired Event to Occur (Top-Down)
Markov: Evaluates Systems with Multiple States (e.g., up, down, degraded)
OpSim (now included in RBD): Analyzes Cost of Reliability, Availability2, Maintenance Intervals, & Spares
Maintainability: Calculates System Repair & Maintenance Measures
ALT: Analyzes Accelerated Life Testing Data to Predict Product Reliability
FRACAS4 with Audit Feature: Provides a Closed-loop Problem Reporting & Corrective Action System
Dashboard & Alert Feature: Provides Electronic Reporting & Notification
Life-Cycle Cost: Calculates Total Cost of Ownership
Weibull: Uses Field & Test Data to Calculate Probability of Failure

Module Users: Design, Systems, Reliability, and Safety Engineers; Systems, Reliability, Maintainability, Supportability (Logistics), Cost, and Industrial Engineers; Reliability Engineers and Statisticians; Quality & Sustaining Engineers; Management; Cost Analyst-Engs.

Other tools:
Probabilistic Risk Assessment (PRA): INL’s SAPHIRE s/w and Item Software’s iQRAS s/w (Both sponsored by NASA HQ’s S&MA)
Operations Analysis: ARINC’s Raptor s/w
Text Mining: Search Technology’s TechOASIS s/w (Sponsored by US Army)
Fault Tree Files: EPRI’s CAFTA s/w

Data: NASA’s Problem Reporting & Corrective Action (PRACA) Data from Multiple & Dissimilar PRACA Databases. NASA’s PRACA data can be imported via snap shots or live connectors.
Upload IDAS products to other systems (e.g., Product data management (PDM) or Product lifecycle management (PLM)).

Legend: Colored Boxes = Software Modules; Shaded Boxes = Not Active at KSC. The figure also marks In Work, Not Started, and File-Transfer Capability Completed.
Notes: 1 – BOM = Bill of Materials; 2 – Availability applies to repairable systems and is a function of Reliability and Maintainability; 3 – FMEA = Failure Modes & Effects Analysis; 4 – FRACAS = Failure Reporting and Corrective Action System.
NASA, Tim C Adams 25
PTC WQS (formerly Relex) Modules from a Risk Perspective
Internal Website for KSC’s PTC WQS software: https://sp.ksc.nasa.gov/sites/sre/tools/relextool/default.aspx
[Figure: risk matrix with LIKELIHOOD and CONSEQUENCE axes on which the modules are placed in three groups: (1) Prediction, Weibull, Accelerated Life Testing, Life-Cycle Cost; (2) RBD, Event Tree, Fault Tree, FRACAS, Markov, Maintainability; (3) FMEA, Dashboard. Modules are marked as Qualitative Tools or Quantitative Tools.]
NASA, Tim C Adams 26
Selected IDAS Modules that Provide Engineering Assurance or Technical Risk Analyses during the Design Phase
[Figure: design-phase flowchart from “Specify system requirements” to “Ready for build.” Boxes: System effectiveness; Life-cycle costs; Implement design methods; Failure analysis (Failure Modes & Effects Analysis); Allocate and predict reliability (Reliability Block Diagram Analysis); Conduct system safety analysis (Fault Tree Analysis). Decision points, each with Yes and No branches: “Are design goals achieved?” and “Are safety goals achieved?”]
Reference: An Introduction to Reliability and Maintainability Engineering, Figure 8.1, Charles Ebeling, 2005.
NASA, Tim C Adams 27
An Idealized Sequence for Producing Engineering Assurance Analyses
[Figure: chart labeled WHEN (Start to Finish), WHAT (FFBD, RBDA, FMEA, FTA, PRA), WHO (Engineering; Safety & Mission Assurance), and EFFORT (0% to 100%), with the analytical products grouped into Success Space Analyses and Failure Space Analyses.]
Analytical Products:
FFBD = Functional Flow Block Diagram
RBDA = Reliability Block Diagram Analysis
FMEA = Failure Modes & Effects Analysis
FTA = Fault Tree Analysis
PRA = Probabilistic Risk Assessment
Theme:
This work sequence (WHEN) builds and uses analytical products (WHAT) in an optimum manner, especially during the Design Phase. The appropriate mix of experts (WHO and EFFORT) make and deliver the right analytical product at the right time. In addition to serving the intended purpose at the desired time, each analytical product serves as an input that expands the technical fidelity of analytical products that follow.
NASA, Tim C Adams 28
Example 4 IDAS in action-during design
The Constellation Program’s (CxP) Ground Systems RMA Team consisted of a contracted two-person team.
RMA Analyses performed in cooperation with ~30 design teams.
For Maintainability:
Used Cut Set analyses combined with the RBD analyses. This provided the CxP with the target subsystems to focus on via:
Procuring more reliable components.
Designing for Maintainability using data for components most likely to fail.
Providing operational workarounds for subsystems targeted most likely to fail.
RMA Team’s recommendations improved subsystem reliability by a factor of ~9.4 using a conservative method.
Applying the 9.4 improvement factor to the Space Shuttle Program increased the historical 88% probability of launch to 98.6%.
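The slide does not say how the improvement factor was applied. One reading that reproduces the quoted figures is to divide a constant failure rate by 9.4, which raises a success probability R to R^(1/9.4). The sketch below is an inference from the two numbers on the slide, not the RMA Team’s stated method, and the function name is an illustrative assumption.

```python
def improved_probability(baseline_probability: float, improvement_factor: float) -> float:
    """Success probability after dividing a constant failure rate by the factor.

    With R = exp(-rate * t), reducing the rate by the factor gives R ** (1 / factor).
    """
    return baseline_probability ** (1.0 / improvement_factor)


launch_probability = improved_probability(0.88, 9.4)
print(f"{launch_probability:.1%}")  # 98.6%
```

For comparison, dividing the 12% probability of not launching by 9.4 directly gives about 98.7%, so the two readings differ only slightly at this scale.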
NASA, Tim C Adams 29
Example 4 The RMA Team
The RMA Team was viewed as part of the Design Team—
thus, an embedded team approach by the virtue of
support from KSC’s Design Chief Engineer.
The RMA Team had a low impact on the Design Team.
Normally, two to four meetings were required throughout the
process (total 3 to 8 hours).
The RMA Team provided feedback to the Design Team
throughout the design process.
RMA Analysis was a required product for design review.
NASA, Tim C Adams 30
Example 4 RMA Results (as of year 2011)
At a Preliminary Design Review (PDR), Ground Systems projected a 99.5% probability of success during the specified time period:
34 of 57 subsystems analyzed, combined with allocations for remaining subsystems.
90% of Ground Systems unreliability was attributed to less than 2% of the possible failure paths.
Based on analysis of 15 subsystems
~435,000 Cut Sets with a probability greater than $1 \times 10^{-16}$
This type of analytical product provided focus for:
Design changes (“prior to cutting metal”)
Operational workarounds
Maintainability considerations.
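The kind of concentration reported above (a small share of failure paths carrying most of the unreliability) can be measured by ranking cut sets. The sketch below assumes the rare-event approximation, in which system unreliability is roughly the sum of the cut set probabilities; the cut set values are hypothetical and not from the Ground Systems analysis.

```python
def fraction_of_cut_sets_for_share(cut_set_probs: list[float], share: float = 0.90) -> float:
    """Fraction of cut sets (largest first) needed to reach `share` of the total."""
    if not cut_set_probs:
        raise ValueError("at least one cut set probability is required")
    ranked = sorted(cut_set_probs, reverse=True)
    total = sum(ranked)  # rare-event approximation of system unreliability
    running = 0.0
    for count, probability in enumerate(ranked, start=1):
        running += probability
        if running >= share * total:
            return count / len(ranked)
    return 1.0


# Hypothetical system: 2 dominant cut sets and 98 minor ones.
hypothetical_cut_sets = [1e-3] * 2 + [1e-6] * 98
print(fraction_of_cut_sets_for_share(hypothetical_cut_sets))
```

The few cut sets at the top of the ranking are the natural targets for design changes, operational workarounds, and maintainability attention.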
NASA, Tim C Adams 31
Example 5 RMA is not just Statistics
The problem—sound familiar?
The test director wants to know if testing can stop after
receiving no failures in 360 tests on a life-critical item.
In particular, does this testing certify that the item is
safe?
NASA, Tim C Adams 32
Example 5 At NASA, this problem was…
The White Sands Test Facility (WSTF) conducted 360 tests
to determine if ignition would occur during the presence
of a small quantity of hydrocarbon oil in 100% oxygen
under adiabatic compression, the compression heating of
oxygen.
None of the WSTF tests produced ignition. These tests
were in response to a hydrocarbon oil contaminate found
in the Portable Life Support System (PLSS) used in the
Extravehicular Mobility Unit (EMU).
NASA, Tim C Adams 33
Example 5 The Extravehicular Mobility Unit
Extravehicular Mobility Unit (EMU) is an independent system that provides environmental protection, mobility, life support, and communications for a Space Shuttle or International Space Station (ISS) crewmember to perform extra-vehicular activity (EVA) in earth orbit.
NASA, Tim C Adams 34
Example 5 And it is in the news!
NASA, Tim C Adams 35
Example 5 The Reliability response
Method 1 used Classical Test Statistics to determine the
maximum failure rate with a high degree of statistical
confidence. This failure rate did not meet the program’s
failure-rate goal. Thus, more testing would have been
required if only this analysis method was used for decision
making.
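A minimal sketch of the kind of bound Method 1 produces follows. It uses the standard zero-failure binomial bound; the 95% confidence level is an assumption, since the presentation does not state the confidence level or the exact formula used.

```python
def zero_failure_upper_bound(n_tests: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on per-test failure probability after n tests, no failures.

    Solves (1 - p) ** n_tests = 1 - confidence for p.
    """
    if n_tests <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("n_tests must be positive and confidence between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)


# 360 tests with no ignition, at an assumed 95% confidence level.
p_upper = zero_failure_upper_bound(360)
print(f"p <= {p_upper:.5f}")
```

Under these assumptions the bound is a little under 1 in 100 per test, which illustrates why zero failures in a few hundred tests cannot by itself demonstrate a very small failure-rate goal.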
Method 2 used Ancillary Data (i.e., similar test data) to
identify a boundary for ignition and no ignition. This
method did not address heat loss and was not sufficient for
decision making.
NASA, Tim C Adams 36
Example 5 From Classical Reliability to Physics
Method 3 used the Arrhenius Reaction Rate Model. This
model adjusted the failure rate found in the first method since
all WSTF testing was done under stressed conditions (higher
pressure). The failure-rate goal was surpassed under certain
assumptions.
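The general form of an Arrhenius adjustment can be sketched as below. The activation energy and temperatures are hypothetical, and the presentation does not give the values used or how the higher test pressure was translated into the model, so this only illustrates the mechanics of scaling a stressed-condition failure rate to use conditions.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(
    activation_energy_ev: float, use_temp_k: float, stress_temp_k: float
) -> float:
    """AF = exp((Ea / k) * (1 / T_use - 1 / T_stress))."""
    return math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K)
        * (1.0 / use_temp_k - 1.0 / stress_temp_k)
    )


def use_condition_failure_rate(stress_failure_rate: float, acceleration_factor: float) -> float:
    """Scale a failure rate measured under stress down to use conditions."""
    return stress_failure_rate / acceleration_factor


# Hypothetical values: Ea = 0.7 eV, use at 300 K, test at 350 K.
af = arrhenius_acceleration_factor(0.7, 300.0, 350.0)
print(af, use_condition_failure_rate(0.008, af))
```

Because testing under stressed conditions accelerates the reaction, a failure-rate bound obtained from the tests overstates the rate expected in use, and dividing by the acceleration factor corrects for that.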
Method 4 used Combustion Physics (i.e., Semenov equations)
to address the heat loss not addressed in Method 2. It was
found that the reaction rate was not fast enough to cause
ignition in the PLSS. Thus qualitatively, the quantitative failure-
rate goal was believed to be satisfied with certainty.
Note: Regardless of the analytical method, uncertainty needs to
be described. Next, the types of uncertainty will be outlined.
NASA, Tim C Adams 37
Tip When quantitative work is not certain
When the work is probabilistic (not deterministic), characterize point
estimates using uncertainty in order to provide the estimate’s
measure of the “goodness.” There are four types of uncertainty.
Inherent Uncertainty (Physical Variability)
Parameter (Statistical) Uncertainty
The uncertainty in the values of the parameters of a model. Assume the mathematical
form of that model has been agreed to be appropriate.
Model Uncertainty
Related to an issue for which no consensus approach or model exists, and the choice of
approach or model is known to have an effect on the risk model.
Completeness Uncertainty
Represents a type of uncertainty that cannot be quantified because it represents those
aspects of the system that are, either knowingly or unknowingly, not addressed in the
model.
For more on Uncertainty, see
http://kscsma.ksc.nasa.gov/Reliability/Documents/100128_Uncertainty_Concepts.pdf
NASA, Tim C Adams 38
Summary
Reliability (the first word for each item below)…
Excellence in an organization is more than a support function for maintenance and safety engineering (pp. 5 & 9).
Is a “design for” operating outcome and is inferred from an engineering drawing since reliability is not a physical characteristic that appears on the drawing (p. 6).
As a job enables others (doers, management, and other enablers) to work a collection of planned activities to maximize system function and uptime (pp. 7 & 9).
Has four elements in its classic definition: likelihood, function, duration (demand or load), and environment (p. 8).
In this document focused on a quantitative view, being techniques based on Probability and Statistics (Probabilism), Physics (Determinism), or both (pp. 8 & 10).
Coupled with Maintainability is Availability. Availability is a function of Reliability and Maintainability (R&M) that addresses both uptime and downtime (pp. 7 & 8).
That is quantitative typically encounters three types of data: clock time or cycles (duration), events (demands), and stress-strength (load-capacity) relationships (p. 11).
Quantifies the likelihood axis in a risk matrix, with the other axis being consequence (p. 11).
As a meaningful product to the organization must be understood by, as well as pace and collaborate with, the decision makers (p. 13).
And Availability in quantitative form are a forecast. As with any forecast, uncertainty provides the estimate’s measure of the “goodness.” Without such a measure, it is impossible for the decision maker to judge how closely the predictions relate to or represent reality (p. 38).
NASA, Tim C Adams 39