January 2013, Final v1.3

LOG 211 Supportability Analysis

Student Guide

Reliability & Maintainability (R&M) Allocation, Modeling, Prediction, and Analysis

Content

Slide 5-1. Lesson 5: Reliability & Maintainability (R&M) Allocation, Modeling, Prediction, and Analysis

Welcome to Lesson 5 on Reliability and Maintainability Allocation, Modeling, Prediction, and Analysis.

Introduction

Content

Slide 5-2. Topic 1: Introduction

Slide 5-3. Life Cycle Management Framework:

Where Are You? What Influence Do You Have?

Welcome to the next phase of the Life Cycle Management Framework: Engineering and Manufacturing Development (EMD).

DoDI 5000.02, Enclosure 2 states, "The purpose of the EMD Phase is to:

Develop a system or an increment of capability;

Complete full system integration (technology risk reduction occurs during Technology Maturation and Risk Reduction (TMRR));

Develop an affordable and executable manufacturing process;

Ensure operational supportability with particular attention to minimizing the logistics footprint;

Implement human systems integration (HSI);

Design for producibility; ensure affordability;

Protect critical program information (CPI) by implementing appropriate techniques such as anti-tamper; and

Demonstrate system integration, interoperability, safety, and utility.

The CDD, Acquisition Strategy, SEP, and Test and Evaluation Master Plan (TEMP) shall guide this effort."

Content

The primary purpose of systems engineering in EMD is to reduce system-level risk. This phase consists of two major, sequential efforts: Integrated System Design and System Capability and Manufacturing Process Demonstration. EMD begins with the Milestone B Decision. Section 5.4.3.1 of the Defense Acquisition Guidebook (DAG) describes EMD as the phase that "develops a detailed integrated design and ensures producibility and operational supportability."

Previously, you learned that there are three major LCL activities: Design for Support, Design the Support, and Support the Design. In the Technology Maturation and Risk Reduction (TMRR) Phase, the LCL identifies design considerations for the competitive prototypes as part of the Design Interface IPS Elements that influence performance and sustainment.

During the EMD Phase, the focus is on developing the requirements for the long-term performance-based Product Support concept and the initial Product Support package. The critical Sustainment metrics are captured as KPPs and KSAs. Recall from Lesson 3 that these are KPP: Availability; KSA: Reliability and Operation and Support Cost. Also during this phase, the support concept is refined and potential support providers are identified. Incentives to design for support and to design a cost-effective support concept can, and should, be linked to the Product Support Strategy. (See Section 5.4.3.2 of the DAG).

Recall that you completed the Technology Maturation and Risk Reduction (TMRR) Phase by initializing the Logistics Product Database. In EMD, you will begin to use, refine, and update this database as you perform several key analyses including:

Reliability and Maintainability Allocation, Modeling, Prediction and Analysis

Failure Mode Effects and Criticality Analysis (FMECA)

Fault Tree Analysis

Reliability Centered Maintenance (RCM) Analysis

Maintenance Task Analysis (MTA)

Level of Repair Analysis (LORA)

Trade-off Analysis (occurring at multiple points during these Supportability analyses)

This lesson will focus on the first set of analyses, Reliability and Maintainability Allocation, Modeling, Prediction, and Analysis, also called R&M.

EMD begins at Milestone (MS) B and its key emphasis with respect to Sustainment is to ensure operational Supportability with particular attention to minimizing the logistics footprint. Sustainment requirements should be an integral part of the systems engineering design process. Effective Supportability begins with the development of Sustainment requirements to drive the design and development of reliable, maintainable, and affordable systems through the continuous application of the systems engineering methodology focusing on Affordable System Operational Effectiveness. Reliability is a prime determinant of long-term support costs. When programs are found to be unsuitable at the time of Initial Operational Test and Evaluation because they do not achieve Reliability goals, there are serious consequences for both Operational Suitability and Affordability.

As stated in paragraph 5.4.3.1 of the DAG, "From a Sustainment perspective this means paying particular attention to reducing the logistics footprint; implementing human systems integration; designing for supportability; and ensuring affordability, integration with the supply chain, interoperability, and safety. All of these factors are used to refine the performance-based support concept and strategy, with the associated requirements, and to identify potential support providers."

The LCL’s Supportability Analysis during EMD feeds the Logistics Product Data (LPD) necessary for the subsequent phases of the Life Cycle Management Framework. Early in EMD, the LCL can influence requirements necessary to design the support and, later in this phase, the LCL works to support the design.

SAE GEIA-STD-0007

Slide 5-4. Engineering & Manufacturing Development Phase

In a broad sense, the EMD Phase objectives relate to the build out of the Logistics Product Database. These objectives are realized through various program documents and Supportability analyses. The results, or outputs, of these R&M analyses serve as inputs to:

Update the Logistics Product Data (LPD)

Inform Design Interface IPT recommendations

Update program documents such as the:

RAM-C Rationale Report

Life Cycle Sustainment Plan (LCSP)

Systems Engineering Plan (SEP)

This phase addresses the questions:

What is the maturity of program artifacts, such as the LPD and program documents?

What is the maturity of the design, both functionally and physically, and what is the technology maturity level and its impact on Supportability?

What is the Sustainment maturity of the system? Does the design represent the most affordable system with the least logistics footprint to meet operational effectiveness and suitability requirements?

The EMD objectives correspond to the LPD found in Table B, Reliability-RAM/FMECA/RCM, and Table C, O&M Tasks. As you continue through the EMD Phase, Table H, Provisioning, and others are impacted as Supportability analyses refine the LPD. R&M Allocation, Modeling, Prediction and Analysis will feed the subsequent analyses (i.e., FMECA, FTA, LORA, RCM/CBM+, MTA) by updating the SAE GEIA-STD-0007 Logistics Product Data, in particular Table B, as well as by updating results in various reports.

As noted in the RAM-C Rationale Report Manual, an initial RAM-C Report should be appended to the Analysis of Alternatives (AoA) in preparation for a Milestone A Decision. This report may be limited in scope due to the many unknowns at this stage of the program, but will still articulate the RAM and Sustainment requirements in terms of a preferred system concept, support and Maintenance Concept, and Acquisition Strategy. As noted in DoDI 5000.02, Enclosure 3, Systems Engineering, paragraph 12, Reliability Analysis, Planning, Tracking, and Reporting, "this report provides a quantitative basis for Reliability requirements and improves cost estimates and program planning. The report is attached to the Systems Engineering Plan (SEP) at MS A and updated in support of MS B and C."

The RAM-C Report is the first detailed report that includes a comprehensive analysis of the system and its planned use. This analysis includes the planned operating environment, operating tempo, Sustainment requirements, Maintenance Concept and Product Support approaches, and supply chain solutions with appropriate assumptions. The RAM-C Report will provide a clear statement of how the system’s Sustainment requirements will be measured throughout the EMD Phase, Production and Deployment Phase, and the Operations and Support Phase.

You will find additional information on decision points and program categories in the RAM-C Rationale Report Manual, June 24, 2009.

R&M, the first of the Supportability analyses in the EMD phase, enables system Supportability by:

Identifying and selecting Key Performance Parameter (KPP) requirements and other data

Allocating and modeling Key Performance Parameter (KPP) requirement data to lower levels of indenture

Validating technical performance

Enabling later Supportability analyses to determine the logistics footprint

Updating the SAE GEIA-STD-0007 Logistics Product Data, in particular, Table B data

Updating results in various reports

Content

Slide 5-5. R&M Lesson Approach

Key questions to ask in this lesson are:

Where do you find data for doing Supportability analyses?

How do you select, prioritize, and update data?

What R&M activities should you do to enable Supportability?

What do you do with the results of the R&M analyses?

Whom do you inform and what documents should you update?

Content

Slide 5-6. Topics and Objectives

Overview of Reliability and Maintainability (R&M) Analysis

Content

Slide 5-7. Topic 2: Overview of Reliability and Maintainability (R&M) Analysis

Content

Slide 5-8. What is R&M?

Reliability can be described as a discipline related to the design, development, test, and manufacture of an item, so that it successfully performs a certain task, under specified conditions, for a certain length of time or number of cycles with a specified probability. Reliability measures the probability that the system will perform without failure over a specified interval under specified conditions. Reliability must be sufficient to support the warfighting capability needed in its expected operating environment. Considerations of Reliability must support both Availability KPPs. Reliability may be expressed initially as a desired failure-free interval that can be converted to a failure frequency for use as a requirement.

Maintainability is the probability that an equipment item will be retained in, or restored to, a specified condition in a given period of time, when maintenance is performed in accordance with prescribed procedures and resources.

Reliability and Maintainability (R&M) is a set of disciplined, iterative processes used to allocate requirements, model the system, and predict R&M performance to determine the most affordable and operationally effective system to meet the user needs when fielded.

R&M:

Influences system design and cost

Translates life cycle data into information

Outputs are inputs to follow-on Supportability analyses

FMECA / FTA

RCM Analysis

‘What-if’ or Conditional Modeling

Business Case Analysis Tradeoffs

Recall from Lesson 3 that Measures of Effectiveness (MOEs) are meaningful measures defined by user needs. They facilitate and sustain the right system improvements over the life cycle. MOEs become Technical Performance Measures (TPMs) as they are refined and become measurable.

The key to understanding R&M is the iterative nature of the process. This diagram presents a high-level view of the process and relationship of interface design and sustaining engineering. During EMD, Systems Engineers and LCLs focus on the Technical Performance Measures (TPMs), also referred to as metrics. TPMs are measures the program uses to monitor the progress of the design in relationship to Supportability. They assess design and risk, validate requirements, and test KPPs/KSAs.

As depicted in the diagrammed process, the first three steps of R&M are iteratively tied to the allocation process. For a new system, R&M allocations often rely upon similar systems' data for benchmarks. Allocations enable the systems engineer to set Reliability goals and the Integrated Product Teams (IPTs) to examine trade space. As real data for the new system becomes available, new trade-offs may be required.

Step 1 Predictions

If you are beginning with a new acquisition, for example, the Strike Talon UAV case study in this course, then you will begin at Step 1. However, if you are involved in a fielded system, your point of entry in this diagram would be at Step 4.

At Step 1, you have the system level requirements, and perhaps additional data from similar systems, as well as reliability prediction sources such as MIL-HDBK-217. Your allocations will begin with these sources, but as the program matures, the data will become more refined, reaching lower levels of product indenture.

In either case, the data, whether preliminary or predicted data from a new acquisition or actual data from the sustaining command of a fielded system, is then used to populate the next step, which is System Modeling.

Step 2 Models

Modeling involves creating Reliability Block Diagrams as the system moves from a pure functional to a more physical shape. RBDs are an authoritative source of engineering data for many supportability analyses. RBDs also provide insight into redundancy and Level of Repair for life cycle cost estimates. Failure rates are at the root of all of these analyses. The results of R&M are then used as inputs to Step 3.
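The way an RBD exposes redundancy can be sketched with the two basic block arrangements. This is a minimal illustration, not a method prescribed by the course material: it assumes blocks fail independently, and the function names and reliability values are hypothetical.

```python
from math import prod


def series_reliability(block_reliabilities):
    """Series blocks: every block must work, so multiply the reliabilities."""
    return prod(block_reliabilities)


def parallel_reliability(block_reliabilities):
    """Active redundancy: the group fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in block_reliabilities)


# Hypothetical system: one sensor in series with a redundant pair of processors.
sensor = 0.95
processor_pair = parallel_reliability([0.90, 0.90])      # 1 - 0.1 * 0.1 = 0.99
system = series_reliability([sensor, processor_pair])    # 0.95 * 0.99 = 0.9405

print(f"processor pair: {processor_pair:.4f}, system: {system:.4f}")
```

In this sketch the sensor is a single point of failure: adding more processors raises the pair's reliability, but the system can never exceed the sensor's 0.95.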

Step 3 Analysis

This step involves Failure Mode, Effects and Criticality Analysis (FMECA) as well as Fault Tree Analysis (FTA). FMECA and FTA identify system failures, their probabilities, and recommended mitigation strategies to reduce the risk of secondary impacts of failures. The outputs of these analyses inform the design or maintenance changes necessary to increase system Reliability and prevent critical failures. These processes will be covered in Lesson 6.

Step 4 Feedback

In an acquisition, newly developed systems won’t have field data available, but testing data principally derived from both Developmental Test & Evaluation (DT&E) and Initial Operational Test & Evaluation (IOT&E) events will become available. A Testing and Evaluation (T&E) event often provides data different from initial predictions, prompting updates to the Reliability Block Diagrams (RBDs). Thus, a T&E event often triggers overwriting predictions with actuals and reengaging models and related analyses using Reliability information where appropriate. Updates to Reliability Models should always be compared to Reliability thresholds and objectives. If results show that the system design does not comply with Sustainment KPPs and KSAs, the program team seeks a remedy through Reliability trade space (borrowing Reliability from systems having surplus) or Reliability Growth (redesigning components having the lowest Reliability).
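The comparison of an updated estimate against its threshold and objective can be sketched as a simple check. This is an illustration only: the function name, the status labels, and the MTBF values are hypothetical, and it assumes a metric where a higher value is better (such as MTBF).

```python
def assess_against_requirement(demonstrated, threshold, objective):
    """Classify a demonstrated value against threshold and objective.

    Assumes higher is better and that the objective is at least the threshold.
    """
    if objective < threshold:
        raise ValueError("objective must not be below threshold")
    if demonstrated < threshold:
        return "below threshold"       # non-compliant: seek trade space or growth
    if demonstrated < objective:
        return "within trade space"    # meets the minimum, short of the objective
    return "meets objective"


# Hypothetical MTBF requirement: threshold 500 h, objective 650 h.
for test_result in (450.0, 560.0, 700.0):
    print(test_result, assess_against_requirement(test_result, 500.0, 650.0))
```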

Step 5 Reports

The modeling and simulation outputs from R&M analyses are used in subsequent analyses. R&M also updates the RAM-C Rationale Report for inclusion in both the Systems Engineering Plan (SEP) and the Life Cycle Sustainment Plan (LCSP).

Slide 5-9. What is RAM-C? Reliability and Availability

Reliability is the probability an item performs a:

Required function

Under stated conditions

For a specified period of time

Reliability is usually expressed as Mean Time Between Failure (MTBF). MTBF is calculated as the total operating hours divided by the total number of failures.
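A short sketch of the MTBF calculation follows. The conversion from MTBF to a probability of completing a mission is not stated in the text above; it assumes a constant failure rate (the exponential model), and the function names and numbers are hypothetical.

```python
import math


def mtbf(total_operating_hours, total_failures):
    """MTBF = total operating hours / total number of failures."""
    if total_failures <= 0:
        raise ValueError("MTBF needs at least one observed failure")
    return total_operating_hours / total_failures


def mission_reliability(mission_hours, mtbf_hours):
    """Assumption: constant failure rate, so R(t) = exp(-t / MTBF)."""
    return math.exp(-mission_hours / mtbf_hours)


# Hypothetical data: 5,000 operating hours with 10 failures.
example_mtbf = mtbf(total_operating_hours=5000.0, total_failures=10)   # 500 hours
example_r = mission_reliability(mission_hours=50.0, mtbf_hours=example_mtbf)

print(f"MTBF = {example_mtbf:.0f} h, R(50 h) = {example_r:.4f}")
```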

Availability

Operational Availability (AO) is the measure of the percentage of time that a system or group of systems within a unit are operationally capable of performing an assigned mission. It is expressed as “uptime” divided by the sum of “uptime” plus “downtime”. AO is an integral step to determining the readiness metric expressed by Materiel Availability (AM). Recall from Lesson 3 that Materiel Availability is a measure of the percentage of the total inventory of a system operationally capable (ready for tasking) of performing an assigned mission at a given time, based on materiel condition.
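The uptime and downtime ratio described above can be written directly. This is a minimal sketch; the function name and the hours used are hypothetical.

```python
def operational_availability(uptime_hours, downtime_hours):
    """AO = uptime / (uptime + downtime)."""
    total_hours = uptime_hours + downtime_hours
    if total_hours <= 0:
        raise ValueError("total time must be positive")
    return uptime_hours / total_hours


# Hypothetical period: 900 hours up and 100 hours down (all causes of downtime).
ao = operational_availability(uptime_hours=900.0, downtime_hours=100.0)
print(f"AO = {ao:.2f}")
```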

As stated in the RAM-C Manual, paragraph 3.2.3.1, "The goal is to balance the Sustainment metrics—not to maximize AM (as current approaches usually attempt to maximize AO). AM is a system design value that will decrease only when there are more systems “down” than originally planned."

Availability is a measure of:

How often failures occur and corrective maintenance is required

How often preventive maintenance is performed

How quickly indicated failures can be isolated and repaired

How quickly preventive maintenance tasks can be performed

How long logistics support delays contribute to downtime

Content

Slide 5-10. What is RAM-C? Maintainability and Cost

Maintainability

The system design attributes of Maintainability include: accessibility, modularity, and testability. Maintainability is best achieved when these three attributes support the ability of an item to be retained in, or restored to, a specified condition once maintenance is performed. Mean Time to Repair (MTTR) is the testable metric for evaluation of Maintainability. Maintainability engineering is the composite of activities, methods, and practices used to influence the system design in order to minimize necessary system maintenance requirements and associated costs for both preventive and corrective maintenance.
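The text does not say how item-level repair times roll up to a system MTTR. One common approach, assumed here, weights each item's MTTR by its failure rate, since items that fail more often contribute more repairs. The function name and the data are hypothetical.

```python
def system_mttr(items):
    """Failure-rate-weighted MTTR.

    items: iterable of (failure_rate_per_hour, mttr_hours) pairs.
    """
    items = list(items)
    total_rate = sum(rate for rate, _ in items)
    if total_rate <= 0:
        raise ValueError("total failure rate must be positive")
    return sum(rate * mttr for rate, mttr in items) / total_rate


# Hypothetical replaceable units: (failures per hour, hours to repair).
units = [(0.002, 1.0), (0.001, 4.0), (0.001, 2.0)]
example_mttr = system_mttr(units)   # (0.002 + 0.004 + 0.002) / 0.004 = 2.0 hours
print(f"System MTTR = {example_mttr:.1f} h")
```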

Maintainability is unique in its ability to influence design as it serves as an enabler of rapid restoral processes and architectures.

Characteristic: Outcome

Modularity: enables restoral by remove/replace vs. repair (Functional/Physical characteristics)

Accessibility: enables seeing, reaching and moving an item (Physical/Human Factors characteristics)

Testability: enables detection and isolation of failures (Diagnostics/Prognostics/Health Management)

Standardization: enables conformity and consistency (Cost and producibility)

Maintainability should be a designed-in capability and not an add-on option. From a design influence perspective, timely focus is required on issues pertaining to physical accessibility, performance monitoring and fault localization, built-in-test implementation (coverage and efficiency), false alarms, failure diagnostics and system prognostics. In simple terms, the intent is to reduce the time it takes for a properly trained maintainer to isolate the failure and fix it. Maintainability is a prime measure of quality for the user. Intrinsic factors contributing to Maintainability are: modularity, interoperability, diagnostics, prognostics, fail safe, and access (2012 Annual RELIABILITY and MAINTAINABILITY Symposium. "Supportability Analysis"). System Maintainability affects:

System Availability – a function of Reliability, Maintainability, and resources

System Safety – Maintainability reduces the risk of injury or damage to equipment

Total cost of ownership (and life cycle cost) – good Maintainability reduces the time and resources required to maintain systems

User confidence

Warranty cost

Cost

Costs include maintenance, spares, fuel and support. Ownership Cost provides balance to the Sustainment solution by ensuring that the Operation and Support (O&S) Cost KSA associated with the Sustainment KPP Availability is considered in making program decisions.

The concept of Total Ownership Cost (TOC) captures the true cost of design, development, ownership, and support of DoD weapons systems. To the extent that new systems can be designed to be more reliable, or have fewer failures, and more maintainable, or need fewer resources, with no offsetting increase in the cost of the system or spares, the TOC for these systems will be lower. The Systems Engineering process ensures implementation of activities intended to design Supportability into the system during the Materiel Solution Analysis, Technology Maturation and Risk Reduction, and Engineering and Manufacturing Development phases, when large returns on investment are available.

Achieving RAM results in improved readiness, safety, mission success, and Maintainability. It also means reduced TOC and logistics footprint.

Content

Slide 5-11. RAM-C Relationships

This graphic illustrates the relationship between Reliability, Sustainment, and operational effectiveness. As pointed out in section 3.2.1 of the RAM-C Rationale Report Manual, "The figure shows the theoretical effect of Reliability and Sustainment cycle time on Life Cycle Costs (LCC). For example, a system that exhibits low Reliability may require high Sustainment cycle times, mainly due to numerous repair cycles being required, which will result in high ownership cost and thus high LCC. The objective is to achieve a balance between development, production, and operating and support costs that results in minimal Life Cycle Costs."

The graphic "illustrates the outcomes of the trade-offs between Reliability and Maintainability on Availability and Life Cycle Cost (LCC). Systems with low reliability and high Mean Down Time resulting from poor maintainability will demonstrate low levels of Availability, high Operating & Support Cost (O&S) and high Life Cycle Cost (LCC) due to the increased number and length of repair cycles. Conversely, systems with high reliability and low MDT will demonstrate high Availability, lower Operating Cost and decreased LCC. The key to determining the right mix of Reliability, Availability, and Maintainability is addressing the relationship between Research & Development (R&D) Costs, Acquisition." (2012 Annual RELIABILITY and MAINTAINABILITY Symposium. "Supportability Analysis")

RAM-C brings the Life Cycle Cost target into focus by asking:

If we invest a dollar in Reliability, Availability and Maintainability now, how much are O&S costs reduced later?

What is the best investment in the design now to meet our target?

The “Target Area” is the point where equilibrium between Reliability and Sustainment cycle time is achieved. As illustrated by the inflection point in the Life Cycle Cost curve, the impact of the investment (increased spending) on Reliability has a corresponding improvement (decreased cost) in Sustainment cycle time, which reflects the optimization of Life Cycle Cost.
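The balance described above can be illustrated with a toy cost model. Everything in this sketch is an assumption made for illustration: the linear investment cost, the repair-driven O&S cost, the coefficients, and the function names do not come from the RAM-C Manual. It simply searches candidate MTBF values for the lowest total.

```python
def life_cycle_cost(mtbf_hours, invest_per_mtbf_hour=100.0,
                    cost_per_repair=5000.0, life_operating_hours=20000.0):
    """Toy LCC: investment grows with MTBF, O&S cost falls as failures drop."""
    investment = invest_per_mtbf_hour * mtbf_hours
    expected_repairs = life_operating_hours / mtbf_hours
    operating_and_support = cost_per_repair * expected_repairs
    return investment + operating_and_support


# Search hypothetical design points from 100 h to 3,000 h MTBF in 100 h steps.
candidates = range(100, 3001, 100)
best_mtbf = min(candidates, key=life_cycle_cost)

print(f"Lowest LCC at MTBF = {best_mtbf} h: {life_cycle_cost(best_mtbf):,.0f}")
```

Under these assumed coefficients the lowest total falls at 1,000 hours, where the investment and O&S terms are equal. Spending less on Reliability raises repair costs faster than it saves investment, and spending more does the reverse.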

KPPs and KSAs documented in the draft CDD are expressed using threshold and objective values, wherein (1) Threshold represents the absolute minimum acceptable performance; and (2) Objective represents the performance most desired. The difference between threshold and objective represents the trade space in which cost, performance, schedule, and Supportability are analyzed to achieve the optimum solution.

The United States Army Materiel Systems Analysis Activity (AMSAA) Information Paper on Materiel Availability (AM), Materiel Reliability (RM) [Note: JCIDS Guidance, January 2012 redefined RM as Reliability], and Operational Availability (AO), 22 December 2008, notes in its summary,

"The Sustainment KPP AM metric has been defined in OSD and JCS documentation in a way that makes it clear how it differs from AO; i.e., it must apply to the entire fielded inventory of systems, over the entire life-cycle of the system and incorporate all categories of downtime. The best way to view the relationship between AM and AO is to see AM as a function of AO, together with many other variables. The best way to assess both AM and AO is through comprehensive modeling and simulation. Reliability is the cornerstone that ensures both AM and AO requirements can be met. Reliability is far more important in determining the level of Availability that is achievable than any other component of logistics system.

Higher Reliability will result in higher Operational Availability and Materiel Availability since both AO and AM are a function of Reliability.

The inherent Reliability of a system is, by far, the biggest contributor to high Operational Availability.

The biggest impact on AO comes from increasing Reliability.

Doubling MTBF has the effect of increasing average AO far greater than any of the other variables tested.

The second biggest impact is made by increasing the Availability of repair parts.

Content

Slide 5-12. ASOE Model

R&M and the Affordable System Operational Effectiveness Model

In the Affordable System Operational Effectiveness (ASOE) Model, emphasis is placed on designing for increased Reliability and reduced logistics footprint and on providing effective product support through performance-based logistics (PBL) strategies. In the ASOE model, numerous trade-offs between system performance, Availability, process efficiency, human factors, and cost are needed to maximize a weapon system's operational effectiveness. To support such trade-offs, the cause-and-effect relationships must be made explicit between design decisions and system operations and support.

The elements of Technical Performance and Supportability enhance Design Effectiveness. Design Effectiveness in combination with Process Efficiency elements ensures Mission Effectiveness. It is through Mission Effectiveness, when coupled with Ownership Costs, that the most effective, operational and affordable system is realized.

Together with system performance, functions, and capabilities, a primary focus during design and architecture development is on system Reliability. This requires an understanding of the mission and operational capabilities, mission profiles, and operational environment(s). It is the system capabilities definition activity that offers the first and most significant opportunity to positively influence a system from the perspective of Reliability.

Trade-offs among ‘Time to Failure,’ system performance, and system life cycle cost are necessary to ensure the correct balance and to maximize system technical effectiveness. Subsequent to capabilities definition, as the system design and development process (for new and upgraded/fielded programs) progresses to the system architecture formulation phase, factors of system Reliability become even more important.

Technical Performance is defined during activities and tasks in the Materiel Solution Analysis and Technology Maturation and Risk Reduction (TMRR) Phases, such as defining the Maintenance Concept and initializing the Logistics Product Database and Logistics Product Structure.

Supportability begins with the initial Reliability and Maintainability Analysis of the competitive prototypes in the Technology Maturation and Risk Reduction (TMRR) Phase, and culminates in the EMD Phase with the analysis of the final design. By performing allocation, modeling, and prediction, R&M impacts Supportability in the following ways:

Determination of the Threshold and Objective values of Key Performance Parameters and Key System Attributes (KPPs/KSAs)

Identification of single points of failure

Specification of redundancy requirements

Identification of the design’s Maintainability architecture

Modularity, Accessibility, and Testability

These impacts help determine the interaction between accessibility, modularity, and testability by establishing which Maintainability characteristics should be incorporated into the design and the performance requirement levels they should meet. Those characteristics will be further analyzed and assessed in subsequent EMD Phase analyses, including FMECA, FTA, RCM Analysis, MTA, and LORA, in order to trade off CDD Requirements versus Maintainability versus Cost.

Content

Slide 5-13. ASOE Trade-off Requirements vs. R&M Design Cost

R&M plays a major role in achieving Affordability and Supportability through the conduct of several trade-off analyses. In this slide, the Technical Performance requirements defined in the CDD are initially analyzed to produce Reliability and Maintainability predictions, identify the system-level requirements, and allocate them to the lower levels of indenture. Then, as the Supportability analyses progress, as the data matures, and as cost is brought into the equation, it becomes necessary to determine how Reliability and Maintainability levels can be supported at an affordable level.

This ASOE trade-off assesses Requirements (CDD) vs. Maintainability vs. Cost

This ASOE connection investigates benefit vs. cost of achieving the Availability KPP/KSAs through accessibility, modularity, and testability and answers the following question:

What Maintainability characteristics and capabilities are priorities for incorporation based on the maximum Return on Investment?

The Operation & Support (O&S) Cost KSA provides balance to the Sustainment solution by ensuring that the costs associated with the Availability KPP are considered in making program decisions.

Content

Slide 5-14. R&M Life Cycle Role

Another way to understand the iterative nature of R&M is by examining its role in the life cycle. As a design tool, R&M can be used to achieve performance requirements and to identify and mitigate cost drivers throughout the life cycle. In general, the role of R&M is to set in motion a process that is built on the notion that as a design matures, its data matures.

As noted in DoDI 5000.02, Enclosure 3, Systems Engineering, paragraph 12, "A comprehensive Reliability and Maintainability (R&M) program [uses] an appropriate Reliability growth strategy to improve R&M performance until R&M requirements are satisfied. RAM-C analytics influence design, support, and cost. It has the most impact early in the life cycle and is central to Affordable System Operational Effectiveness."

The RAM-C Manual, paragraph 2.5.2, points out that, "Implicit in the effort to reduce OC [Ownership Cost] is the effort to reduce maintenance burden, infrastructure requirements, and logistics footprint as all of these are logistics degraders. To achieve these reductions in a systematic fashion, the PM should develop a defined starting point, or baseline, from which to measure the value of the evolving engineering design as it relates to reducing total ownership costs."

An important aspect of the Sustainment metrics is the balancing of system performance and program cost. This balance enables the materiel solution to be operationally effective, suitable, and affordable. Systems that exhibit low Reliability may require high Sustainment cycle times, mainly due to numerous repair cycles being required. In turn, this results in high Ownership Cost (OC) and thus high Life Cycle Cost (LCC). The objective is to achieve a balance between development, production, and operating and support costs which results in minimal life cycle costs.

Requirements to Design:

Gather Data to quantify achievement of the KPP/KSAs

Allocate, Model, Predict and Analyze Reliability & Maintainability

Publish Reliability Growth Plan

Conduct RAM Trade-off Analyses

Design to Fielding:

Maintain and analyze the Failure Reporting, Analysis, and Corrective Action System (FRACAS)

Update FMECA, Fault Tree and RBD with T&E data

Conduct RCM Analysis

Update Reliability Growth Plan

Conduct MTA and Maintainability Demonstration (M-Demo)

Evaluate KPP and KSAs achievement

Update LCSP with Trade-offs

Fielding to Sustainment:

Maintain and analyze FRACAS data

Establish triggers (e.g., frequency, duration, and cost) for system logistics degraders identified and captured in the baseline assessment during the initial Supportability Analysis

Update models and simulations

Provide traceable analyses: trends, LORA, and Trade-offs

Sustainment to Disposal:

Maintain and analyze FRACAS data

Evaluate KPP/KSAs achievement

Identify R&M shortfalls

Update Business Case Analysis (BCA)

Perform Sustaining Engineering to incorporate Materiel and Non-Materiel changes as appropriate

Identify/Institutionalize Lessons Learned at national level

Identify/Institutionalize LCSP best practices

Content

Slide 5-15. R&M: Influence

R&M processes include the allocation of system level requirements down to lower levels of indenture. Then these parameters are aligned with data sources that are selected from an order of precedence that favors actual data over surrogate data to conduct the RAM-C Analysis using various analytical models. The outputs of the allocation and Reliability Block Diagram models are used in predictions and subsequent analyses for Supportability and Product Support.

This is an iterative process that provides increasingly refined data that will influence the requirements, design, and support functions to ensure Affordable System Operational Effectiveness.

As stated in the RAM-C Manual, paragraph 3.2.1, "The balanced solution will determine the optimal points for Reliability and Sustainment cycle time early in program development, thus ensuring an acceptable Life Cycle Cost (LCC) for the system consistent with needed mission functional performance. Note that the optimal Reliability value must be sufficient to meet the most strenuous warfighter requirements, which may result in the system having higher than the minimum possible LCC."

RAM requirements feed Table B in the Logistics Product Database.

Content

R&M influences requirements, design, support and follow-on analyses:

Lower level R&M requirements for the LRU and SRU levels of indenture

R&M predictions for use in:

Reliability Centered Maintenance (RCM) Analysis

Level of Repair Analysis (LORA)

Failure Mode, Effects and Criticality Analysis (FMECA)

Fault Tree Analysis (FTA)

Spares calculations

Evaluations of performance (i.e., are the design requirements being (going to be) met?)

Evaluations of mission performance

Development of Operation and Support (O&S) Cost estimates

R&M activities are conducted throughout the life cycle of a system, from establishing requirements to Sustainment.

Content

Slide 5-16. R&M Inputs and Outputs

Inputs

Top tier (KPP/KSA) R&M requirements include data for making predictions using the Reliability and Maintainability models. The data initially comes from surrogate sources or from similar systems. The order of precedence is field data, test data, similarity data, and prediction data (e.g., MIL-HDBK-217).

Test data eventually will be available for making predictions.

Metrics commonly used in R&M Analysis include the following:

MTBF: Mean Time between Failure is the average time between system failures under specified conditions.

MTBCF: Mean Time between Critical Failures is the mean time in operating hours between critical failures; this is equivalent to mean operating hours between operational mission failures.

MTTR: Mean Time To Repair is the average time required to repair the system after failure (active repair time only).

MTTR (CF): Mean Time to Repair (Critical Failures) is the mean time in clock hours to repair an operational critical failure.

MFHBFA: Mean Flight Hours between False Alarms is the mean time in flying hours between false alarms.

MCT: Mean Corrective Maintenance Time
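The MTBF and MTTR definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample operating and repair records are hypothetical and are not taken from the lesson.

```python
def mean_time_between_failures(total_operating_hours, failure_count):
    """MTBF = total operating hours / total number of failures."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return total_operating_hours / failure_count


def mean_time_to_repair(active_repair_hours):
    """MTTR = average active repair time per repair action."""
    if not active_repair_hours:
        raise ValueError("at least one repair record is needed")
    return sum(active_repair_hours) / len(active_repair_hours)


# Hypothetical sample: 1,200 operating hours, 4 failures, 4 repair actions.
mtbf = mean_time_between_failures(1200.0, 4)
mttr = mean_time_to_repair([2.0, 3.5, 1.5, 3.0])
print(f"MTBF = {mtbf:.1f} h, MTTR = {mttr:.1f} h")
```

Only active repair time is included in the MTTR input, which matches the definition given above.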

Content

Actions to take on inputs

Allocation: A disciplined, iterative process for deriving lower level R&M requirements from upper level requirements. The allocation process typically uses a hierarchical model beginning at the system level and progressing to successively lower levels of indenture. Common allocation methods include Similarity, Equal, ARINC, and Feasibility of Objectives.

Modeling: R&M Modeling can include Functional Flow models, hierarchical models, simulation and Reliability Block Diagrams (RBDs). Simulation usually refers to a mathematical representation of the system incorporated in a software tool. Simulations are mathematical representations of reality, often greatly simplified using key assumptions. The better the assumptions, the better the model results. Modeling software takes system complexity into consideration as an indicator for Reliability and maintenance predictions. Also, models are significantly influenced by a ‘weak link’, or poor performing component. The Monte Carlo simulation technique is common in many Reliability tool sets.

Prediction: A disciplined, iterative process for estimating the level of R&M that the current design will achieve. It typically uses Reliability Block Diagram (RBD) modeling software. The purposes of Reliability predictions are to:

Evaluate feasibility of a design with regard to its Reliability requirement

Compare competing designs

Provide input to support other design activities, such as FMECA

Forecast logistics needs (spares, maintenance labor, test equipment, etc.)

Analysis: Disciplined and iterative process that uses models and simulation to first allocate and then evaluate R&M requirements at all levels of indenture.
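To make the modeling and prediction ideas above concrete, here is a minimal Python sketch of a Reliability Block Diagram calculation. It assumes independent blocks, which is a simplifying assumption, and the block reliabilities are hypothetical. It shows how a single weak link dominates a series model, how redundancy offsets it, and how a Monte Carlo simulation approximates the same analytic result.

```python
import math
import random


def series_reliability(block_reliabilities):
    """A series RBD works only if every block works."""
    return math.prod(block_reliabilities)


def parallel_reliability(block_reliabilities):
    """A parallel (redundant) RBD fails only if every block fails."""
    return 1.0 - math.prod(1.0 - r for r in block_reliabilities)


def monte_carlo_series(block_reliabilities, trials=100_000, seed=1):
    """Estimate series reliability by simulating random block failures."""
    rng = random.Random(seed)
    successes = sum(
        all(rng.random() < r for r in block_reliabilities)
        for _ in range(trials)
    )
    return successes / trials


# Hypothetical block reliabilities; the 0.80 block is the weak link.
blocks = [0.99, 0.98, 0.80]
analytic = series_reliability(blocks)
simulated = monte_carlo_series(blocks)

# Adding a redundant copy of the weak block raises the system value.
redundant = series_reliability([0.99, 0.98, parallel_reliability([0.80, 0.80])])
print(f"series: {analytic:.4f}, simulated: {simulated:.4f}, with redundancy: {redundant:.4f}")
```

Commercial RBD tools handle far more complex structures; this sketch only illustrates why the assumptions behind a model matter as much as the software.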

Outputs

Update Table B in Logistics Product Database

Lower Tier R&M Requirements at LRU and SRU levels used in:

Spares calculations

Evaluations of contractor performance (i.e., are the design requirements being [going to be] met?).

Failure Mode, Effects and Criticality Analysis (FMECA)

Fault Tree Analysis (FTA)

Reliability Centered Maintenance (RCM) Analysis

Level of Repair Analysis (LORA)

Content

SAE GEIA-STD-0007

Slide 5-17. R&M Data Management

R&M Analysis is all about requirements and data management around Reliability. Reliability Growth Planning is required early in acquisitions and is a constant through each milestone to achieve positive improvement due to implementation of corrective actions to system design, operation or Reliability Centered Maintenance (RCM) procedures. During manufacturing production, Reliability is focused toward reducing defect rates to assure the intended Reliability is achieved. Emphasis of Reliability improvement is prevention of failure. You will learn more about this in the next topic.

R&M refines requirements through a series of iterative processes during the Technology Maturation and Risk Reduction (TMRR) Phase and through the EMD Phase. A database is only as good as its data, and the goal for R&M is to validate existing requirements data and update the Logistics Product Database with new usable requirements data.

In the Set-Up phase, you derive inputs from the Logistics Product Database and use that data to perform allocations of requirements to lower levels of indenture. This enables a richer and more complete analysis with appropriate detail to assure Affordability and Supportability are achieved across the life cycle of the system.

Next, you must prioritize other data sources as you begin modeling the system Reliability. The output of this process will provide the necessary data needed to determine trade-offs, recommend allocations and models to the IPT, and refresh the database.

Some best practices for refining requirements include:

Every requirement must be traceable to the user needs

Specific Systems Engineering and Supportability Analysis tasks to design in the required levels of R&M must be identified.

For every requirement, there must be a defined method of verifying whether or not the requirement has been met. Verification can be conducted through analysis or test.

The characteristics of a "good" requirement include the following:

Achievable. It should specifically reflect a need or objective for which a solution is technically realistic at costs considered to be affordable.

Verifiable. Its expected performance and functional utility should be expressed in a manner that allows objective, quantifiable verification. It should not be defined by ambiguous words, e.g., excessive, sufficient, resistant, minimal, etc.

Unambiguous. It should have only one possible meaning so it is uniquely testable and verifiable.

Complete. It should contain all information needed to interpret and verify the requirement, including environmental and/or operational conditions relevant to the requirement.

Performance-based. It should be expressed in terms of need or outcome, not solution, i.e., it should address “why” and “what” of a need, not how to do it.

Consistent with other requirements. Conflicts should be resolved prior to release of a Request for Proposal (RFP).

Appropriate for the level of system hierarchy. It should not be so detailed that it constrains solutions for the current level of design, e.g., detailed requirements relating to components would not normally be in a system-level specification.

R&M Process – Set Up

Content

Slide 5-18. Topic 3: R&M Process – Set Up

Content

Slide 5-19. R&M – Set Up

Some key questions to ask in this topic are:

Where do you find data for performing Supportability analyses?

How do you select, prioritize, and update data?

There are three steps in Set Up for the R&M Analysis: (1.1) Build Plan, (1.2) Perform Research, and (1.3) Define R&M Data Inputs. Each of these steps will be discussed in the context of considerations for Project Management, Market Research, and Data Management, respectively.

Project Management in Step 1, Build Plan, considers defining the Reliability Growth Plan and the strategy to execute that plan, including: milestone events, required deliverables, responsibilities and face-to-face updates.

Market Research in Step 2, Perform Research, guides the identification of similar systems and the collection of those systems' Reliability and Maintainability allocations. It is also used to find appropriate analysis and modeling tools.

Data Management in Step 3, Define R&M Data Inputs, guides considerations for available system design and requirements data. It also is used to identify priority of use or order of precedence and ensure the most accurate technical data is loaded into Reliability modeling tools and the Logistics Product Database.

Content

Technology Maturation & Risk Reduction

Slide 5-20. Build Plan

The first step in Set Up is to Build the Plan. Project Management considers coordination measures to assure the most accurate Reliability data are reported and shared. DoD Directives require programs to manage and report this Reliability data through a Reliability Growth Plan. Reliability Growth is interwoven with a system’s Affordability and Sustainability. The first place that this becomes evident is during R&M Analysis. As the Supportability analyses proceed, the data mature and the project continues to refine its design, the interface between Design and Sustainment, and the Logistics Product Database used in Supportability analyses.

As stated in the Department of Defense Handbook, Reliability Growth Management, 14 June 2011 (MIL-HDBK-189C), "Reliability growth planning addresses program schedules, amount of testing, resources available, and the realism of the test program in achieving the requirements. The planning is quantified and reflected in the construction of a Reliability growth planning curve and the necessary supporting Reliability activities. This curve establishes interim Reliability goals throughout the program." You will examine this curve in the next slide.

Content

Step 1: Understand and Communicate RAM-C User Needs and Constraints

Determine Reliability, Availability and Maintainability and Cost (RAM-C) needs

Develop collaboratively a RAM-C Rationale, using modeling to ensure performance is achieved

Compare the needed levels of RAM-C to current systems/capabilities

Initiate a Reliability Growth Plan

Perform Initial Reliability allocation

Step 2: Design and Redesign for RAM-C

Develop a conceptual system model for use throughout system development to estimate life cycle performance metrics

Use data from component-level testing to refine the system model

Conduct sufficient analysis to determine if the design is capable of meeting RAM-C requirements.

Design in:

Testability to reduce false removals

Redundancy to mitigate or prevent failure

Modularity to facilitate remove-and-replace maintenance

Accessibility to efficiently reach a failed component

During Testing, execute Reliability Growth Plan: Test-Analyze-and-Fix

Step 3: Produce Affordable Operationally Effective System

Conduct Design Trade-offs

Collect Initial Field Data feedback and incorporate Reliability Growth into production

Step 4: Monitor Field Performance

Continue Field Data Collection

Set triggers for RAM intervention

Slide 5-21. Build Plan: Reliability Growth Plan

Reliability Growth is the improvement of a product's Reliability over time as the result of identifying, analyzing and fixing design deficiencies through testing, and the elimination or minimization of the deficiencies through design changes.

As stated in DoDI 5000.02, Enclosure 3, Systems Engineering, paragraph 12, "Program Managers (PMs) shall formulate a comprehensive Reliability and Maintainability (R&M) program using an appropriate Reliability Growth strategy to improve R&M performance until R&M requirements are satisfied. Reliability Growth Curves (RGC) shall reflect the Reliability growth strategy and be employed to plan, illustrate, and report Reliability growth. A RGC shall be included in the SEP at MS A, and updated in the TEMP beginning at MS B."

The Reliability Growth Curve establishes interim Reliability goals throughout the program. The Reliability Growth Curve indicates the estimated Life Cycle Cost associated with the projected Reliability value.

Systems that exhibit low Reliability may require high Sustainment cycle times, mainly due to numerous repair cycles being required, which will result in high Ownership Cost (OC) and thus high Life Cycle Cost (LCC). The objective is to achieve a balance between development, production, and operating and support costs that results in minimal Life Cycle Costs. Supportability and Maintainability concepts considered should include system Mean Down Time (MDT) optimization and ease of system maintenance.

Reliability Growth is achieved by eliminating or mitigating failure modes, which results in decreasing the failure rate. The Reliability Growth Curve, as shown in red, connects each “Test-Analyze-And-Fix” (TAAF) test event. The flattening out of the curve means that the Reliability improvement rate is still positive but with a smaller Reliability return on investment than had occurred earlier in the program.

According to MIL-HDBK-189C, paragraph 4.9.1, "Reliability growth is the result of an iterative design process. As the design matures, test and evaluation events investigate and validate whether actual or potential sources of failures have been addressed. If reliability is not achieved, further design effort is invested. There are four essential elements involved in achieving Reliability growth:

Failure mode discovery;

Feedback of problems identified;

Failure mode root cause analysis and proposed corrective action; and

Approval and implementation of proposed corrective action.

Furthermore, if failure sources are detected by testing, another element is necessary:

Fabrication of hardware."

Many of these elements are also a part of Failure Mode Effects and Criticality Analysis / Fault Tree Analysis (FMECA / FTA) that are included in the broader Systems Engineering closed loop process.

Content

Slide 5-22. Build Plan – Reliability Growth

There comes a point at which investing in design enhancements no longer yields significant improvement in Reliability, as visualized by the ‘flat’ curve (little rise in reliability for a significant run in investment dollars). Where the Reliability Growth Curve flattens indicates a decrease in the rate of Return On Investment (ROI). The decrease is a result of the fact that the primary failure modes that have most impacted Reliability have been eliminated or mitigated. The elimination of other, less prominent failure modes has a decreased impact on improving Reliability. This is where the cost of continuing investment results in decreased returns.

"Where and why should you stop?" The stopping point is reached as a trade-off decision between the cost/time of test resources and the incremental improvement in Reliability. AMSAA models can be used to evaluate the ROI associated with eliminating failure modes; this is part of the Reliability growth plan decision process.

DoDI 5000.02, Enclosure 3, Systems Engineering, paragraph 12 states that Reliability Growth Curves (RGC), included in the SEP at MS A, and updated in the TEMP beginning at MS B, "will be stated in a series of intermediate goals and tracked through fully integrated, system-level test and evaluation events until the Reliability threshold is achieved. If a single curve is not adequate to describe overall system Reliability, curves will be provided for critical subsystems with rationale for their selection."

A note of caution from the Defense Acquisition Guidebook (DAG) paragraph 5.3.2: "Emphasis should be placed on discovering and mitigating failure modes throughout the system design and development process, since relying solely on testing as a means of improving Reliability has been shown to be risky and costly. Consideration should be given to using such practices as physics of failure reviews, environmental stress screening, and highly accelerated life testing. A test analysis and fix program should be implemented to increase Reliability and it should be expanded as more of the hardware (including prototypes) is tested and operated by the users. Using this process, failure modes may be found through analysis and testing are then eliminated or reduced by design or process changes as appropriate. Shortchanging this effort early in development, particularly at the subsystem and component level, is a frequent cause of later program delays and cost increases as the flaws inevitably show up in system level performance."

Content

Slide 5-23. Iterate Plan

Reliability Growth Management procedures are useful for determining priorities and allocating resources. Reliability Growth Management is a process associated with planning for Reliability achievement as a function of time and other resources, and controlling the ongoing rate of achievement by reallocation of resources based on comparisons between planned and assessed Reliability values.

As noted in MIL-HDBK-189C, Section 4, "The Reliability Growth Management techniques will enable the manager to plan, evaluate, and control the Reliability of a system during its development stage. Reliability Growth Planning addresses program schedules, amount of testing, resources available, and the realism of the test program in achieving the requirements. The planning is quantified and reflected in the construction of a Reliability Growth Curve and the necessary supporting Reliability activities."

Once the Reliability Growth Plan is initiated, it will need to be consistently updated, not only throughout the R&M Analysis, but also through subsequent Supportability analyses, to ensure traceability to the Sustainment KPP/KSA and Affordability requirements.

Two of the key constants in the R&M Life Cycle are Reliability Growth and the Failure Reporting, Analysis, and Corrective Action System (FRACAS). The rate of improvement in Reliability is determined by the:

1. On-going rate at which new failure modes are identified

2. Set of failure modes that are addressed by corrective actions

3. Effectiveness and timeliness of the corrective actions

Models for planning and tracking growth testing include those shown in the following table.

Reliability & Maintainability Models and Techniques* Table

Term: Definition

Allocation (Equal apportionment model): To distribute a system requirement throughout the system (allocation starts with the system requirement and you apportion down, level by level, until you reach the part level in a system)

Prediction: To estimate the Reliability of a system (prediction starts at the lowest level of the system for which you have data and you compute probabilities until you reach the system level)

Reliability Development Growth Testing (RDGT) via Test, Analyze and Fix (TAAF) (Duane model): To raise the Mean Time between Failures (MTBF) up to an acceptable level

Reliability Qualification Test (RQT): To demonstrate, during the Development Phase, that the system Reliability meets the system requirement

Production Reliability Acceptance Test (PRAT): To demonstrate that production units meet the system Reliability requirement

Environmental Stress Screening (ESS): To find units that have not been manufactured correctly so they may be fixed before they are tested and/or sent to the customer

*Reliability, Maintainability, and Availability for Engineers Text Book. DAU Mid-West Region, 1 May 2008, p. 18
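The equal apportionment model named in the table can be sketched as follows. This is a minimal illustration that assumes the items are independent and arranged in series, so that each of the n items receives the same reliability target, the n-th root of the system requirement. The function name and the sample requirement are hypothetical.

```python
def equal_apportionment(system_reliability, item_count):
    """Give each of n series items the same target: R_sys ** (1 / n)."""
    if not 0.0 < system_reliability <= 1.0:
        raise ValueError("system reliability must be in (0, 1]")
    if item_count < 1:
        raise ValueError("item_count must be at least 1")
    return system_reliability ** (1.0 / item_count)


# Hypothetical requirement: 0.90 system reliability across 4 series items.
item_target = equal_apportionment(0.90, 4)
print(f"each item must achieve at least {item_target:.4f}")
```

Equal apportionment ignores differences in complexity between items, which is why methods based on similarity data are often preferred when such data exist.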

Other models are discussed in the US Army Materiel Systems Analysis Activity (AMSAA) Reliability Growth Guide. AMSAA developed an Excel-based Reliability Program Scorecard tool to standardize the assessment of a program’s path to meeting its Reliability requirements. These models use a relationship between cumulative test time and cumulative failures to develop a Reliability Growth Profile that forms the basis of the Reliability Growth Test Program.
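As an illustration of the relationship between cumulative test time and cumulative failures, the sketch below fits the Duane model named in the table above. In the Duane model, cumulative MTBF plotted against cumulative test time on log-log axes is a straight line whose slope is the growth rate (alpha), and the instantaneous MTBF equals the cumulative MTBF divided by (1 - alpha). The test data and function names are hypothetical, and this is not the AMSAA tool itself.

```python
import math


def fit_duane(cumulative_hours, cumulative_failures):
    """Fit log(cumulative MTBF) = intercept + alpha * log(T) by least squares."""
    xs = [math.log(t) for t in cumulative_hours]
    ys = [math.log(t / n) for t, n in zip(cumulative_hours, cumulative_failures)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    intercept = y_mean - alpha * x_mean
    return alpha, intercept


def duane_mtbf(test_hours, alpha, intercept):
    """Return (cumulative MTBF, instantaneous MTBF) at the given test time."""
    cumulative = math.exp(intercept) * test_hours ** alpha
    return cumulative, cumulative / (1.0 - alpha)


# Hypothetical TAAF data: cumulative test hours and cumulative failures.
hours = [100.0, 400.0, 1600.0]
failures = [10, 20, 40]
alpha, intercept = fit_duane(hours, failures)
cum_mtbf, inst_mtbf = duane_mtbf(1600.0, alpha, intercept)
print(f"growth rate = {alpha:.2f}, instantaneous MTBF at 1,600 h = {inst_mtbf:.0f} h")
```

Projecting the fitted line forward gives the planned growth profile; comparing assessed values against it shows whether the test program is on track.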

Content

Slide 5-24. Perform Research

Step 2 of Set Up is to Perform Research. This research includes a discovery phase where similar systems are identified and evaluated as candidates to consider for Reliability Prediction and Allocation purposes. These findings are shared within IPTs to facilitate data sharing and RAM-C coordination activities.

The IPTs will validate similar system sources for accuracy, currency and applicability.

Content

SAE GEIA-STD-0007

Slide 5-25. Find Similar System Data

Through networking within your organization as well as with other Government and Industry organizations (Market Research) you may find additional sources of historical data and current technical data on similar systems.

When finding similar system technical data, it is a good idea to find out:

Lessons learned from that project

What tools were used and with what success

If the system in acquisition shares similar components with other acquisition programs, those similar product structures and related Reliability attributes may be brought in directly to the Logistics Product Database through a standards based data transfer. Similarly, Reliability Block Diagrams can be reused for those subsystems that are already under development or deployed via other programs. Having standards based tool sets facilitates data transfer irrespective of tool choices between weapon system programs.

Shown here are the RBD, product structure output, and Supportability Analysis tool (A and X tables), where the product structure and Reliability attributes were imported into the Logistics Product Database from another weapon system program's data files.

Content

Slide 5-26. Find R&M Tools

You want to find toolsets that will reduce as many interface errors as possible. Some key questions to ask about the tools you are researching and eventually propose to the IPT include:

Is this a standards-based tool?

Does this tool leverage data?

Does the toolset interface with the SAE GEIA-STD-0007 standards?

Can this tool update your project’s Logistics Product Database easily?

A listing of Product Supportability Analysis tools may be found at https://acc.dau.mil/psa-tools.

Content

SAE GEIA-STD-0007

Slide 5-27. Define R&M Data Inputs

Step 3 of the Set Up requires defining R&M data inputs and loading the technical data. Table B in the Logistics Product Database is where most of the Reliability attributes needed to conduct R&M analyses are located. Defining a data management strategy will help you pick the right tools, leverage standards, and identify the existing data required from the toolset. The strategy begins with defining the scope of the data and a plan for managing that data.

The initial data inputs for R&M analyses are top tier (KPP/KSAs) R&M requirements.

In addition, you may have any combination of the following sources depending on whether you are beginning a new acquisition or working with a fielded system:

Test data

Developmental Test & Evaluation (DT&E)

Operational Test & Evaluation (OT&E)

Original Equipment Manufacturer (OEM) data

FRACAS/Field data

Similar systems

Surrogate sources (e.g., MIL-HDBK-217 provides failure rate models based on the best field data that is then analyzed and massaged, with many simplifying assumptions thrown in, to create usable models.)

Content

Slide 5-28. Collect/Load Tech Data

Once you have identified additional data sources, you must pull that data in through a standard data exchange to populate the product structure in the appropriate R&M models and tools.

Your Data Management Strategy will:

Set up the data Order of Precedence that is necessary to complete the allocations

Enable modeling the Reliability Block Diagrams and making predictions

Again, as the system matures, the data will mature. The initial data may be overwritten by increasingly accurate and realistic test and field data.

The order of precedence in using data for modeling is:

Field data

Test data

Similarity data

Prediction data

Content

Slide 5-29. R&M: Modeling Order of Precedence

The order of precedence in using data for modeling is:

Field data

Test data

Similarity data

Prediction data
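The order of precedence can be expressed as a simple selection rule. The sketch below is illustrative only: the dictionary keys, the function name, and the prediction and test values are hypothetical, and the similarity value reuses the Car A MTBF from the allocation example later in this lesson.

```python
# Order of precedence from the lesson: earlier entries are preferred.
ORDER_OF_PRECEDENCE = ["field", "test", "similarity", "prediction"]


def select_mtbf(available):
    """Return (source, value) for the highest-precedence source with data."""
    for source in ORDER_OF_PRECEDENCE:
        value = available.get(source)
        if value is not None:
            return source, value
    raise LookupError("no R&M data source is available")


# Hypothetical MTBF values for one item early in development.
early = {"field": None, "test": None, "similarity": 10791.0, "prediction": 9500.0}
source, mtbf = select_mtbf(early)
print(f"use {source} data: MTBF = {mtbf:,.0f}")

# Once test data arrive, they take precedence over the similarity value.
later = dict(early, test=12000.0)
later_source, later_mtbf = select_mtbf(later)
```

Rerunning the same rule as each new data source becomes available is one simple way to keep the model inputs traceable.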

As data mature through test events and post fielding data analysis, the Reliability Block Diagram and Logistics Product Database B (RAM) Tables are updated.

Updates between engineering models and the Logistics Product Database and Supportability Analyses must be synchronized. Reliability changes are evaluated through these analyses and updated through the IPTs to evaluate compliance with user requirements.

R&M Process Activities

Content

Slide 5-30. Topic 4: R&M Process Activities

Content

Slide 5-31. R&M Activities

Some Key Questions to ask in this R&M Activities topic are:

What R&M activities are conducted as part of the Supportability Analysis process?

How are these activities implemented?

What are the outcomes from these R&M activities that impact and enable Supportability?

R&M activities include three main areas: allocation, modeling, and prediction. Iteratively following each activity is an analysis on the impact to Supportability.

Step 2.1 R&M Allocation identifies component Reliability requirements and establishes initial margins for trade-offs. Allocation is a process of deriving requirements for lower indentures of assembly from system-level requirements.

Step 2.2 Modeling is a representation of a system’s physical and functional baselines that enables what-if analysis and helps to prioritize R&M design opportunities. R&M Modeling graphically, pictorially, or mathematically describes the Reliability characteristics of a part, subassembly, or assembly.

Step 2.3 Prediction is the process of quantitatively assessing an equipment design relative to its specified Reliability and Maintainability. Prediction forms the initial R&M numeric baseline. Later, this baseline is updated through Test & Evaluation and Post Fielding data feedback.

Slide 5-32. R&M Process Map

This R&M roadmap breaks the Analyze phase into activity sub-steps and shows the iterative nature of the analysis.

R&M Analysis compares allocations, models, and predictions to the user requirements defined in the CDD. As data are refined through various Supportability analyses, T&E events, and Post-Fielding performance updates, the Reliability model must be updated and new calculations performed. In turn, these results must be compared with the allocations based on the user requirements.

Content

Slide 5-33. R&M Allocation: Overview

R&M Allocation is a disciplined approach to identifying Reliability requirements through the levels of indenture and establishing margins for tradeoffs. R&M Allocation develops lower-level Reliability, Maintainability, and Availability requirements that will satisfy system requirements. The Reliability Allocation process distributes mission failures among the assemblies and to lower levels of indenture. The goal is to determine the rate at which subsystems and components are allowed to fail and still achieve the allocated Reliability. As test events reveal Reliability issues, these allocations and Reliability Growth opportunities are revisited and potential trade-offs evaluated.

The R&M allocation process is an iterative process that:

Focuses the design effort on both failure (Reliability) and restoral (Maintainability)

Assesses the impact on systems level R&M given the relative complexities of the assemblies in terms of both Reliability and Maintainability characteristics

Translates and allocates system requirements into lower tier design criteria

In the R&M allocation process, each assembly is described in terms of its Failure Rate (FR) and its reciprocal, Mean Time between Failure (MTBF), and its Mean Time To Repair (MTTR) (2012 Annual RELIABILITY and MAINTAINABILITY Symposium. "Supportability Analysis,"

Content

Slide 5-34. Reliability Allocation: Calculations

As the RAM-C Manual instructs (paragraph 3.2.4.2), you will need to “express the Reliability requirement for the capability as a desired failure-free interval that is then used in the development of an MTBF for the Reliability KSA. For the failure-free interval to be valid, there must be an associated probability of achieving the stated value (given that a 100 percent chance of achieving a Reliability value requires a failure rate of zero).”
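One common way to turn a failure-free interval and its associated probability into an MTBF is to assume a constant failure rate (exponential distribution), so that R(t) = exp(-t / MTBF). That assumption is made here for illustration and is not stated in the quoted paragraph. The function name and sample values are hypothetical.

```python
import math


def mtbf_from_failure_free_interval(interval_hours, probability):
    """Solve R(t) = exp(-t / MTBF) for MTBF, given t and R(t)."""
    if not 0.0 < probability < 1.0:
        # A probability of 1 would need a failure rate of zero,
        # as the RAM-C Manual quote points out.
        raise ValueError("probability must be strictly between 0 and 1")
    return -interval_hours / math.log(probability)


# Hypothetical need: 90% chance of completing a 100-hour interval.
required_mtbf = mtbf_from_failure_free_interval(100.0, 0.90)
print(f"required MTBF is about {required_mtbf:.0f} hours")
```

Note how much larger the required MTBF is than the interval itself; the stated probability drives the requirement as much as the interval does.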

In this allocation example, you will use Reliability field data from a similar car, Car A, to derive the Reliability of the new car's drive train subsystem. Next, using Car A's Mean Time between Failure (MTBF) field data, you can calculate the new car drive train subsystem failure rate.

Notice that the failure rate is the inverse of MTBF, otherwise stated as 1 divided by MTBF.
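As a quick worked example, the inverse relationship can be applied to the Car A drive train MTBF of 10,791 that is used in the allocation steps of this lesson. The transcript does not state the unit of operation (hours or miles), so the sketch leaves it generic; the function name is illustrative.

```python
def failure_rate(mtbf):
    """Failure rate is the reciprocal of MTBF."""
    if mtbf <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 / mtbf


car_a_mtbf = 10791.0  # Car A drive train MTBF from the lesson
subsystem_failure_rate = failure_rate(car_a_mtbf)
print(f"{subsystem_failure_rate:.3e} failures per unit of operation")
```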

Content

Slide 5-35. R&M Allocation: Calculations

Step 3 is to allocate failures to subsystems and lower levels of indenture. This step has four sub-steps that are labeled A, B, C, and D in the following four slides.

In this third step of the allocation process, you will again rely upon field data from the similar Car A. In Car A's drive train subsystem, there are four components, or Line Replaceable Units (LRUs), including the transmission, front axle, rear axle, and drive shaft.

Each of these LRUs contributes to mission failure and is expressed as a percent. In this case, the transmission accounts for 10% of failures, the front axle accounts for 38%, the rear axle accounts for 34%, and the drive shaft accounts for 18% of drive train failures.

Content

Slide 5-36. Allocation: % Mission Failure

In step 3A, you will use the similar Car A's mission failure proportion values for the new car. In turn, these values will allow you to calculate the relative failure rate, then the allocated failure rate, and the Mean Time between Critical Failure (MTBCF) for each LRU in the new car drive train subsystem.

Notice that the sum of mission failures for the subsystem must equal 100%.

Content

Slide 5-37. Allocation – Relative Failure Rate

Once the % of mission failure is assigned to each subsystem, or in this example to each LRU level of indenture, Step 3B requires you to calculate the Relative Failure Rate. Each component's % of mission failure contribution is divided by the lowest % mission failure for the specific level of indenture. In the case of the drive train subsystem, the transmission has the lowest % of mission failure at 10%.

The relative failure rate for each of the four LRUs follows:

Drive Train Subsystem LRU: Relative Failure Rate

Transmission: 10% ÷ 10% = 1.0

Front Axle: 38% ÷ 10% = 3.8

Rear Axle: 34% ÷ 10% = 3.4

Drive Shaft: 18% ÷ 10% = 1.8

Notice that the sum of the relative failure rates is 10. You will need this value for the next step.
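Steps 3A and 3B can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function and variable names are assumptions made for the example, not part of the course material.

```python
# Percent of drive train mission failures per LRU, from Car A field data (Step 3A).
mission_failure_pct = {
    "Transmission": 10,
    "Front Axle": 38,
    "Rear Axle": 34,
    "Drive Shaft": 18,
}


def relative_failure_rates(pct_by_lru):
    """Step 3B: divide each LRU's % of mission failures by the lowest % at this level."""
    total = sum(pct_by_lru.values())
    if total != 100:
        raise ValueError(f"mission failure percentages sum to {total}, not 100")
    lowest = min(pct_by_lru.values())
    return {lru: pct / lowest for lru, pct in pct_by_lru.items()}


relative = relative_failure_rates(mission_failure_pct)
relative_sum = sum(relative.values())

for lru, rate in relative.items():
    print(f"{lru}: {rate:.1f}")
print(f"Sum of relative failure rates: {relative_sum:.1f}")
```

The check that the percentages total 100 mirrors the rule stated for Step 3A.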

Content

Slide 5-38. Allocation – Allocated Failure Rates

Step 3C requires you to calculate the allocated failure rate for each LRU in this new car drive train example. Recall from Slide 5-34, during the discussion of Step 2, that the subsystem failure rate is the inverse of Car A's MTBF.

First, calculate the failure rate per unit of relative failure rate: divide the subsystem failure rate (1 divided by 10,791) by the sum of the relative failure rates derived in Step 3B, which is 10.

Now you can calculate the allocated failure rate for each LRU by multiplying its relative failure rate by this per-unit value.

Notice that the sum of the allocated failure rates for all four LRUs is equal to the subsystem failure rate (1 divided by MTBF).
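A minimal sketch of Step 3C, using the MTBF of 10,791 and the relative failure rates from Step 3B. The variable names are illustrative assumptions.

```python
CAR_A_MTBF = 10_791  # Car A drive train MTBF used in the lesson

# Relative failure rates from Step 3B.
relative = {
    "Transmission": 1.0,
    "Front Axle": 3.8,
    "Rear Axle": 3.4,
    "Drive Shaft": 1.8,
}

# The subsystem failure rate is the inverse of MTBF.
subsystem_failure_rate = 1 / CAR_A_MTBF

# Failure rate per unit of relative failure rate.
rate_per_unit = subsystem_failure_rate / sum(relative.values())

# Allocated failure rate for each LRU.
allocated = {lru: rel * rate_per_unit for lru, rel in relative.items()}

for lru, rate in allocated.items():
    print(f"{lru}: {rate:.11f}")

# The allocated rates add back up to the subsystem failure rate.
print(f"Sum of allocated rates: {sum(allocated.values()):.10f}")
print(f"Subsystem failure rate: {subsystem_failure_rate:.10f}")
```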

Content

Slide 5-39. Allocation – MTBCF/MTBF

Recall that the Reliability KSA measures the probability that the system will perform without failure over a specified interval under specific conditions. It is usually expressed as Mean Time between Failure (MTBF), calculated as total operating hours divided by total number of failures, and is a contractual requirement.

Earlier in this lesson, you learned that the data would be refined as Supportability Analysis progresses. Since the example in this lesson is a new car, you have relied on established field data for the similar Car A and used Car A's MTBF as the basis for calculating the allocated failure rate.

Now to complete the Reliability allocation, you will calculate the Mean Time between Critical Failure (MTBCF) for the new car. MTBCF is an estimated design metric and sometimes differs from the Mean Time between Failures (MTBF) calculated in the RCM Analysis (Lesson 8 in this course). MTBF is the more accurate metric.

Step 3D in the Allocation process for the new car is to calculate the Mean Time between Critical Failure (MTBCF) for the four LRUs of the drive train subsystem. To do this you will divide one by the allocated failure rate for the LRU.

For repairable items, the exponential distribution often describes the pattern of failures over time. This distribution has a constant failure rate. The failure rate is the number of failures per unit time.
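Step 3D can be sketched as follows. The MTBCF values come directly from inverting the allocated failure rates. The `reliability` helper is an added illustration of the constant-failure-rate (exponential) assumption mentioned above, and its name is an assumption for this example.

```python
import math

CAR_A_MTBF = 10_791
relative = {
    "Transmission": 1.0,
    "Front Axle": 3.8,
    "Rear Axle": 3.4,
    "Drive Shaft": 1.8,
}

# Allocated failure rates (Step 3C).
rate_per_unit = (1 / CAR_A_MTBF) / sum(relative.values())
allocated = {lru: rel * rate_per_unit for lru, rel in relative.items()}

# Step 3D: MTBCF is one divided by the allocated failure rate.
mtbcf = {lru: 1 / rate for lru, rate in allocated.items()}


def reliability(interval, failure_rate):
    """Probability of no failure over the interval, assuming a constant failure rate."""
    return math.exp(-failure_rate * interval)


for lru, value in mtbcf.items():
    print(f"{lru}: MTBCF = {value:,.0f}")
```

Because the transmission has the lowest share of failures, it receives the longest MTBCF, and the front axle, with the highest share, receives the shortest.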

Content

Slide 5-40. Maintainability Allocation

The objective of designing for Maintainability is to design systems that are easy to service, repair, and maintain, to the extent that is economically and technically feasible and necessary to accomplish the mission at an affordable cost. The intent of Maintainability is to reduce the time it takes for a properly trained maintainer to isolate the failure and fix it.

Intrinsic factors contributing to Maintainability are: modularity, interoperability, diagnostics, prognostics, fail safe, and access. Setting good requirements for Maintainability requires an understanding of the user needs.

Maintainability requirements are needed at:

System level: From the user needs, contractual system-level Maintainability requirements are established that, if met, will ensure that the operational Maintainability of the product will meet the user’s mission needs and be consistent with O&S cost constraints.

Lower levels of design: Contractors allocate the system-level requirements down to the level needed to be meaningful to the design and manufacturing process engineers; this level may be subsystem, component, or even lower.

System-level requirements are insufficient to support the design effort. For example, a requirement that a truck have a Mean Time to Repair (MTTR) of 2.1 hours doesn’t help the designers of the transmission, engine, and other components. Consequently, the requirement process involves allocating the Maintainability requirements to lower levels.

In some cases, the process is iterative, requiring several attempts to satisfy all requirements. In other cases, the requirements cannot be satisfied (to meet the system-level requirement, components are needed with higher-than-possible levels of Maintainability). In these cases, dialogue and trade-offs with the user are required.

It is important to note that planning and implementing testability and diagnostics early in design will minimize the cost and increase the Maintainability.

The allocation of Maintainability basically "weights" the value of Maintainability allocated to a given component/item by its Reliability.

The more often an item fails, the quicker you would want to repair/replace it.

The less frequently it fails, the longer you could take and still meet Availability requirements.

For logisticians, R&M allocation values help determine:

How many spares we need to buy

How many maintainers we need

How many trucks we need to carry supporting material

How many aircraft we need in order to deploy the entire system and support infrastructure

Content

Slide 5-41. Maintainability Allocation

To allocate Maintainability to each of a system’s components, one of several approaches could be used.

If the relative failure rate of each component is known, the Maintainability requirement can be allocated on that basis.

Higher failure rate components would be allocated higher Maintainability requirements – lower repair time.

If the relative mission criticality of each component is known, the Maintainability requirements can be allocated on that basis.

Higher criticality components are allocated higher Maintainability requirements – lower repair time.

There are many different Maintainability metrics and they may be established at various maintenance levels, such as:

Mean Time to Repair (MTTR)

Corrective maintenance time (MCT)

Total labor hours

Mean man-hours per repair

Inspection time

Probability of fault detection

Proportion of faults isolatable

False alarm rate

Failure Mode Effects and Criticality Analysis (FMECA) and Fault Tree Analysis (FTA) are critical for effective system design that meets Reliability, Maintainability, and performance requirements. Both analyses identify system failures and causes as well as recommended mitigation strategies to reduce the risk of failure. FMECA and FTA evaluate the system against Reliability and safety requirements to identify design weakness. Once root causes of system faults are identified, opportunities for failure detection emerge. In turn, these analyses' outcomes inform design changes.

"FMECA is a systematic approach to identifying potential system failure modes, their causes, and the effect of the failure mode occurrence on the system operation. It is discussed in greater detail in Lesson 6 along with Fault Tree Analysis (FTA). FTA further investigates the most significant failures identified in FMECA and models alternative paths examining all the relevant events that cause failures to reduce probability of occurrence or severity of the failure. To be most effective, FTA would include all potential causes of failure mode (design, manufacturing, maintenance, use) and then assess the cause and probability of occurrence. It combines product and process FMEA analyses in order to identify corrective actions, which may include changes in requirements, design, processes, procedures, or materials to eliminate design deficiency." (2012 Annual RELIABILITY and MAINTAINABILITY Symposium, "Fundamentals of Failure Modes and Effects Analysis," John Bowles PhD). So FMECA and FTA work together to identify and mitigate risks to mission failure.

System-level Maintainability is a function of both the failure rate and the restoral rate. As such, the system-level MTTR is the failure-rate-weighted average of the component repair times: the sum of the products of each component's failure rate (FR) and its MTTR, divided by the sum of the failure rates.
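The weighted average can be sketched as below. The weights are the drive train relative failure rates from the allocation example; the MTTR values are hypothetical numbers invented for illustration and do not come from the lesson.

```python
def system_mttr(items):
    """Failure-rate-weighted average MTTR.

    items is a list of (failure_rate, mttr_hours) pairs, one per component.
    """
    total_rate = sum(rate for rate, _ in items)
    if total_rate <= 0:
        raise ValueError("total failure rate must be positive")
    return sum(rate * mttr for rate, mttr in items) / total_rate


# (relative failure rate, HYPOTHETICAL MTTR in hours)
drive_train = [
    (1.0, 4.0),  # Transmission
    (3.8, 1.5),  # Front Axle
    (3.4, 2.0),  # Rear Axle
    (1.8, 3.0),  # Drive Shaft
]

mttr = system_mttr(drive_train)
print(f"System-level MTTR: {mttr:.2f} hours")
```

Relative failure rates can stand in for absolute ones here because any common scale factor cancels in the ratio. Components that fail more often pull the system MTTR toward their own repair time, which is why they are allocated shorter repair times.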

When one or more design teams cannot meet their allocated goals, one of two actions must be taken: (1) If other teams are exceeding their requirement (i.e., either achieving better Reliability or Maintainability than needed), their allocation can be made more stringent and the resulting "savings" redistributed to the teams having more of a challenge. (2) If other teams are not exceeding their requirements, or the "savings" from teams that are exceeding their requirements are insufficient to offset the shortfalls, then the higher-level requirements may need to be changed.

Content

Slide 5-42. Modeling: Reliability Block Diagram

The next step in the Analyze Phase is Modeling. Reliability modeling is the process of predicting or understanding the Reliability of a component or system prior to its implementation. Two types of quantitative analysis that are often used to model a system's Reliability behavior are Reliability Block Diagrams (RBDs) and Fault Tree Analysis (FTA). The RBD and fault tree modeling provide a graphical means of evaluating the relationships between different parts of the system. These models may incorporate predictions based on failure rates taken from historical data. While the (input data) predictions are often not accurate in an absolute sense, they are valuable to assess relative differences in design alternatives. In turn, this informs trade-off decision making.

The R&M model describes the Reliability characteristics of a part, subassembly, or assembly. This model description may be depicted graphically, pictorially, or mathematically. The goals of R&M Modeling include:

Constructing a system model by drawing components in configurations to predict Reliability, Maintainability and Availability

Graphical analysis

Comparing alternative design options

Simplifying complex systems.
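As a rough illustration of how an RBD turns a configuration into a number, the sketch below evaluates the two basic block arrangements, series and parallel, assuming independent block failures. The function names and the reliability values are hypothetical and are not taken from the lesson.

```python
import math


def series_reliability(block_reliabilities):
    """All blocks must work: multiply the block reliabilities."""
    return math.prod(block_reliabilities)


def parallel_reliability(block_reliabilities):
    """At least one block must work: one minus the product of the unreliabilities."""
    return 1 - math.prod(1 - r for r in block_reliabilities)


# HYPOTHETICAL block reliabilities for comparing two design options.
single_string = series_reliability([0.95, 0.90])
with_redundancy = series_reliability([0.95, parallel_reliability([0.90, 0.90])])

print(f"Series design:    {single_string:.4f}")
print(f"Redundant design: {with_redundancy:.4f}")
```

Comparing the two results is an example of using the model to weigh alternative design options rather than to predict an absolute value.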

Content

Slide 5-43. R&M: Failure Modeling

In Step 2 of failure modeling, the goal is to calculate the number of anticipated failures over the system's life. As with allocation, the level of indenture to which this modeling takes place is dependent on the stated requirements and complexity of the system. In this lesson's example of the new car's drive train subsystem, during allocation you calculated the four LRU failure rates as shown above the components in the RBD.

The goal is now to calculate the number of failures anticipated over the life of the fleet of cars. You will examine this modeling in the context of the drive train subsystem.

The total fleet mileage over five years is calculated by multiplying 100 cars times 15,000 miles per car per year times 5 years, which gives 7,500,000 miles. To calculate the total fleet drive train failures, multiply 7,500,000 miles times the sum of the LRU failure rates (failures per mile). This results in approximately 695 fleet drive train subsystem failures over 5 years.

A similar calculation can be performed for each LRU of the subsystem. For example, the total fleet transmission failures would be calculated by multiplying 7,500,000 miles times the transmission failure rate of 0.00000926698 failures per mile. This results in 69.5 transmission failures over 5 years, which is 10% of the total, as you originally allocated based on Car A's field data.
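The fleet failure arithmetic can be sketched as follows, reusing the MTBF of 10,791 and the relative failure rates from the allocation example. Variable names are illustrative assumptions.

```python
CARS = 100
MILES_PER_CAR_PER_YEAR = 15_000
YEARS = 5
CAR_A_MTBF = 10_791

relative = {
    "Transmission": 1.0,
    "Front Axle": 3.8,
    "Rear Axle": 3.4,
    "Drive Shaft": 1.8,
}

# Allocated failure rate for each LRU, in failures per mile.
rate_per_unit = (1 / CAR_A_MTBF) / sum(relative.values())
allocated = {lru: rel * rate_per_unit for lru, rel in relative.items()}

# Total miles driven by the fleet over the period.
fleet_miles = CARS * MILES_PER_CAR_PER_YEAR * YEARS

# Expected failures = miles driven times failures per mile.
drive_train_failures = fleet_miles * sum(allocated.values())
lru_failures = {lru: fleet_miles * rate for lru, rate in allocated.items()}

print(f"Fleet miles: {fleet_miles:,}")
print(f"Drive train failures over {YEARS} years: {drive_train_failures:.0f}")
for lru, count in lru_failures.items():
    print(f"  {lru}: {count:.1f}")
```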

Content

Slide 5-44. R&M: Downtime Modeling

Step 3 of the Reliability modeling has two sub-steps. First, you will calculate operational system downtime. Then, you will calculate Operational Availability (Ao).

To derive the operational system downtime, you will perform four calculations.

First, find the total number of failures for the fleet of 100 cars over the 5 years, where 10% of the failures must be serviced by the OEM and the remaining 90% are handled by local service repair. Factor in the Mean Miles between Maintenance (MMBM) of 5,000 miles and an expected use of 75,000 miles per car (15,000 miles per year for 5 years). The total number of failures that must be repaired equals 1,500 (100 cars times 75,000 miles divided by 5,000 miles).

Second, calculate the fleet wide number of downtime hours for failures over the 5 years when each failure will result in an average downtime of 24 hours. This results in 52,680 hours.

Third, calculate the preventive maintenance (PM) downtime for the fleet over 5 years, given that preventive maintenance results in 48 hours of downtime per car each year. This results in 24,000 hours (100 cars times 48 hours times 5 years).

Fourth, add the total preventive maintenance downtime to the downtime for failures to derive the total downtime. The result is 76,680 hours for the fleet of 100 cars over 5 years.
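The downtime steps can be sketched as below. This is an illustration under one explicit assumption: the failure downtime of 52,680 hours is taken as given from the text rather than recomputed, because the split of repair times between OEM and local service is not spelled out here. Variable names are illustrative.

```python
CARS = 100
YEARS = 5
MILES_PER_CAR = 75_000          # expected use per car over the 5 years
MMBM = 5_000                    # Mean Miles between Maintenance
PM_HOURS_PER_CAR_PER_YEAR = 48  # preventive maintenance downtime

# Step 1: total failures that must be repaired across the fleet.
failures = CARS * (MILES_PER_CAR // MMBM)

# Step 2: fleet-wide downtime for failures (ASSUMPTION: value taken as stated in the text).
failure_downtime_hours = 52_680

# Step 3: preventive maintenance downtime for the fleet.
pm_downtime_hours = CARS * PM_HOURS_PER_CAR_PER_YEAR * YEARS

# Step 4: total downtime.
total_downtime_hours = failure_downtime_hours + pm_downtime_hours

print(f"Failures to repair: {failures:,}")
print(f"PM downtime: {pm_downtime_hours:,} hours")
print(f"Total downtime: {total_downtime_hours:,} hours")
```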

Content

Slide 5-45. R&M: Downtime & AO Modeling

The goal of Operational Availability Modeling is the calculation of a system’s Operational Availability (AO).

In a perfect world, the cars would be available 365 days per year for 12 hours per day. To solve for AO, you must first calculate total possible uptime for the fleet of 100 cars over 5 years for 12 hours per day, 365 days per year. This computes to 2,190,000 hours.

Next, you must calculate uptime. Uptime is equal to total time minus operational downtime that you calculated on the previous slide.

Finally, to calculate AO you will divide uptime by the sum of uptime plus downtime. Operational Availability is approximately 96.5% (2,113,320 hours of uptime divided by 2,190,000 total hours).
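The Operational Availability calculation can be sketched as follows, using the total downtime of 76,680 hours from the previous slide. Variable names are illustrative assumptions.

```python
CARS = 100
YEARS = 5
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 12  # hours per day each car is expected to be available

total_downtime_hours = 76_680  # from the downtime modeling step

# Total possible uptime for the fleet.
total_hours = CARS * YEARS * DAYS_PER_YEAR * HOURS_PER_DAY

# Uptime is total time minus operational downtime.
uptime_hours = total_hours - total_downtime_hours

# Operational Availability = uptime / (uptime + downtime).
ao = uptime_hours / (uptime_hours + total_downtime_hours)

print(f"Total possible uptime: {total_hours:,} hours")
print(f"Uptime: {uptime_hours:,} hours")
print(f"Operational Availability: {ao:.4f}")
```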

Content

Slide 5-46. RAM-C Analysis Summary

The analytical steps lead to addressing three of four Life Cycle Sustainment outcomes.

Allocation, modeling, and prediction validate AO.

Content

Slide 5-47. R&M: Modeling Order of Precedence

There is an order of precedence in using data for modeling as follows:

1. Field data

2. Test data

3. Similarity data

4. Prediction data (e.g., MIL-HDBK-217)

As data are updated, Reliability information is re-evaluated based on the order of precedence. Fielded system data always takes precedence.
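One way to picture the order of precedence is as a lookup that always prefers the highest-ranked source for which data exist. The sketch below is illustrative only; the dictionary keys, function name, and failure rate values are hypothetical.

```python
# Data sources in order of precedence, highest first.
PRECEDENCE = ["field", "test", "similarity", "prediction"]


def select_failure_rate(estimates):
    """Return (source, value) for the highest-precedence source that has data."""
    for source in PRECEDENCE:
        value = estimates.get(source)
        if value is not None:
            return source, value
    raise ValueError("no reliability data available")


# HYPOTHETICAL estimates: early in development only similarity and prediction data exist.
early = {"similarity": 9.3e-5, "prediction": 1.2e-4}
# Later, field data become available and take precedence.
fielded = {"field": 8.1e-5, "test": 8.8e-5, "similarity": 9.3e-5, "prediction": 1.2e-4}

print(select_failure_rate(early))
print(select_failure_rate(fielded))
```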

Content

Slide 5-48. Maintainability Modeling

Just as the Reliability Block Diagram graphically illustrates relationships for Reliability, the fault tree models Maintainability. The fault tree shows the relationships between top-level events and lower-level contributors. The fault tree uses Reliability data and results of critical failures from FMECA to provide the probability and frequency of events.

You will learn more about this modeling technique in Lesson 6.

Content

Slide 5-49. R&M Prediction

R&M prediction is the process of quantitatively assessing an equipment design relative to its specified Reliability. Prediction refers to assessments made using quantitative mathematical models. Reliability prediction is conducted in a "bottom-up" process that builds on the predicted failure rate of the lowest level of indenture (i.e., a part) and cumulatively adds the failure rates of subassemblies and assemblies to provide a system-level failure rate.

Goal: The goal of R&M prediction is to improve operational Suitability by using incidents to identify problems and estimate Reliability and opportunities for Reliability Growth / improvement. As Reliability is updated, new values must be validated in models and compared to allocations.

Reliability prediction is conducted to:

Evaluate the feasibility of a design with regard to its Reliability requirement

Compare competing designs

Provide input to support other design activities, such as Failure Mode, Effects and Criticality Analysis (FMECA)

Forecast logistics needs (spares, maintenance labor, test equipment, etc.)

Prediction methodologies include:

Comparative Analysis, which predicts new design by comparing to known design

Translators, which convert empirical data into reliability value

Parts Count method, which identifies part classes/failure rates/quantity

Stress Analysis method, which predicts parts failure rate based on use/environment
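The bottom-up idea behind the Parts Count method can be sketched as a simple roll-up: multiply each part class's quantity by its failure rate and add the results. The part classes and failure rates below are hypothetical, and handbook methods such as MIL-HDBK-217 also apply adjustment factors (for example, part quality) that are omitted from this simplified sketch.

```python
# (part class, quantity, HYPOTHETICAL failure rate in failures per hour)
parts = [
    ("Resistor", 40, 0.002e-6),
    ("Capacitor", 25, 0.004e-6),
    ("Integrated circuit", 6, 0.05e-6),
]


def parts_count_failure_rate(part_list):
    """Sum quantity times failure rate over all part classes."""
    return sum(quantity * rate for _, quantity, rate in part_list)


assembly_rate = parts_count_failure_rate(parts)
assembly_mtbf = 1 / assembly_rate

print(f"Assembly failure rate: {assembly_rate:.3e} failures per hour")
print(f"Assembly MTBF: {assembly_mtbf:,.0f} hours")
```

The same roll-up repeats at each level of indenture: assembly failure rates are added to give a subsystem rate, and subsystem rates to give the system-level rate.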

The Reliability “bathtub curve” models the cradle to grave instantaneous failure rates over time. The curve is modeled mathematically by exponential functions. The bathtub curve is generated by mapping the rate of early "infant mortality" failures when the product is first introduced, the rate of random failures with constant failure rate during its "useful life", and finally the rate of "wear out" failures as the product exceeds its design lifetime.

In the early life of a product adhering to the bathtub curve, the failure rate is high but rapidly decreasing as defective products are identified and discarded, and early sources of potential failure such as handling and installation error are surmounted. If you follow the slope from the start to where it begins to flatten out this can be considered the first period, also called infant mortality period.

The next period is the flat portion of the graph, called the normal life. Failures occur in a more random sequence during this time. It is difficult to predict which failure mode will manifest, but the rate of failures is predictable. Notice that the curve is flat here, meaning the failure rate is constant.

The third period begins at the point where the slope begins to increase and extends to the end of the graph. This is what happens when units become old and begin to fail at an increasing rate. (Retrieved from http://en.wikipedia.org/wiki/Bathtub_curve, June 13, 2012)

The exponential distribution is widely used in the analysis of failure rates of complete systems or assemblies. In the bathtub curve, exponential functions describe the downward- and upward-sloping portions of the failure rate, while a uniform (constant) failure rate describes the relatively flat middle portion.

The point of the bathtub curve graph is that failure characteristics incorporate three distinct failure distributions:

Useful Life is called a 'uniform distribution'

Infant Mortality is 'negative exponential distribution'

Wear Out is a 'positive exponential distribution'

As explained by Dennis J. Wilkins, "The bathtub curve, does not depict the failure rate of a single item, but describes the relative failure rate of an entire population of products over time. Some individual units will fail relatively early (infant mortality failures), others (we hope most) will last until wear-out, and some will fail during the relatively long period typically called normal life. Failures during infant mortality are highly undesirable and are always caused by defects and blunders: material defects, design blunders, errors in assembly, etc. Normal life failures are normally considered to be random cases of "stress exceeding strength." However, as we'll see, many failures often considered normal life failures are actually infant mortality failures. Wear-out is a fact of life due to fatigue or depletion of materials (such as lubrication depletion in bearings). A product's useful life is limited by its shortest-lived component. A product manufacturer must assure that all specified materials are adequate to function through the intended product life." (Retrieved from http://www.weibull.com/hotwire/issue21/hottopics21.htm, June 13, 2012)

As presented by Andre V. Kleyner, "Reliability analysis has a high element of uncertainty because product reliability is not a physical property, such as weight, volume or current, which can be measured directly. There are many models which can provide data on product life under given conditions, but there is no one model that can exactly predict a product life for all failure modes … Engineers should not become fixated on one model or method and should instead try to find the right tool(s) for the job. Finite element analysis, stepped overstress testing, Monte-Carlo simulation, stress strength calculations, analysis of similar product warranties are all viable methods for predicting reliability. Reliability demonstration testing is just one tool among many in the large toolbox available to engineers and test development professionals." (2012 Annual RELIABILITY and MAINTAINABILITY Symposium, "Reliability Demonstration: Theory and Application," Andre V. Kleyner, Ph.D.)

In Reliability Engineering, the cumulative distribution function corresponding to a bathtub curve may be analyzed using a Weibull chart. This lesson will focus on the use of the Weibull Analysis for Reliability and Maintainability prediction.
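A minimal sketch of why the Weibull distribution suits the bathtub curve: its shape parameter determines whether the failure rate falls, stays constant, or rises over time. The functions below implement the standard two-parameter Weibull failure rate and cumulative distribution function; the parameter values are hypothetical and chosen only to show the three regimes.

```python
import math


def weibull_failure_rate(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_cdf(t, shape, scale):
    """Probability of failure by time t: F(t) = 1 - exp(-(t/scale)**shape)."""
    return 1 - math.exp(-((t / scale) ** shape))


SCALE = 1_000.0  # HYPOTHETICAL characteristic life

# shape < 1: decreasing rate (infant mortality)
# shape = 1: constant rate (useful life; reduces to the exponential distribution)
# shape > 1: increasing rate (wear out)
for label, shape in [("Infant mortality", 0.5), ("Useful life", 1.0), ("Wear out", 3.0)]:
    early = weibull_failure_rate(100.0, shape, SCALE)
    late = weibull_failure_rate(900.0, shape, SCALE)
    print(f"{label} (shape={shape}): h(100)={early:.6f}, h(900)={late:.6f}")
```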

Regarding levels of risk, and recognizing that not all predictions are based on empirical data, the following table provides a comparison of the relative risks associated with various prediction approaches. (Denson, W.K., “Reliability Modeling: The RIAC Guide to Reliability Prediction, Assessment and Estimation”, Reliability Information Analysis Center, May 2010)

Relative Risk of Mismatches between Prediction and Actual End-Use Environments Table

Low Risk

Medium Risk

High Risk

Predicted reliability is based on:

Historical reliability of items of same complexity/same environments/ same technology

Historical reliability of items of similar complexity/ similar environments/ similar technology

Test data of comparable items under different end-use conditions

Predicted reliability is based on:

Reliability analysis of identical/comparable items under expected end-use conditions

Reliability analysis of comparable items under different end-use conditions

Handbook/ Databook numbers supported by documented assumptions and rationale

Predicted reliability is based on:

Test data of comparable items under expected end-use conditions

SME engineering judgment supported by documented assumptions and rationale

Predicted reliability is based on:

Reliability analysis or test data of state-of-the-art technology

SME engineering opinion

Content

Slide 5-50. Reliability Prediction

The primary advantage of Weibull Analysis is the ability to provide reasonably accurate failure analysis and failure forecasts with extremely small samples. Solutions are possible at the earliest indications of a problem. Weibull analysis includes:

Plotting the data and interpreting the plot

Failure forecasting and prediction

Evaluating corrective action plans

Engineering change substantiation

Maintenance planning and cost effective replacement strategies

Spare parts forecasting

Warranty analysis and support cost predictions

Calibration of complex design systems, i.e., CAD/CAM, finite element analysis, etc.

Recommendations to management in response to service problems

Using statistical methods for fitting data, Weibull Analysis is able to determine the most likely underlying probab