
Uncertainty Analysis and the Project Cost Estimating Capability

NASA Cost Symposium

August 2014

Andy Prince – MSFC/Engineering Cost Office

Brian Alford – Victory Solutions MIPSS Team/Booz Allen Hamilton

Blake Boswell – Victory Solutions MIPSS Team/Booz Allen Hamilton

Matt Pitlyk – Victory Solutions MIPSS Team/Booz Allen Hamilton


Abstract

Uncertainty and Risk Analysis is an increasingly important part of cost estimating. In addition to the increased demand from decision makers to see this type of analysis, NASA policy often requires certain programs to report a confidence level along with the cost estimate or a range of costs "with a confidence level established by a probabilistic analysis". This presentation will cover how to incorporate uncertainty in an estimating model. This includes calculations of prediction intervals around a regression-based Cost Estimating Relationship. It will compare the effects of using just input uncertainty, just model uncertainty, and both input and model uncertainty on the final results of the model. Finally, there will be a demonstration of how this math has been implemented and automated in the Project Cost Estimating Capability (PCEC) and how to add uncertainty to a PCEC model.


Contents

• Input Uncertainty

• Model Uncertainty

– What is Model Uncertainty

– Regression Overview

– Confidence vs Prediction Intervals

– Calculating Intervals

• Examples of Including Uncertainty

• Uncertainty in PCEC Demo


Input Uncertainty

Input Uncertainty is variation around the inputs to a model

• Type that most people are familiar with and many include in their models

• Important because the final values for many parameters are not known at the time of estimating

• Sources include (see the sketch below):

– Uncertainty around the final design

– Variation in historical data

– Changing requirements

– Uncertainty of who will perform the work

– Variation in price of materials

– Weather
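To make input uncertainty concrete, here is a minimal sketch (hypothetical CER coefficients and distribution parameters, not values from the presentation): the mass driving a notional weight-based CER is sampled from a triangular distribution rather than held at a point value.

```python
import numpy as np

rng = np.random.default_rng(42)

def notional_cer(mass_kg):
    """Hypothetical weight-based CER: cost as a power function of mass (illustrative only)."""
    return 0.78 * mass_kg ** 0.81

# Input uncertainty: the final dry mass is not yet known, so model it as a
# triangular distribution (low, most likely, high) and sample it each trial.
n_trials = 10_000
mass_samples = rng.triangular(left=900.0, mode=1000.0, right=1300.0, size=n_trials)
cost_samples = notional_cer(mass_samples)

print(f"Point estimate at most-likely mass: {notional_cer(1000.0):.1f}")
print(f"Mean of simulated costs:            {cost_samples.mean():.1f}")
print(f"70th percentile cost:               {np.percentile(cost_samples, 70):.1f}")
```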


Input Uncertainty is Not Enough for Regression Equations

Model Uncertainty is the uncertainty around the estimate produced by a regression equation

• Holding the input value steady will still produce an output distribution because of the modeling error

• The total uncertainty of the model is a combination of two types of variation: input and model

[Figure: non-linear regression example adapted from the 2008 NASA Cost Estimating Handbook, comparing input uncertainty alone with combined input and model uncertainty.]


Regression Overview

Ordinary Least Squares regression finds the line through the data that minimizes the sum squared error. This produces a mean estimator.

[Figure: example data with fitted OLS regression line, R² = 0.946.]

We use information from the regression analysis to produce confidence and prediction intervals which help us estimate.
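A minimal sketch of the OLS fit itself, using notional data rather than the chart's actual points; it finds the coefficients that minimize the sum squared error and reports R².

```python
import numpy as np

# Notional (x, y) data standing in for the historical data points on the chart
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.1, 1.9, 2.7, 3.0, 4.2, 4.6, 5.3, 6.4])

# Design matrix with an intercept column of 1's
X = np.column_stack([np.ones_like(x), x])

# Ordinary Least Squares: minimize the sum of squared errors
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # beta = [intercept, slope]
y_hat = X @ beta

# Coefficient of determination
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"intercept = {beta[0]:.4f}, slope = {beta[1]:.4f}, R^2 = {r_squared:.3f}")
```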


Confidence vs Prediction Intervals

Confidence and Prediction intervals are related but provide bounds for different types of estimation.

• Confidence intervals give bounds for estimating the true value of the regression line (the true mean of Y) at a particular x value (or vector if there are multiple independent variables).

• Prediction intervals give bounds for estimating another observation (a particular value of Y) at a particular x value (or vector if there are multiple independent variables).

[Figure: Confidence & Prediction Intervals, Cost vs. Cost Driver, showing the regression line, the data points, and the CI and PI lower and upper bounds.]

Prediction intervals include two types of variation: the error around estimating the true mean and the natural variation around the true mean.

Because the estimate of the mean necessarily has less variation than the estimate of observations that produce that mean, confidence intervals are smaller than prediction intervals.


Why are Prediction Intervals Bigger?

Confidence intervals: Here we are trying to predict the mean y value and so are only concerned about the variation around the regression line estimate.

Mean estimate: $\hat{y}_0 \mid x_0$

$\mathrm{var}(\hat{y}_0 \mid x_0) = \mathrm{var}(x_0\hat{\beta} \mid x_0) = x_0\,\mathrm{var}(\hat{\beta})\,x_0' = x_0\,\hat{\sigma}^2 (X'X)^{-1} x_0' = \hat{\sigma}^2\, x_0 (X'X)^{-1} x_0'$

where $\hat{\beta}$ is the vector of estimated regression coefficients and $X$ is the design matrix.

Prediction intervals: Here we are trying to predict a new y value (not the mean of the y's) and so are also concerned with the variation of the y values themselves, not just around the mean. To account for this, we add another term.

New observation estimate: $y_0 = \hat{y}_0 \mid x_0 + \varepsilon_0$

$\mathrm{var}(y_0 \mid x_0) = \mathrm{var}(\hat{y}_0 \mid x_0 + \varepsilon_0) = x_0\,\mathrm{var}(\hat{\beta})\,x_0' + \mathrm{var}(\varepsilon_0 \mid x_0) + 2\,\mathrm{cov}(x_0\hat{\beta},\, \varepsilon_0 \mid x_0)$

(the covariance term is zero because of the OLS assumptions)

$= x_0\,\hat{\sigma}^2 (X'X)^{-1} x_0' + \hat{\sigma}^2 = \hat{\sigma}^2\,[\,x_0 (X'X)^{-1} x_0' + 1\,]$

The inclusion of the additional term results in a "+1" in the final step, which makes the prediction intervals larger.
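A small numerical check of the derivation, using notional data; SEE² serves as the estimate of σ², and the prediction variance at a given x₀ comes out equal to the confidence variance plus σ̂², which is the "+1" term.

```python
import numpy as np

# Notional data and design matrix (intercept plus one cost driver)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.3, 1.8, 2.9, 3.1, 4.0, 4.8, 5.2, 6.5])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = len(y) - X.shape[1]              # residual degrees of freedom
sigma2_hat = resid @ resid / dof       # SEE^2, the estimate of sigma^2

x0 = np.array([1.0, 4.5])              # input vector [1, x0]
XtX_inv = np.linalg.inv(X.T @ X)

var_mean = sigma2_hat * (x0 @ XtX_inv @ x0)        # variance of the mean estimate (C.I.)
var_new = sigma2_hat * (x0 @ XtX_inv @ x0 + 1.0)   # variance of a new observation (P.I.)

# The difference is exactly sigma2_hat: the "+1" term in the derivation
print(var_new - var_mean, sigma2_hat)
```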


Calculating Intervals

Form of an interval:

Output from regression equation ± critical value from t distribution × Standard Error

Ordinary Least Squares dictates that we use a t distribution because:

• We assume normally distributed data around the means

• We do not know the actual variance of the residuals (σ²) and so must use an estimate (SEE)

The difference between confidence and prediction intervals is in the Standard Error term, which we have already seen. So

C.I. = $\hat{y}_0 \mid x_0 \pm t_{(\text{confidence level},\ \text{residual degrees of freedom})} \times SEE \times \sqrt{x_0 (X'X)^{-1} x_0'}$

P.I. = $\hat{y}_0 \mid x_0 \pm t_{(\text{confidence level},\ \text{residual degrees of freedom})} \times SEE \times \sqrt{1 + x_0 (X'X)^{-1} x_0'}$

SEE (Standard Error of the Estimate) comes from our regression analysis and is our estimate for σ.

Note: These formulas are more commonly written as

C.I. = $\hat{y} \mid x_0 \pm t_{\alpha/2,\,df} \times SEE \times \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2}}$

P.I. = $\hat{y} \mid x_0 \pm t_{\alpha/2,\,df} \times SEE \times \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2}}$

However, these formulas do not extend to the case with multiple independent variables. Thus matrix algebra (and notation) is used.
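A minimal sketch of both interval formulas in the matrix form above, again with notional data; the critical value comes from scipy's t distribution.

```python
import numpy as np
from scipy import stats

# Notional data: design matrix X (intercept plus one driver) and observed costs y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.3, 1.8, 2.9, 3.1, 4.0, 4.8, 5.2, 6.5])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
dof = len(y) - X.shape[1]                          # residual degrees of freedom
see = np.sqrt(np.sum((y - X @ beta) ** 2) / dof)   # Standard Error of the Estimate

x0 = np.array([1.0, 4.5])                          # [1, x0] for the point being estimated
y0_hat = x0 @ beta                                 # output of the regression equation
leverage = x0 @ np.linalg.inv(X.T @ X) @ x0        # x0 (X'X)^-1 x0'

alpha = 0.20                                       # e.g. 80% two-sided intervals
t_crit = stats.t.ppf(1 - alpha / 2, dof)           # critical value from the t distribution

ci_half = t_crit * see * np.sqrt(leverage)
pi_half = t_crit * see * np.sqrt(1.0 + leverage)

print(f"CI: {y0_hat:.2f} +/- {ci_half:.2f}")
print(f"PI: {y0_hat:.2f} +/- {pi_half:.2f}")
```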


Producing and Using Prediction Intervals

Now that we know what prediction intervals tell us and how they differ from confidence intervals, how do we use them? The use we are concerned with is producing estimated y values for a Monte Carlo simulation. Because Monte Carlo simulations rely on the range of future observations rather than the mean of the observations, prediction intervals are used to randomly sample from the distribution of future observations.

1. Calculate the output of the regression equation for the x value(s). This is the mean estimate.

2. Randomly sample from a t distribution with degrees of freedom equal to the residual degrees of freedom from the regression analysis.

3. Calculate the Standard Error for the regression equation at the given input vector.

$se(\hat{y} \mid x_0) = SEE \times \sqrt{1 + x_0 (X'X)^{-1} x_0'}$

where SEE = the Standard Error of the Estimate for the regression analysis and X is the design matrix. The design matrix is of size [(number of data points) x (number of variables + 1)] where the first column is all 1’s, the second column is the data points for the first variable in the regression, the third column is the data points for the second variable in the regression, etc.

Now we have all the parts for the expression:

Output from regression equation + critical value from t distribution × Standard Error

This will produce a single potential future observation of y either above or below the estimated mean (because the sample from the t distribution may be positive or negative) which can then be used in one trial of a Monte Carlo simulation.
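A minimal sketch of the three steps for a single Monte Carlo trial. The regression quantities here are notional stand-ins; in practice they come from the regression analysis behind the CER.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_observation(x0, beta, XtX_inv, see, dof, rng):
    """One Monte Carlo trial: draw a potential future y from the prediction distribution."""
    y0_hat = x0 @ beta                                 # 1. mean estimate from the regression equation
    t_draw = rng.standard_t(dof)                       # 2. random t with the residual degrees of freedom
    se_pred = see * np.sqrt(1.0 + x0 @ XtX_inv @ x0)   # 3. standard error at this input vector
    return y0_hat + t_draw * se_pred                   # mean + t sample * standard error

# Notional regression quantities; PCEC supplies the squared design matrix (X'X)
# and the SEE alongside its CERs
beta = np.array([0.40, 0.78])                  # intercept and slope
XtX = np.array([[8.0, 36.0],
                [36.0, 204.0]])                # X'X for 8 notional data points
XtX_inv = np.linalg.inv(XtX)
see, dof = 0.35, 6

x0 = np.array([1.0, 4.5])
draws = np.array([sample_observation(x0, beta, XtX_inv, see, dof, rng)
                  for _ in range(10_000)])
print(draws.mean(), np.percentile(draws, [30, 50, 70]))
```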

Notes: The PCEC refers to (X'X) as the "squared design matrix" and includes it with CERs so that users can include model uncertainty in their estimates.

The PCEC includes Prediction Interval calculations for appropriate CERs. Users can use any Monte Carlo add-in with the PCEC to add uncertainty to their models.


Why Are Prediction Intervals Curved?

Prediction (and confidence) intervals are narrowest near the point $(\bar{x}, \bar{y})$ and get wider as the x value(s) move away. To understand why, it is easier to refer again to the PI equation for a univariate regression:

P.I. = $\hat{y} \mid x_0 \pm t_{\alpha/2,\,df} \times SEE \times \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2}}$

Looking at this expression we can make a few observations (a numerical sketch follows the list):

• When the input value equals the mean, $x_0 = \bar{X}$, the third term under the radical is 0 and the standard error is at its smallest.

• When $x_0 \ne \bar{X}$, the third term under the radical is positive. Moreover, the standard error grows larger as $x_0$ gets farther from $\bar{X}$. This happens in either direction because the term is squared and thus always positive.

• The regression equation is most precise near the center of the data set. Estimating outside the range of the data carries large uncertainty because the standard error grows large.

• The standard error must be calculated for every input value. During simulations where $x_0$ is varied (input uncertainty), the standard error must be recalculated for each trial.

• Increasing the number of data points (n) adds precision to the C.I.'s more than the P.I.'s. This is because increasing n makes the second and third terms under the radical vanish to 0. These are the only terms under the radical for C.I.'s, but P.I.'s have the 1 as well.
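A brief sketch, using the same notional data as the earlier examples, of how the prediction standard error changes with x₀; it is smallest at the mean of the data and grows as x₀ moves toward and beyond the edge of the data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.3, 1.8, 2.9, 3.1, 4.0, 4.8, 5.2, 6.5])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
dof = len(y) - X.shape[1]
see = np.sqrt(np.sum((y - X @ beta) ** 2) / dof)
XtX_inv = np.linalg.inv(X.T @ X)

# Evaluate at the data mean, inside the data, at the edge, and outside the data range
for x0_val in [x.mean(), 2.0, 8.0, 12.0]:
    x0 = np.array([1.0, x0_val])
    se_pred = see * np.sqrt(1.0 + x0 @ XtX_inv @ x0)
    print(f"x0 = {x0_val:5.1f}  prediction SE = {se_pred:.3f}")
```

The standard error at x0 = 12, well outside the 1-to-8 data range, is noticeably larger than at the data mean, which is the widening described above.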


Output Comparisons: Input Uncertainty

This graph illustrates the results of including only input uncertainty in a simulation.

[Figure: simulated outputs with input uncertainty only.]

It assumes that the output from the regression equation is the desired predicted value, which is rarely the case when incorporating uncertainty. The regression equation estimates the mean of the y values, but usually an estimate for a new observation is desired.


Output Comparisons: Model Uncertainty

This graph illustrates the results of including only model uncertainty in a simulation.

[Figure: simulated outputs with model uncertainty only.]

It assumes that the input value for x is known absolutely at the time of estimating, which is not usually the case. It uses the regression equation to get one mean estimate and then uses prediction intervals to produce a range of y values for the sole x value.


Output Comparisons: Input and Model Uncertainty

This graph illustrates the results of including both input and model uncertainty in a simulation.

[Figure: simulated outputs with both input and model uncertainty.]

This captures the widest and most accurate range of potential y values by including both types of uncertainty.
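A minimal sketch, with notional data and a notional input distribution, that runs the three cases side by side: input uncertainty only, model uncertainty only, and both. The combined case produces the widest range, as the slides above illustrate.

```python
import numpy as np

rng = np.random.default_rng(7)

# Notional regression quantities (as in the earlier sketches)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.3, 1.8, 2.9, 3.1, 4.0, 4.8, 5.2, 6.5])
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
dof = len(y) - X.shape[1]
see = np.sqrt(np.sum((y - X @ beta) ** 2) / dof)
XtX_inv = np.linalg.inv(X.T @ X)

n_trials = 20_000
# Input uncertainty: the cost driver itself is uncertain (notional triangular distribution)
x_draws = rng.triangular(3.0, 4.5, 7.0, size=n_trials)

def predict(x0_val, add_model_error):
    x0 = np.array([1.0, x0_val])
    y_hat = x0 @ beta
    if add_model_error:
        se_pred = see * np.sqrt(1.0 + x0 @ XtX_inv @ x0)   # recomputed for each trial's x0
        y_hat += rng.standard_t(dof) * se_pred
    return y_hat

input_only = np.array([predict(xv, False) for xv in x_draws])
model_only = np.array([predict(4.5, True) for _ in range(n_trials)])
both = np.array([predict(xv, True) for xv in x_draws])

for name, sims in [("input only", input_only), ("model only", model_only), ("both", both)]:
    lo, hi = np.percentile(sims, [10, 90])
    print(f"{name:10s}  10th-90th percentile range: {lo:.2f} to {hi:.2f}")
```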