The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

Upload: stanley-henry

Posted on 28-Dec-2015


Page 1: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

The DMAIC Lean Six Sigma Project and Team Tools Approach

Analyze Phase (Part 2)

Page 2

Lean Six Sigma Black Belt Training! Analyze (Part 2) Agenda

• Review Analyze Part 1
• Inferential Statistics
• Hypothesis Testing
• P-values
• Discrete X / Continuous Y Statistical Tests
• Continuous X / Continuous Y Statistical Tests
• Discrete X / Discrete Y Statistical Tests
• Applications / Lessons Learned / Conclusions
• Next Steps

Page 3

Six Sigma Analyze: Inferential Statistics

(Identifying What’s Different (Xs) Statistically)

Page 4

Introduction to Hypothesis Testing

[Figure: two normal distribution curves of X. Sample 1: Normal, Mean = 6.5, StDev = 1. Sample 2: Normal, Mean = 6.9, StDev = 1.2.]

Are these samples from the same population?

Page 5

Intro. to Confidence Intervals (pg. 157)

• Brutal Facts Regarding Samples
– We know that the size of the sampling error depends primarily on the variation in the population and the size of the sample selected.
– Larger samples have a smaller margin of error, yet are more costly to obtain.
– As reality in practice dictates, usually only one sample is selected, and it is usually the minimum size required.
– Therefore, a method was needed to estimate a population parameter from a single sample and to state how precise that estimate is. This method is called a Confidence Interval.

Page 6

Intro. to Confidence Intervals (pg. 157)

• A statistic plus or minus a margin of error is called a confidence interval.

• A confidence interval is a range of values, calculated from a data set, that is expected to contain the true value with an assigned level of confidence.

• The confidence level depends on the width of the margin of error that is selected. Generally, the accepted margin of error is plus or minus 2 standard errors, which gives approximately a 95% confidence level.

• “We are 95% confident that the true average door-to-balloon time is between 60 and 100 minutes.”

Page 7

[Figure: distribution of sample averages centered at 50]

Assume we have a population of size N that is not normally distributed.

We draw 100 random samples and plot the average of each sample.

The 100 sample averages (n = 100) form an approximately normal distribution with a mean of 50.
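The central limit theorem idea on this slide can be checked with a short simulation. This is a minimal sketch, assuming a right-skewed exponential population with mean 50 and a hypothetical sample size of 30 per sample; the population shape, sample size, and seed are illustrative choices, not values from the slides.

```python
import random
from statistics import mean, stdev


def sample_means(num_samples=100, sample_size=30, pop_mean=50.0, seed=42):
    """Return the averages of repeated random samples from a skewed population.

    Assumption: the population is exponential (clearly non-normal) with the
    given mean; sample_size and seed are illustrative, not from the slides.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    averages = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1 / pop_mean) for _ in range(sample_size)]
        averages.append(mean(sample))
    return averages


averages = sample_means()
print(f"mean of the 100 sample averages: {mean(averages):.2f}")
# Theory predicts a spread of about sigma / sqrt(n) = 50 / sqrt(30), roughly 9.1
print(f"std dev of the sample averages (standard error): {stdev(averages):.2f}")
```

Even though the exponential population is strongly skewed, a histogram of `averages` should look roughly bell-shaped, centered near 50.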

Page 8

[Figure: sampling distribution centered at 50, with 68% of the area between −1 SE and +1 SE and 95% between −2 SE and +2 SE]

The mean of our sampled distribution is 50.

How confident are we of where the population mean lies?

As with the standard deviation of a normal distribution, about 68% of the sampling distribution lies within ±1 standard error (SE) and about 95% lies within ±2 standard errors.
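The 68% / 95% figures can be reproduced from the standard normal distribution. A minimal sketch using Python's standard-library `statistics.NormalDist`; it also shows that the exact 95% multiplier is about 1.96, so "±2 SE" is a convenient rounding.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal sampling distribution within ±1 and ±2 standard errors
within_1_se = std_normal.cdf(1) - std_normal.cdf(-1)
within_2_se = std_normal.cdf(2) - std_normal.cdf(-2)

# Exact multiplier for 95% coverage; "2 SE" is a rounding of this value
z_95 = std_normal.inv_cdf(0.975)

print(f"within ±1 SE: {within_1_se:.4f}")  # about 0.6827
print(f"within ±2 SE: {within_2_se:.4f}")  # about 0.9545
print(f"z for 95%: {z_95:.3f}")            # about 1.960
```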

Page 9

Let’s assume we want to be 95% confident of where the true mean of the population lies.

We can be 95% confident that the true mean lies within ±2 SE.

[Figure: sampling distribution centered at 50, with ±1 SE and ±2 SE marked; the region within ±2 SE covers 95%]

$$SE = \frac{\sigma}{\sqrt{n}}$$

In this case, let’s assume that SE = 3, so 2 × SE = 6.

• The mean of our sample distribution is 50.
• We are 95% confident that the true mean of the population lies between 44 and 56.
• Our margin of error is ±6.
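A minimal sketch of the margin-of-error arithmetic above, assuming a known population standard deviation. The values sigma = 30 and n = 100 are hypothetical, chosen only so that SE = 3 as on the slide; the function name `confidence_interval` is illustrative.

```python
import math


def confidence_interval(sample_mean, sigma, n, z=2.0):
    """Return (low, high) = sample_mean ± z * SE, where SE = sigma / sqrt(n).

    z=2.0 follows the slides' "±2 SE is about 95%" rule of thumb; use 1.96
    for the exact 95% normal multiplier.
    """
    se = sigma / math.sqrt(n)   # standard error of the mean
    margin = z * se             # margin of error
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs chosen so that SE = 30 / sqrt(100) = 3, as on the slide
low, high = confidence_interval(50, sigma=30, n=100)
print(low, high)  # 44.0 56.0
```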

Page 10

Central Limit Theorem / Margin of Error / Confidence Intervals

• Why use it? Why is this important?
– Six Sigma practitioners use sample data and apply normal theory to make inferences about population parameters, irrespective of the actual form of the parent population.
– Many statistical tests are founded on the principle that we do not need to know the original distribution: sample means and proportions will be approximately “normal” if n is big enough.
– Practically, we use the central limit theorem to help us estimate the true average and calculate the likelihood of observing certain events.
– Considering time and resources, we need a measure of confidence around our sample statistics.
– None of this is applicable if your data is unreliable or BIASED!

Page 11

Data-Driven Problem Solving: Hypothesis Testing

Two fundamental questions must be answered adequately before you can perform hypothesis testing properly:

– What type of data is available (and reliable)?

– What question are you asking (what do you need to understand)?

Page 12

Introduction to Hypothesis Testing (pg. 156)

• Hypothesis testing is the process of using statistical analysis to determine whether the observed differences between two or more sets of data are due to random chance variation or to true differences in the underlying populations.

• Generally, hypothesis testing tells us whether or not sets of data are truly different, with a certain level of confidence.

Page 13

Introduction to Hypothesis Testing

[Figure: two normal distribution curves of X. Sample 1: Normal, Mean = 6.5, StDev = 1. Sample 2: Normal, Mean = 6.9, StDev = 1.2.]

Are these samples from the same population?

Page 14

The Six Sigma Approach

Practical Problem – Lab specimens are mislabeled too often, which leads to incorrect diagnosis and treatment.

Statistical Problem – Specimens are mislabeled 8 out of every 10,000 collected.

Statistical Solution – About 85% of mislabeled specimens come from the ED.

Practical Solution – Redesign of the process of labeling and transporting specimens leads to a dramatic reduction in errors.

Six Sigma applies many tools, including statistical tools, to practical problems. The key is data-driven decision making.

• Practical Problem – an unacceptable variation or gap in quality
• Statistical Problem – defining the problem in statistical terms
• Statistical Solution – using data and statistics to understand the cause of the problem
• Practical Solution – addresses the verified root causes

Page 15

Introduction to Hypothesis Testing

• Hypothesis testing allows us to answer a practical question: Is there a true difference between ___ and ___?

• Practically, hypothesis testing uses relatively small sample sizes to answer questions about the population.

• There is always a chance that the samples we have collected are not truly representative of the population. Thus, we may reach a wrong conclusion about the population(s) being studied.

Page 16

Introduction to Hypothesis Testing: Testing Terms and Concepts

• Statistically, we “ask and answer questions” using stated hypotheses that are tested at some level of confidence.

• The null hypothesis (Ho) is the statement being tested to determine whether or not it is true (the assumption that there is no difference).

• The alternative hypothesis (Ha) is the statement that represents reality if there is enough evidence to reject the stated null (Ho), i.e., the null hypothesis is false.

Page 17

Introduction to Hypothesis Testing Example:

Is the average length of stay (ALOS) for a total knee replacement different for Hospital A vs. Hospital B?

Common Language:

Ho: There is no difference in average length of stay between facilities.

Ha: There is a difference in average length of stay between facilities.

Statistical Language (μ = average length of stay at each hospital):

Ho: μA = μB

Ha: μA ≠ μB

Page 18

Introduction to Hypothesis Testing: Type I and Type II Errors (Risk)

• As stated earlier, there is a risk of arriving at a wrong conclusion about the hypothesis we are testing. The two types of error that can occur in hypothesis testing are called Type I and Type II; the associated risks are called Alpha (α) and Beta (β) risks.

• A Type I (Alpha) error is concluding there is a difference when there really isn’t one: rejecting the null when you should not!

• A Type II (Beta) error is concluding there is no difference when there really is one: failing to reject the null when you should!

Page 19

Type I and Type II Errors, Confidence, Power, and p-values

Conclusion drawn vs. the true statement:

• Reject H0 when H0 is true: Type I error (α risk). You conclude there IS a difference when there really isn’t.
• Reject H0 when H0 is false: Correct (power, 1 − β).
• Do not reject H0 when H0 is true: Correct (confidence, 1 − α).
• Do not reject H0 when H0 is false: Type II error (β risk). You conclude there is NO difference when there really is.

Page 20

Type I and Type II Errors in the Justice System

True state vs. verdict:

• Did not commit crime, Guilty verdict: innocent person convicted
• Did not commit crime, Acquittal: innocent person acquitted
• Committed crime, Acquittal: guilty person acquitted
• Committed crime, Guilty verdict: guilty person convicted

Page 21

Result Matrix (Ho: No difference between the accused and an innocent person)

Jury trial verdict / hypothesis testing decision, by the truth:

• Did not commit crime (Ho is true):
– Acquittal (do not reject Ho): Correct
– Guilty (reject Ho): Type I error (α)
• Committed crime (Ho is false):
– Acquittal (do not reject Ho): Type II error (β)
– Guilty (reject Ho): Correct

Page 22

Introduction to p-value

• The p-value is the probability of observing a difference at least as large as the one seen in the data if the null hypothesis is true.

• In comparing the average length of stay (ALOS) at Hospitals A and B, the p-value measures the likelihood of observing a difference in ALOS at least this large if the null hypothesis (no difference) is true.

• If the p-value is large, the data are consistent with both averages coming from the same population (i.e., we have no evidence of a difference between ALOS at Hospitals A and B).

• If the p-value is small, it is unlikely that both averages came from the same population (i.e., there is a difference between ALOS at Hospitals A and B).

Page 23

P-Value (pg. 160)

What’s the probability of getting a value of “40” when the mean is 50?

[Figure: distribution with a mean of 50; the value 40 is marked in the tail]

Page 24

Setting the Alpha Threshold

• Alpha (α) is the level of risk you are willing to accept of making a Type I error (i.e., rejecting the null when the null is true).

• Traditionally, alpha (α) is set at 0.05, which means you are willing to accept a 5% chance of making a Type I error.
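The meaning of alpha can be illustrated by simulation: if Ho is really true and we test repeatedly at α = 0.05, about 5% of the tests should still reject Ho. A minimal sketch, assuming normally distributed data with a known standard deviation (so a one-sample z-test applies); the population values, sample size, number of trials, and function names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of a one-sample z-test (population sigma assumed known)."""
    se = sigma / len(sample) ** 0.5
    z = (mean(sample) - mu0) / se
    return 2 * NormalDist().cdf(-abs(z))


def simulated_type_i_rate(alpha=0.05, trials=2000, n=20,
                          mu=50.0, sigma=10.0, seed=1):
    """Fraction of tests that reject Ho even though Ho is true."""
    rng = random.Random(seed)
    false_rejections = 0
    for _ in range(trials):
        # Every sample comes from the Ho population, so any rejection is a Type I error
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu, sigma) <= alpha:
            false_rejections += 1
    return false_rejections / trials


print(simulated_type_i_rate())  # close to 0.05
```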

Page 25

P-Value: The null hypothesis is rejected when the p-value is at or below alpha, i.e., when the result falls in a rejection region beyond the critical value.

“If p is low, Ho must go” (usually at or below 0.05)

[Figure: distribution centered on the mean, with a central “fail to reject” region and a rejection region in each tail]
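The “What’s the probability of getting a value of 40?” question and the “if p is low, Ho must go” rule can be combined in a short sketch. The slides do not give the spread, so a standard error of 5 is assumed here purely for illustration, along with a two-sided test against a normal distribution.

```python
from statistics import NormalDist


def two_sided_p_value(observed, mean, se):
    """P(a result at least this far from the mean, in either direction | Ho true)."""
    z = (observed - mean) / se
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical: Ho mean = 50, standard error = 5 (not given on the slides)
p = two_sided_p_value(40, 50, 5)   # z = -2
alpha = 0.05
decision = "reject Ho" if p <= alpha else "fail to reject Ho"
print(f"p = {p:.4f} -> {decision}")  # p = 0.0455 -> reject Ho
```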

Page 26

Hypothesis Testing – Basic Steps (see also pg. 156–160)

1. State the practical problem.
2. State the null hypothesis.
3. State the alternate hypothesis.
4. Test the assumptions of the data.
5. Determine the appropriate alpha (α) decision value.
6. Calculate the appropriate test statistic and its p-value.
7. If the calculated p-value < α, reject Ho; if the p-value > α, fail to reject Ho.
8. Formulate the statistical conclusion into a practical solution.

Page 27

Hypothesis Testing Worksheet

Analyze – Hypothesis Testing – Type I / II Errors

• Identify data types: Project Y; Project Y data type; X factor; X data type
• What hypothesis is being tested? Null hypothesis statement; alternate hypothesis statement
• Statistical test
• Assumptions: Are the assumptions for this test met (if applicable)?
• Results: P-value; % contribution of variation in X to variation in Y; accept alternate hypothesis / reject alternate hypothesis
• Conclusions/Observations

Page 28

Statistical Testing – Basic Steps

1. What theory or potential cause is presented or proposed?
2. Given the theory or potential cause in front of you, what is the question you are trying to answer?
3. Do you have data directly related to and describing the question you are asking? What type of data do you have?
4. If you do not have data, can you collect the appropriate data (reasonably and appropriately)? If no data exists relating to the theory being considered, or if it will be very costly to obtain, revisit the magnitude and urgency of testing this particular theory. Proceed with data collection and sorting/grouping as needed.
5. State the question as a null hypothesis (There is no difference…).
6. State the alternate hypothesis.
7. Test the assumptions of the data as needed (normality, quantity, variances, etc.).
8. Determine the appropriate alpha (α) decision value (0.05, etc.).
9. Choose and calculate the appropriate test statistic (determined by the data you have and the question you are asking) and the associated p-value.
10. If the calculated p-value < α, reject Ho; if the p-value > α, fail to reject Ho.
11. Formulate the statistical conclusion into a practical solution (answer to the question).

Page 29

Remember? Data-Driven Problem Solving: Hypothesis Testing

Two fundamental questions must be answered adequately before you can perform hypothesis testing properly:

– What type of data is available (and reliable)?

– What question are you asking (what do you need to understand)?

Page 30

What Type of Data to Analyze:

• Discrete X / Continuous Y

• Continuous X / Continuous Y

• Discrete X / Discrete Y

Page 31

Page 32

Reference Sheet: Statistical Test Selection and "p-values" Interpretation (based on 95% Confidence)

Each entry lists the input (X) / output (Y) data types, the practical question we are asking, the tool, the Minitab commands, and what a p-value below or above 0.05 means.

• Continuous Y (no X): Is my collected set of data normally distributed?
– Tool: Anderson-Darling Normality Test
– Minitab: Stat > Basic Statistics > Display Descriptive Statistics
– P-value < 0.05: You can be confident that your data is not normally distributed.
– P-value > 0.05: You can assume that your data is normally distributed.

• Discrete X / Continuous Y: Is the average of my sample the same as a given or known value?
– Tool: 1-Sample t-Test (against a known value)
– Minitab: Stat > Basic Statistics > 1-Sample t
– P-value < 0.05: You can be confident that your sample has a different average from the known test value.
– P-value > 0.05: There is no difference between your sample average and the known test value (based on the data you have).

• Discrete X / Continuous Y: Are the averages from 2 different sets of data the same?
– Tool: 2-Sample t-Test
– Minitab: Stat > Basic Statistics > 2-Sample t
– P-value < 0.05: You can be confident that the averages of the two samples are different.
– P-value > 0.05: There is no difference between the averages of the two samples (based on the data you have).

• Discrete X / Continuous Y: Are the averages from paired sets of data (e.g., before/after) the same?
– Tool: Paired t-Test
– Minitab: Stat > Basic Statistics > Paired t
– P-value < 0.05: You can be confident that there is a consistent difference between the pairs of data.
– P-value > 0.05: There is no consistent difference between the pairs of data (based on the data you have).

• Discrete X / Continuous Y: Is there at least one average from several (>2) sets of data that is different?
– Tool: One-Way ANOVA
– Minitab: Stat > ANOVA > One-Way
– P-value < 0.05: You can be confident that at least one of the samples has a different average from the others.
– P-value > 0.05: There is no difference in the averages of the samples (based on the data you have).

• Discrete X / Continuous Y: Is there at least one median from several (>2) sets of data that is different?
– Tool: Kruskal-Wallis and Mood's Median Test
– Minitab: Stat > Nonparametrics
– P-value < 0.05: You can be confident that at least one of the samples has a different median from the others.
– P-value > 0.05: There is no difference in the medians of the samples (based on the data you have).

• Discrete X / Continuous Y: Is there at least one variance from several sets of data that is different?
– Tool: F-test, Levene's test, Bartlett's test
– Minitab: Stat > ANOVA > Test for Equal Variances
– P-value < 0.05: You can be confident that at least one of your samples has a different standard deviation from the others.
– P-value > 0.05: There is no difference between the standard deviations of the samples (based on the data you have).

• Discrete X / Discrete Y: Is the proportion, or rate, from my sample the same as a given proportional value?
– Tool: 1 Proportion (against a known value)
– Minitab: Stat > Basic Statistics > 1 Proportion
– P-value < 0.05: You can be confident that your sample has a different proportion from the known test value.
– P-value > 0.05: There is no difference between your sample proportion and the known test value (based on the data you have).

• Discrete X / Discrete Y: Are the proportions from 2 different sets of data the same?
– Tool: 2 Proportions
– Minitab: Stat > Basic Statistics > 2 Proportions
– P-value < 0.05: You can be confident that the proportions from the two samples are different.
– P-value > 0.05: There is no difference between the proportions from the two samples (based on the data you have).

• Discrete X / Discrete Y: Is there at least one proportion from several sets of data that is different? Are observed frequencies the same as expected?
– Tool: Chi-Square
– Minitab: Stat > Tables > Cross Tabulation and Chi-Square
– P-value < 0.05: You can be confident that at least one of the samples has a different proportion from the others.
– P-value > 0.05: There is no difference in the proportions from the samples (based on the data you have).

• Continuous X / Continuous Y: As one variable changes, can you predict the change in another (correlated) variable?
– Tool: Correlation (Pearson coefficient)
– Minitab: Stat > Basic Statistics > Correlation
– P-value < 0.05: You can be confident that there is a correlation (the Pearson coefficient is not zero).
– P-value > 0.05: There is no correlation (based on the data you have); the Pearson coefficient could be zero.

• Continuous X / Continuous Y: Does one continuous factor (input) affect another continuous factor (output)?
– Tool: Regression
– Minitab: Stat > Regression > Regression
– P-value < 0.05: You can be confident that the input factor (predictor) affects the process output.
– P-value > 0.05: There is no correlation between the input factor (predictor) and the process output (based on the data you have).

Page 33

Data-Driven Analysis: Discrete X / Continuous Y

• Descriptive Statistics: mean, median, variance, standard deviation

• Graphical display: box plots, error bars, run charts

• Potential Questions: Is there a difference in means, medians, or variances?

Page 34

Variance Testing

Normal distribution:
• 1 sample: Chi² test. HO: σ1 = σt; HA: σ1 ≠ σt (t = target). Stat > Basic Stat > Display Desc > Graphical Summary (if the target std dev falls within the CI, fail to reject HO).
• 2 samples: F test. HO: σ1 = σ2; HA: σ1 ≠ σ2. Stat > ANOVA > Test for Equal Variances.
• >2 samples: Bartlett’s test. HO: σ1 = σ2 = σ3…; HA: σi ≠ σj for i ≠ j (at least one is different). Stat > ANOVA > Test for Equal Variances. If variances are NOT equal, proceed with caution or use Welch’s test, which is not available in Minitab.

Non-normal or unknown distribution:
• 1 sample: Chi² test. HO: σ1 = σt; HA: σ1 ≠ σt (t = target). Stat > Basic Stat > Display Desc > Graphical Summary (if the target std dev falls within the CI, fail to reject HO).
• 2 samples: Levene’s test. HO: σ1 = σ2 = σ3…; HA: σi ≠ σj for i ≠ j (at least one is different). Stat > ANOVA > Test for Equal Variances.
• >2 samples: Levene’s test. HO: σ1 = σ2 = σ3…; HA: σi ≠ σj for i ≠ j (at least one is different). Stat > ANOVA > Test for Equal Variances.

Page 35

Test for Equal Variances (Stat > Basic Statistics > 2 Variances)

Page 36

Test for Equal Variances (Stat > Basic Statistics > 2 Variances)

Test for Equal Variances: Quality versus Region

95% Bonferroni confidence intervals for standard deviations

Region  N    Lower    StDev    Upper
1       116  2.13011  2.46845  2.92567
2       67   2.03534  2.46264  3.09934
3       100  2.58684  3.02983  3.64282

Bartlett's Test (Normal Distribution): Test statistic = 5.58, p-value = 0.061

Levene's Test (Any Continuous Distribution): Test statistic = 6.24, p-value = 0.002

Page 37

Test for Equal Variances (Stat > Basic Statistics > 2 Variances)

Page 38

Hypothesis Testing: Discrete X / Continuous Y

For μ: 1-Sample t-test (see page 162 in The Lean Six Sigma Pocket Toolbook)

Ho: μ is equal to a target or known value

Ha: μ is not equal to a target or known value

Statistical Test: One-sample t-test
Test Statistic: T-value – compared against the t distribution, which is used because the population standard deviation is unknown and must be estimated from the sample
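A minimal sketch of the one-sample t statistic, t = (x̄ − target) / (s / √n), using only Python's standard library. The data, target, and function name are hypothetical. The standard library has no t distribution, so the p-value would still come from a t table with n − 1 degrees of freedom or from a package such as SciPy (`scipy.stats.ttest_1samp`).

```python
import math
from statistics import mean, stdev


def one_sample_t(data, target):
    """Return (t, degrees of freedom) for Ho: population mean == target."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)   # sample SD stands in for the unknown sigma
    t = (mean(data) - target) / se
    return t, n - 1


# Hypothetical data and target value
t, df = one_sample_t([1, 2, 3, 4, 5], target=2)
print(f"t = {t:.3f}, df = {df}")  # t = 1.414, df = 4
```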

Page 39

Hypothesis Testing: Discrete X / Continuous Y

For μ: 2-Sample t-test (see page 182 in The Lean Six Sigma Pocket Toolbook)

Ho: μ1 = μ2

Ha: μ1 ≠ μ2

Statistical Test: 2-Sample t-test
Test Statistic: T-value – compared against the t distribution, which is used because the population standard deviations are unknown and must be estimated from the samples
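A minimal sketch of a two-sample t statistic, with hypothetical data. It uses Welch's unequal-variance form, one common version of the 2-sample t-test (the pooled-variance version differs slightly). As with the one-sample case, the p-value would come from a t table or a package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch's form).

    Returns (t, approximate degrees of freedom via Welch-Satterthwaite).
    """
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples from two groups
t, df = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -2.00, df = 8.0
```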

Page 40

Hypothesis Testing: Discrete X / Continuous Y

• 1 group: 1-Sample T Test if the population is normal; 1-Sample Wilcoxon if it is non-normal or unknown
• 2 groups: 2-Sample T Test if the population is normal; Mann-Whitney Test if it is non-normal or unknown
• >2 groups: ANOVA if the population is normal; Mood’s Median Test or Kruskal-Wallis Test if it is non-normal or unknown
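The test selection on this slide can be read as a simple decision rule. A sketch of that lookup; the function and argument names are illustrative only.

```python
def choose_location_test(groups, normal):
    """Pick a test for comparing centers, following the slide's table.

    groups: number of groups being compared (1 = compare with a target value)
    normal: True if the population is normal, False if non-normal or unknown
    """
    if groups < 1:
        raise ValueError("need at least one group")
    if groups == 1:
        return "1-Sample T Test" if normal else "1-Sample Wilcoxon"
    if groups == 2:
        return "2-Sample T Test" if normal else "Mann-Whitney Test"
    return "ANOVA" if normal else "Mood's Median Test or Kruskal-Wallis Test"


print(choose_location_test(3, normal=False))
```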

Page 41

Analyze Tools: Discrete X / Continuous Y

• Graphical display: Box plots
– The box shows the range of data values comprising the 2nd and 3rd quartiles of the data: the “middle” 50% of the data.

[Figure: box plot with the 1st quartile line, median line, and 3rd quartile line labeled]

See page 110 in The Lean Six Sigma Pocket Toolbook.

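Page 42

Analyze Tools: Box Plots

There are 24 entries in this table: 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 8, 10, 13.

[Figure: box plot of these values on a 0 to 14 scale. The data are split into four groups of 25% each (1st, 2nd, 3rd, and 4th quartiles); the box spans the 2nd and 3rd quartiles with the median line inside it, whiskers extend from each end of the box, and one point is marked as an outlier (*).]

• Median = 4.5
• The Inter Quartile Range (IQR) is the range encompassed by the 2nd and 3rd quartiles… 6 − 4 = 2
• The upper whisker extends to the largest value within the top of the 3rd quartile + 1.5 × IQR.
• The lower whisker extends to the smallest value within the bottom of the 2nd quartile − 1.5 × IQR.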
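A minimal sketch of the 1.5 × IQR whisker rule described above, using hypothetical data rather than the 24 values on the slide. Quartiles come from Python's `statistics.quantiles` (default 'exclusive' method); Minitab and other tools may use a different quartile convention, so their quartiles can differ slightly.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = quantiles(data, n=4)   # default method='exclusive'
    iqr = q3 - q1                           # spread of the middle 50% (the box)
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_whisker": min(inside),       # smallest value within the fence
        "upper_whisker": max(inside),       # largest value within the fence
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


# Hypothetical data with one unusually large value
print(box_plot_summary([1, 2, 3, 4, 5, 6, 30]))
```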

Page 43

Data-Driven Analysis: Continuous X / Continuous Y

• Descriptive Statistics: correlation

• Graphical Display: scatter plot, run charts

• See pages 165–175 in The Lean Six Sigma Toolbook

Page 44

Analyze Tools: Continuous X / Continuous Y

• Correlation indicates whether there is a relationship between the values of two measurements.
– Positive correlation: higher values in X are associated with higher values in Y.
– Negative correlation: higher values in X are associated with lower values in Y.

• Correlation does NOT imply cause-and-effect!
– The correlation could be coincidence.
– Both variables could be influenced by some lurking variable.

Page 45

Hypothesis Testing: Correlation Statistics

• Correlation and regression analysis generate coefficients that indicate the strength and nature of the relationship:
– Pearson correlation coefficient (r): the strength and direction of the relationship; between −1 and 1
– r²: the percent of variation in Y that is attributable to X; between 0 and 1
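A sketch of how Pearson's r is computed from paired data, with r² as its square. Python 3.10+ also provides `statistics.correlation`; the hand-written version below just makes the formula visible. The data are hypothetical, chosen to reproduce the r = 1 and r = −1 extremes.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r between paired measurements x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive relationship
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative relationship
print(r_pos, r_pos ** 2)   # 1.0 1.0
print(r_neg, r_neg ** 2)   # -1.0 1.0
```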

Page 46

Hypothesis Testing: Continuous X / Continuous Y

For β (slope): Regression and Correlation (pg. 168)

Ho: The slope of the line is equal to zero (β = 0)

Ha: The slope of the line does not equal zero (β ≠ 0)

Statistical Test: Regression
Test Statistic: F ratio – the ratio of the variation explained by the regression line to the unexplained (residual) variation in the sample

Page 47

Correlation Example (Stat > Basic Statistics > Correlation)

Correlations: Clarity, Quality

Pearson correlation of Clarity and Quality = 0.075
P-Value = 0.208

Page 48

Pearson’s r Rules of Thumb

• Strength and direction of the relationship between X and Y (the ranges below apply to the absolute value of r; the sign gives the direction)
• 0 to 0.20: no or negligible correlation
• 0.20 to 0.40: low degree of correlation
• 0.40 to 0.60: moderate degree of correlation
• 0.60 to 0.80: marked degree of correlation
• 0.80 to 1.00: high correlation

Page 49

Regression Example (Stat > Regression > Regression…)

Regression Analysis: Quality versus Clarity

The regression equation is
Quality = 11.7 + 1.02 Clarity

Predictor  Coef     SE Coef  T      P
Constant   11.6524  0.7253   16.06  0.000
Clarity    1.0234   0.8118   1.26   0.208

S = 2.82408   R-Sq = 0.6%   R-Sq(adj) = 0.2%

Analysis of Variance

Source          DF   SS        MS      F     P
Regression      1    12.676    12.676  1.59  0.208
Residual Error  281  2241.094  7.975
Total           282  2253.770

Page 50

Regression Example 2 (Stat > Regression > Fitted Line Plot…)

Analyze - Continuous X / Continuous Y

Regression Analysis: Quality versus Clarity

The regression equation is
Quality = 11.65 + 1.023 Clarity

S = 2.82408   R-Sq = 0.6%   R-Sq(adj) = 0.2%

Analysis of Variance

Source      DF   SS       MS       F     P
Regression  1    12.68    12.6757  1.59  0.208
Error       281  2241.09  7.9754
Total       282  2253.77

Page 51

r² Rules of Thumb

• The “coefficient of determination”
• What percent of the variation in Y is due to X?
• Less than or equal to 0.40: not predictive
• 0.40 to 0.65: mildly predictive
• 0.65 to 0.86: moderately predictive
• 0.86 to 1: strongly predictive

Page 52

Residuals

• Regression uses a method called “least squares” to choose the line that minimizes the sum of the squared vertical distances from the points to the line.

Page 53

Residuals

• The distances between the points and the regression line are called “residuals.” The residuals represent the portion of each Y value that is not explained by the regression equation.

[Figure: residuals shown as the vertical distances between the data points and the regression line]
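A minimal sketch of the least-squares fit and its residuals for one X and one Y, with hypothetical data and an illustrative function name. The slope and intercept below are the values that minimize the sum of squared residuals; for a least-squares line with an intercept, the residuals also sum to zero.

```python
def least_squares_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, residuals), where each residual is the
    vertical distance: observed y minus fitted y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, residuals


# Hypothetical data
intercept, slope, residuals = least_squares_fit([1, 2, 3, 4], [3, 5, 8, 8])
print(f"y = {intercept:.2f} + {slope:.2f} x")   # y = 1.50 + 1.80 x
print([round(e, 2) for e in residuals])          # [-0.3, -0.1, 1.1, -0.7]
```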

Page 54

Residuals

• In Minitab, you can plot the residuals four ways (also see pages 195–196 in The Lean Six Sigma Toolbook).

Page 55

Residuals

• Regression makes three assumptions about the residual “errors.” Errors are:
1. Random and independent
2. Normally distributed
3. Of constant variance

Page 56

Residuals

• Errors are random and independent.

Residuals versus order:
1. Displayed in the order collected.
2. If order is immaterial, do not use this plot.
3. Are the residuals random? Do they exhibit any patterns?

Page 57

Residuals

• Errors are normally distributed.

Normal plot of residuals:
1. Errors should follow a straight line on a normal probability plot.
2. Use the “fat pencil” test: would a fat pencil laid on the normal probability plot cover the data points?

Page 58

Residuals

• Errors have constant variance over all values of X.

Residuals versus fits:
1. Should show a random scatter with no pattern.
2. Should have roughly the same number of points above 0 as below.

Page 59

Flavor versus Quality

Correlations: Quality, Flavor

Pearson correlation of Quality and Flavor = 0.870
P-Value = 0.000

Regression Analysis: Quality versus Flavor

The regression equation is
Quality = 2.913 + 1.997 Flavor

S = 1.39575   R-Sq = 75.7%   R-Sq(adj) = 75.6%

Analysis of Variance

Source      DF   SS       MS       F       P
Regression  1    1706.35  1706.35  875.89  0.000
Error       281  547.42   1.95
Total       282  2253.77
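The R-Sq, R-Sq(adj), and F values in the Minitab output follow directly from the Analysis of Variance table. A sketch that recomputes them from the sums of squares printed for the Clarity and Flavor regressions; the function name is illustrative.

```python
def regression_summary(ss_regression, ss_error, df_error, df_regression=1):
    """Recompute R-Sq, R-Sq(adj), and F from a regression ANOVA table."""
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    r_sq = ss_regression / ss_total                  # share of variation in Y explained by X
    r_sq_adj = 1 - ms_error / (ss_total / df_total)  # adjusted for the number of predictors
    f_ratio = ms_regression / ms_error               # explained vs. unexplained variation
    return r_sq, r_sq_adj, f_ratio


# Sums of squares copied from the Minitab output (Quality versus Clarity / Flavor)
for label, ss_reg, ss_err in [("Clarity", 12.676, 2241.094), ("Flavor", 1706.35, 547.42)]:
    r_sq, r_sq_adj, f_ratio = regression_summary(ss_reg, ss_err, df_error=281)
    print(f"{label}: R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, F = {f_ratio:.2f}")
```

This should reproduce R-Sq = 0.6% and 75.7% and R-Sq(adj) = 0.2% and 75.6%, with F values matching the reported 1.59 and 875.89 to within rounding of the printed sums of squares.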

Page 60

Analyze Tools: Continuous X / Continuous Y

[Figure: three scatterplots (B1 vs A1, B2 vs A2, B3 vs A3) illustrating r = 1, r² = 1 (perfect positive correlation); r = −1, r² = 1 (perfect negative correlation); and r = 0, r² = 0 (no correlation).]

Page 61

Data-Driven Analysis: Discrete X / Discrete Y

• Descriptive Statistics: counts and proportions

• Graphical display: bar graph and Pareto chart
– A Pareto chart is a type of bar graph in which the categories are arranged from largest to smallest, with a line indicating the cumulative percent.

Page 62

Contingency Tables

• χ²: the statistic used to test hypotheses about the frequency of some event
– Goodness of Fit: is the observed different from the expected?
– Test for Independence: are the samples from the same distribution?

Page 63

Goodness of Fit Test

• Compare actual and expected frequencies.
• Calculate the χ² statistic.
• Compare it to a χ² critical value from a table.
• If χ²calc > χ²crit, there is a difference.

Page 64

Calculate the χ² statistic

• χ² = the sum of the squares of the differences between the actual and the expected frequencies, each divided by the expected frequency:

$$\chi^2 = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed (actual) frequency and f_e the expected frequency in each of the g categories.

Page 65

Coin-toss

• Will a fair coin tossed 100 times come up 66 times heads and 34 times tails?

Page 66

Coin-toss

Outcome  Observed (fo)  Expected (fe)
Heads    66             50
Tails    34             50

Heads: (66 − 50)² / 50 = 16² / 50 = 256 / 50 = 5.12
Tails: (34 − 50)² / 50 = (−16)² / 50 = 256 / 50 = 5.12

$$\chi^2_{calc} = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e} = 5.12 + 5.12 = 10.24$$

Page 67: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

67

Look up the χ2 critical value

• First we must determine the degrees of freedom (DF)

• “Degrees of freedom” represents the number of values in the final calculation of a statistic that are free to vary

• For a goodness-of-fit test with k categories, DF = k - 1; for a contingency table, DF = (rows in data - 1) × (columns in data - 1)

• In our example there are two categories (heads and tails), so DF = 2 - 1 = 1
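The two degrees-of-freedom rules can be written as small helpers. This is a sketch with hypothetical function names; the k - 1 rule assumes the expected frequencies are fully specified in advance (no parameters estimated from the data).

```python
def df_goodness_of_fit(num_categories):
    """One-way goodness-of-fit table with k categories: DF = k - 1."""
    return num_categories - 1


def df_contingency(num_rows, num_columns):
    """Two-way contingency table: DF = (rows - 1) * (columns - 1)."""
    return (num_rows - 1) * (num_columns - 1)


print(df_goodness_of_fit(2))  # coin toss: heads, tails
print(df_contingency(2, 2))   # exam example later in this deck: 2 time slots x pass/fail
```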

Page 68: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

Critical values of χ² (columns are upper-tail areas α)

DF    α = 0.1    α = 0.05   α = 0.025   α = 0.01   α = 0.005
1     2.70554    3.84146    5.02389     6.6349     7.87944
2     4.60517    5.99146    7.37776     9.21034    10.59663
3     6.25139    7.81473    9.3484      11.34487   12.83816
4     7.77944    9.48773    11.14329    13.2767    14.86026
5     9.23636    11.0705    12.8325     15.08627   16.7496

68

Look up the χ2 critical value

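• If χ²calc > χ²crit, there is a difference
• χ²calc = 10.24
• χ²crit = 3.84 (DF = 1, α = 0.05)
• 10.24 > 3.84, so there is a statistically significant difference: reject the null hypothesis that the coin is fair

(The column headings of the χ² table are upper-tail areas, i.e., p-values.)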
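The table lookup and decision rule can be sketched as follows. The critical values are copied from the α = 0.05 column of the table; the exact p-value shortcut is valid only for DF = 1, where χ² is the square of a standard normal Z. For other DF, a library routine such as scipy.stats.chi2.sf(x, df) would be needed. This is an illustration, not a substitute for statistical software.

```python
import math

# Alpha = 0.05 critical values copied from the chi-square table
CHI2_CRIT_05 = {1: 3.84146, 2: 5.99146, 3: 7.81473, 4: 9.48773, 5: 11.0705}


def decide(chi2_calc, df):
    """Compare the calculated statistic to the alpha = 0.05 critical value."""
    if chi2_calc > CHI2_CRIT_05[df]:
        return "reject H0"
    return "fail to reject H0"


def p_value_df1(chi2_calc):
    """Upper-tail p-value for DF = 1 only.

    With one degree of freedom, chi-square is the square of a standard
    normal Z, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2_calc / 2))


chi2_calc = 10.24  # coin-toss statistic
print(decide(chi2_calc, df=1), f"p = {p_value_df1(chi2_calc):.4f}")
```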

Page 69: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

69

Chi-Square Test for Independence

• Goodness of Fit asked whether frequencies were different from expected

• Test for Independence asks whether our samples come from the same population

• Example: Students in a Six Sigma Black Belt course are offered two different time slots for taking their final exam. Is there a difference in the passing and failing rates for each group?

• State the null and alternative hypotheses for this problem.

Page 70: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

70

Chi-Square Test for Independence

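• We use the same formula, but calculate the expected frequencies differently

$$\chi^2 = \sum_{j=1}^{g} \frac{(f_{o,j} - f_{e,j})^2}{f_{e,j}}$$

where the sum now runs over all g cells of the contingency table.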

Page 71: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

71

Test for Independence

• Arrange the data in a table, showing observed frequencies

• Calculate the expected frequencies for each cell

• Calculate the χ² contribution of each cell
• Sum the χ² contributions from all cells
• Compare the sum to a χ² critical value from a table
• If χ²calc > χ²crit, there is a difference

Page 72: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

72

Calculating fe

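              Number passing   Number failing   Total
1st test      fo = 20          fo = 50          fo = 70
2nd test      fo = 40          fo = 70          fo = 110
Total         fo = 60          fo = 120         fo = 180

$$f_e = \frac{f_{row} \times f_{column}}{N}$$

where f_row and f_column are the row and column totals for the cell and N is the grand total.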
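The expected-frequency formula applies to every cell at once, as in this plain-Python sketch (the function name is hypothetical). Running it on the exam table confirms the 2nd test / passing cell is 110 × 60 / 180 = 36.67.

```python
def expected_frequencies(table):
    """fe for every cell = (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: 1st test, 2nd test; columns: number passing, number failing
observed = [[20, 50], [40, 70]]
for label, row in zip(["1st test", "2nd test"], expected_frequencies(observed)):
    print(label, [round(fe, 2) for fe in row])
```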

Page 73: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

73

Calculating fe

              Number passing                 Number failing   Total
1st test      fo = 20, fe = (70 × 60)/180    fo = 50          fo = 70
2nd test      fo = 40                        fo = 70          fo = 110
Total         fo = 60                        fo = 120         fo = 180

$$f_e = \frac{f_{row} \times f_{column}}{N}$$

Page 74: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

74

Calculating fe

              Number passing        Number failing                  Total
1st test      fo = 20, fe = 23.33   fo = 50, fe = (120 × 70)/180    fo = 70
2nd test      fo = 40               fo = 70                         fo = 110
Total         fo = 60               fo = 120                        fo = 180

$$f_e = \frac{f_{row} \times f_{column}}{N}$$

Page 75: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

75

Calculating fe

              Number passing        Number failing        Total
1st test      fo = 20, fe = 23.33   fo = 50, fe = 46.67   fo = 70
2nd test      fo = 40, fe = 36.67   fo = 70, fe = 73.33   fo = 110
Total         fo = 60               fo = 120              fo = 180

$$f_e = \frac{f_{row} \times f_{column}}{N}$$

Page 76: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

76

Calculate the χ2 statistic for each cell

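              Number passing                   Number failing                   Total
1st test      fo = 20, fe = 23.33, χ² = .476   fo = 50, fe = 46.67, χ² = .238   fo = 70
2nd test      fo = 40, fe = 36.67, χ² = .303   fo = 70, fe = 73.33, χ² = .152   fo = 110
Total         fo = 60                          fo = 120                         fo = 180

$$\chi^2_{\text{calc}} = \sum_{j=1}^{g} \frac{(f_{o,j} - f_{e,j})^2}{f_{e,j}} = .476 + .238 + .303 + .152 = 1.169$$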
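The full test-for-independence procedure (expected frequency, per-cell contribution, sum, DF) can be sketched in plain Python as below. The function name is hypothetical. Like the slides, it applies no Yates continuity correction; some software does by default for 2×2 tables (for example, scipy.stats.chi2_contingency with correction=True), which yields a smaller statistic.

```python
def chi_square_independence(table):
    """Return per-cell contributions, the chi-square statistic and DF."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        cells = []
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            cells.append((fo - fe) ** 2 / fe)        # this cell's chi-square
        contributions.append(cells)
    chi2_calc = sum(sum(cells) for cells in contributions)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return contributions, chi2_calc, df


observed = [[20, 50], [40, 70]]  # rows: 1st/2nd test; columns: pass/fail
contributions, chi2_calc, df = chi_square_independence(observed)
print([[round(c, 3) for c in row] for row in contributions])
print(round(chi2_calc, 3), df)
```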

Page 77: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)


77

Look up the χ2 critical value

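• If χ²calc > χ²crit, there is a difference
• χ²calc = 1.169
• χ²crit = 3.84 (DF = (2 - 1) × (2 - 1) = 1, α = 0.05)
• 1.169 < 3.84, so there is no statistically significant difference. Therefore, we fail to reject the null hypothesis, Ho: pass and fail rates are independent of the time the test was administered.

(The column headings of the χ² table are upper-tail areas, i.e., p-values.)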

Page 78: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

78

Cramer’s test

• Cramér's V (θ) quantifies the strength of the association between x and y

$$\theta = \sqrt{\frac{\chi^2_{\text{calc}}}{n(q-1)}}$$

Where:

n = total number of observations

q = lesser of the number of rows or columns

Describing the strength of association:
.5 to 1    high association
.3 to .5   moderate association
.1 to .3   low association
0 to .1    little if any association

Page 79: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

79

Cramer’s test

• Quantifies the strength of the association between x and y

$$\theta = \sqrt{\frac{1.169}{n(q-1)}}$$

Page 80: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

80

Cramer’s test

• Quantifies the strength of the association between x and y

$$\theta = \sqrt{\frac{1.169}{180(1)}}$$

Page 81: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

81

Cramer’s test

• Quantifies the strength of the association between x and y

$$\theta = \sqrt{0.00649}$$

Page 82: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

82

Cramer’s test

• Quantifies the strength of the association between x and y

$$\theta = 0.0806$$

This falls in the 0 to .1 band: little if any association between the exam time slot and the pass/fail outcome.
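The Cramér's V calculation and the strength bands can be sketched as below. The function names are hypothetical, and because the slide's bands share their endpoints (.5, .3, .1), assigning each boundary value to the higher band is an assumption.

```python
import math


def cramers_v(chi2_calc, n, num_rows, num_columns):
    """theta = sqrt(chi2_calc / (n * (q - 1))), q = lesser of rows or columns."""
    q = min(num_rows, num_columns)
    return math.sqrt(chi2_calc / (n * (q - 1)))


def describe_strength(theta):
    """Map theta onto the slide's bands (boundary handling is an assumption)."""
    if theta >= 0.5:
        return "high association"
    if theta >= 0.3:
        return "moderate association"
    if theta >= 0.1:
        return "low association"
    return "little if any association"


theta = cramers_v(1.169, n=180, num_rows=2, num_columns=2)
print(round(theta, 4), describe_strength(theta))
```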

Page 83: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

83

Hypothesis Testing:Discrete X / Discrete Y

For: comparing one proportion to a given value

Ho: The proportion is equal to a given percentage

Ha: The proportion is not equal to a given percentage

Statistical Test: 1 Proportion
Test Statistic: Z score – based on the area under the curve of a normal distribution
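A normal-approximation 1-proportion Z test can be sketched as follows, reusing the coin-toss data (66 heads in 100 tosses, Ho: p = 0.5). This is a plain-Python illustration with a hypothetical function name, not a specific package's implementation. For two categories, z² equals the goodness-of-fit χ² computed earlier (10.24), so the two tests reach the same conclusion.

```python
import math


def one_proportion_z(successes, n, p0):
    """Normal-approximation 1-proportion Z test of Ho: p = p0 (two-sided)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under Ho
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


z, p = one_proportion_z(66, 100, 0.5)
print(f"z = {z:.2f}, z^2 = {z * z:.2f}, p = {p:.4f}")
```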

Page 84: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

84

Hypothesis Testing:Discrete X / Discrete Y

For: comparing two proportions

Ho: The proportion of group A equals the proportion of group B (PA = PB)

Ha: The proportion of group A does not equal the proportion of group B (PA ≠ PB)

Statistical Test: Test of Proportions
Test Statistic: Z score – based on the area under the curve of a normal distribution
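A two-proportion Z test can be sketched with the exam pass rates (20 of 70 vs 40 of 110). This sketch assumes a pooled proportion under Ho; some software defaults to separate (unpooled) estimates, which gives a slightly different z. With pooling, z² equals the 2×2 χ² without continuity correction (1.169), matching the contingency-table result.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z test of Ho: PA = PB (two-sided, normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under Ho
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z(20, 70, 40, 110)
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, p = {p:.3f}")
```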

Page 85: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

85

Hypothesis Testing:Discrete X / Discrete Y

• Considerations
– For contingency tables, the expected count in each cell should be at least 5
– For proportions tests, if you do not have enough successes or failures in your numerator, consider using Fisher's Exact Test
– Generally, np > 5 and n(1-p) > 5 is a minimum standard
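Both rules of thumb are easy to check before running a test. The sketch below uses hypothetical helper names; the last call is a made-up small sample that fails the np rule, the kind of case where an exact test (for example, scipy.stats.fisher_exact for a 2×2 table) would be the safer choice.

```python
def expected_counts_ok(table, minimum=5):
    """True if every expected cell count (row total * column total / N) is >= minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return all(r * c / n >= minimum for r in row_totals for c in col_totals)


def np_rule_ok(n, p, minimum=5):
    """Rule of thumb from the slide: np > minimum and n(1 - p) > minimum."""
    return n * p > minimum and n * (1 - p) > minimum


print(expected_counts_ok([[20, 50], [40, 70]]))  # exam example
print(np_rule_ok(100, 0.5))                      # coin toss
print(np_rule_ok(20, 0.1))                       # hypothetical small sample
```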

Page 86: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

Six Sigma Analyze:

Remember, statistical analysis and testing within the context of practically applying Lean Six Sigma is about using data to identify the Key Xs to “fix” that will most likely result in a measurable improvement in the process Y (output), which in turn will improve customer satisfaction and efficiency.

Page 87: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

87

Key Deliverables for Analyze

• Main elements of Define and Measure completed

• “Obvious Xs” identified and confirmed
• Potential Xs identified, data collected and analyzed
• Root causes investigated and supported with data – the Xs to improve

Page 88: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

88

Author: Enter Name
Date: April 19, 2023

Project Name:
Problem Statement: Mislabeled example
Project Scope: Enter scope description
Champion: Name
Process Owner: Name
Black Belt: Name
Green Belts: Names
Customer(s):
CTQ(s):
Defect(s):
Beginning DPMO:
Target DPMO:
Estimated Benefits:
Actual Benefits:

Not Complete / Complete / Not Applicable

Define (Start Date: Enter Date, End Date: Enter Date)
• Benchmark Analysis
• Project Charter
• Formal Champion Approval of Charter (signed)
• SIPOC - High Level Process Map
• Customer CTQs
• Initial Team meeting (kickoff)

Measure (Start Date: Enter Date, End Date: Enter Date)
• Identify Project Y(s)
• Identify Possible Xs (possible cause and effect relationships)
• Develop & Execute Data Collection Plan
• Measurement System Analysis
• Establish Baseline Performance

Analyze (Start Date: Enter Date, End Date: Enter Date)
• Identify Vital Few Root Causes of Variation Sources & Improvement Opportunities
• Define Performance Objective(s) for Key Xs
• Quantify potential $ Benefit

Improve (Start Date: Enter Date, End Date: Enter Date)
• Generate Solutions
• Prioritize Solutions
• Assess Risks
• Test Solutions
• Cost Benefit Analysis
• Develop & Implement Execution Plan
• Formal Champion Approval

Control (Start Date: Enter Date, End Date: Enter Date)
• Implement Sustainable Process Controls – Validate: Control System, Monitoring Plan, Response Plan, System Integration Plan
• $ Benefits Validated
• Formal Champion Approval and Report Out

Directions: Replace all of the italicized, black text with your project's information. Change the blank box into a check mark by clicking on Format > Bullets and Numbering and changing the bullet.

Page 89: The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)

Six Sigma Analyze:

Now, what specifically are we going to improve in the Improve Phase?

We should have evidence (data) to support what we are improving and why.