3-1 review of analyze phase

Upload: kjets78

Post on 28-Dec-2015

Page 1: 3-1 Review of Analyze Phase

© 2001 ConceptFlow

Review of Analyze

Page 2: 3-1 Review of Analyze Phase

Analyze Phase Deliverables

• A prioritized list of potential sources of variation
• Variation Component Studies
• Measurement Analysis on the x’s
• Data collected to validate sources
• Graphical and statistical analysis of data
• P-value establishing level of significance and probability
• Correlation and regression analysis to determine variable relationships
• Reduced list of potential key input variables that affect the output(s)
• Updated control charts, process map & FMEA
• Results to date (compared to baseline)

Define → Measure → Analyze → Improve → Control

Analyze: statistically links key input variables with the key output variable

Page 3: 3-1 Review of Analyze Phase

Analyze Week Topics

• Review of Measure Week
• Central Limit Theorem
• Confidence Intervals
• Introduction to Hypothesis Testing
• Hypothesis Testing
  • Means
  • Variance
  • Proportion
  • Chi Square
• Analysis of Variance (ANOVA)
• Variation Components
• Correlation and Simple Regression
• Multiple Regression
• Wrap-up and Deliverables

Page 4: 3-1 Review of Analyze Phase

Central Limit Theorem Defined

• If variable $x$ has an unknown distribution with mean $\mu$ and standard deviation $\sigma$, then
• the sampling distribution of $\bar{x}$ (the mean of samples of size $n$) will

(1) have a mean $\mu_{\bar{x}} = \mu$,

(2) have a standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$,

(3) tend to be normal as the sample size becomes large ($n > 30$ for unknown distributions)

[Figure: distribution of individuals $x$ compared with the distribution of sample means $\bar{x}$ for sample size $n$]
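The three properties above can be checked with a small simulation. This is only a sketch: the skewed exponential population (mean 1, standard deviation 1), the sample size of 36, and the helper name `sample_means` are illustrative assumptions, not part of the course material.

```python
import random
import statistics


def sample_means(population_draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from `n` individual draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(population_draw(rng) for _ in range(n))
        for _ in range(trials)
    ]


def draw(rng):
    # Assumed population: exponential with mean 1 and standard deviation 1 (clearly non-normal).
    return rng.expovariate(1.0)


n = 36
means = sample_means(draw, n=n, trials=5000)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = 1.0 / n ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1.0)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
```

Plotting a histogram of `means` would show the roughly normal shape described in item (3), even though the individual values are strongly skewed.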

Page 5: 3-1 Review of Analyze Phase

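Standard Error of the Mean

$SE_{\text{Mean}} = \sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{n}}$

where
• $\sigma_{\bar{x}}$ = Standard Error of the Mean
• $\sigma_x$ = Standard Deviation for the Individual Scores
• $n$ = Sample Size for the mean

[Figure: Population of Individuals compared with the Distribution of Sample Averages]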

Page 6: 3-1 Review of Analyze Phase

Central Limit Theorem Objectives

By the end of this module the participant should be able to:
• Discuss the Central Limit Theorem (CLT) and demonstrate its results using a practical example
• Discuss the implications of Central Limit Theorem in statistical analysis
• Describe how to apply the Central Limit Theorem to reduce measurement variation

Page 7: 3-1 Review of Analyze Phase

A Graphical View

A 95% confidence level means that approximately 95 out of 100 confidence intervals built from repeated samples will contain the population parameter

[Figure: confidence intervals around individual sample means, plotted against the population mean]
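The "95 out of 100" reading can be illustrated by building many intervals from repeated samples and counting how often they contain the true mean. The sketch below assumes a normal population with known σ; the values of `mu`, `sigma`, `n` and the function name `coverage` are hypothetical.

```python
import random
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% confidence factor, about 1.96


def coverage(mu, sigma, n, intervals, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean `mu`."""
    rng = random.Random(seed)
    margin = Z_95 * sigma / n ** 0.5
    hits = 0
    for _ in range(intervals):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / intervals


rate = coverage(mu=50.0, sigma=10.0, n=25, intervals=2000)
print(f"share of intervals containing the population mean: {rate:.3f}")
```

The printed share should sit close to 0.95; any single interval either contains the population mean or it does not.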

Page 8: 3-1 Review of Analyze Phase

Population Versus Sample

ENTIRE POPULATION: “Population Parameters”
• μ = Population mean
• σ = Population standard deviation

SAMPLE WITHIN POPULATION (subset): “Sample Statistics”
• x̄ = Sample mean
• s = Sample standard deviation

If we only pull samples, do we ever know the true population parameters?

Page 9: 3-1 Review of Analyze Phase

Estimating Confidence Intervals (CIs)

• Parametric confidence intervals in most cases take the general form:

CI = Sample Statistic ± Margin of Error

Margin of Error = K × Measure of Variability

Statistic = Mean, Variance, Proportion, etc. from sample

Confidence Factor, K = Constant based on a statistical distribution

• Confidence intervals reflect the sample-to-sample variation of our point estimates
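As one instance of the general form, a confidence interval for a mean with known σ uses K from the standard normal distribution and σ/√n as the measure of variability. This is a minimal sketch; the data values and the function name `mean_confidence_interval` are made up for illustration.

```python
import statistics


def mean_confidence_interval(data, sigma, confidence=0.95):
    """CI for the mean with known sigma: x-bar +/- K * sigma / sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)                                    # sample statistic
    k = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # confidence factor K
    margin = k * sigma / n ** 0.5                                    # K * measure of variability
    return xbar - margin, xbar + margin


# Hypothetical measurements; sigma = 0.2 is assumed known from history.
data = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.1, 9.9]
low, high = mean_confidence_interval(data, sigma=0.2)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

When σ is not known, K would come from the t-distribution and the sample standard deviation s would replace σ.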

Page 10: 3-1 Review of Analyze Phase

Confidence Interval and Central Limit Theorem

[Figure: frequency histograms of the population and of a sample (values from 30 to 100), and a normal curve of the probability of sample value from -4 to +4 standard errors with 68.26%, 95.44% and 99.73% bands]

95% of all sample means are within two “standard errors” of the population mean

Page 11: 3-1 Review of Analyze Phase

Confidence Interval Objectives

By the end of this module participants should be able to:
• Discuss the role of confidence intervals in statistical analysis
• Discuss the meaning of confidence intervals in practical terms
• Calculate confidence intervals for the mean, standard deviation, proportion and other derived parameters such as Cp and Pp

Page 12: 3-1 Review of Analyze Phase

What is Hypothesis Testing?

• In hypothesis testing, relatively small samples are used to answer questions about population parameters (inferential statistics)

• There is always a chance that the selected sample is not representative of the population; therefore, there is always a chance that the conclusion obtained is wrong (Alpha & Beta Risks)

• With some assumptions, inferential statistics allows the estimation of the probability of getting an “odd” sample and quantifies the probability (p-value) of a wrong conclusion

Page 13: 3-1 Review of Analyze Phase

Process Flow of a Hypothesis Test

1. Define the problem and state objectives
2. State a “Null Hypothesis” (Ho)
3. State the “Alternate Hypothesis” (Ha)
4. Establish significance level (α)
5. Collect sample data
6. Calculate test statistic and/or p-value
7. DECIDE: What does the evidence suggest? Reject Ho? or Fail to reject Ho?
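The flow can be traced end to end with a one-sample Z-test, which assumes σ is known. The sketch below is illustrative only: the fill-weight data, the target of 500 and σ = 2.0 are invented, and `one_sample_z_test` is a hypothetical helper rather than a MINITAB command.

```python
import statistics


def one_sample_z_test(data, target, sigma, alpha=0.05):
    """Two-sided test of Ho: mean = target against Ha: mean != target."""
    n = len(data)
    xbar = statistics.fmean(data)
    z = (xbar - target) / (sigma / n ** 0.5)                     # test statistic
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))      # two-sided p-value
    decision = "Reject Ho" if p_value < alpha else "Fail to reject Ho"
    return z, p_value, decision


# Collected sample data (hypothetical fill weights in grams).
fill_weights = [502.1, 498.7, 503.4, 501.9, 500.8, 504.2, 499.5, 502.6, 503.0]
z, p, decision = one_sample_z_test(fill_weights, target=500.0, sigma=2.0)
print(f"z = {z:.2f}, p-value = {p:.4f} -> {decision}")
```

The significance level is fixed before the data are examined, matching the order of the flow; the decision then follows from comparing the p-value with α.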

Page 14: 3-1 Review of Analyze Phase

Forming a Hypothesis

• Null Hypothesis (Ho)
  • No difference / no change
  • Factor not statistically significant
  • Population follows a normal distribution

• Alternative Hypothesis (Ha)
  • Difference / change occurred
  • Factor statistically significant
  • Population does not follow a normal distribution

Assume Ho to be true until proven otherwise. Burden of proof rests with Ha

Page 15: 3-1 Review of Analyze Phase

α (Alpha) - Simplified Perspective

Null Hypothesis (Ho) assumed true

• e.g., defendant assumed innocent
• Prosecuting attorney must provide evidence beyond reasonable doubt that assumption is not true
• Reasonable doubt = α (significance level)

Page 16: 3-1 Review of Analyze Phase

Alpha (α) & Beta (β) Risk

α-risk
• Risk of finding a difference when there really isn’t one
• Type I error or Producers’ risk

β-risk
• Risk of not finding a difference when there really is one
• Type II error or Consumers’ risk

Page 17: 3-1 Review of Analyze Phase

Sensitivity

• δ/σ, where δ = size of difference and σ = SD
• Relative magnitude or size of the difference being tested, expressed in standard deviations
• Called test sensitivity

Page 18: 3-1 Review of Analyze Phase

The Relationship in Hypothesis Testing

Decision: Fail to reject Ho
• Truth is Ho true: Correct Decision (CI = 1 - α)
• Truth is Ha true: Type II Error (β-Risk or false negative), Consumers’ Risk

Decision: Reject Ho
• Truth is Ho true: Type I Error (α-Risk or false positive), Producers’ Risk
• Truth is Ha true: Correct Decision (Power = 1 - β)
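The link between α, β and power can be made concrete for a one-sided, one-sample Z-test. This is a sketch under stated assumptions: σ is known, the shift to detect is upward, and the numbers (`delta=1.0`, `sigma=2.0`, `n=25`) and the name `z_test_power` are illustrative.

```python
import statistics


def z_test_power(delta, sigma, n, alpha=0.05):
    """Return (power, beta) for detecting an upward mean shift of size `delta`."""
    norm = statistics.NormalDist()
    z_crit = norm.inv_cdf(1 - alpha)       # rejection cutoff when Ho is true
    shift = delta / (sigma / n ** 0.5)     # true shift expressed in standard errors
    beta = norm.cdf(z_crit - shift)        # chance of failing to reject when Ha is true
    return 1 - beta, beta


power, beta = z_test_power(delta=1.0, sigma=2.0, n=25)
print(f"beta = {beta:.3f}, power = 1 - beta = {power:.3f}")
```

For a fixed α, a larger sample size or a larger δ/σ lowers β and therefore raises power.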

Page 19: 3-1 Review of Analyze Phase

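Test Statistic and p-value Graphical View

[Figure: distribution of the test statistic centered at 0, marking the critical value with the α-risk tail area and the observed value of the test statistic with the p-value tail area]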

Page 20: 3-1 Review of Analyze Phase

Hypothesis Testing Introduction Objectives

By the end of this module participants should be able to:
• Discuss the hypothesis testing process
• Recognize α and β risks and how they affect hypothesis testing
• Discuss how the p-value is used for decision making
• Relate the hypothesis testing process to real world examples

Page 21: 3-1 Review of Analyze Phase

Comparison of Means: 4 Scenarios

1. Single Mean Comparison
• One sample vs. target (μ vs. target value)
• σ is known

2. Single Mean Comparison
• One sample vs. target (μ vs. target value)
• σ is NOT known

Page 22: 3-1 Review of Analyze Phase

Comparison of Means: 4 Scenarios

3. Two Sample Comparison
• Two independent samples compared to each other (μ1 vs. μ2)

4. Paired Comparison
• The difference (“δ”) between two paired samples (μ1 - μ2 = μd; μd vs. target)

Page 23: 3-1 Review of Analyze Phase

Hypothesis Testing of Means - Roadmap

Comparing Means
• 1 Factor
  • 1 Sample
    • σ known: 1-sample Z-test
    • σ not known: 1-sample t-test
  • 2 Samples
    • independent: 2-sample t-test
    • paired: Paired t-test
  • 2 or more samples: One way ANOVA
• 2 Factors: Two way ANOVA
• 3 or more factors: ANOVA GLM
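The roadmap is a decision tree, so it can be written as a small lookup function. The function name and its parameters are hypothetical; the returned test names are the ones on the roadmap.

```python
def choose_means_test(factors, samples=1, sigma_known=False, paired=False):
    """Map a comparison-of-means scenario to the test named on the roadmap."""
    if factors < 1 or samples < 1:
        raise ValueError("factors and samples must be at least 1")
    if factors >= 3:
        return "ANOVA GLM"
    if factors == 2:
        return "Two way ANOVA"
    # One factor: the choice depends on the number of samples.
    if samples == 1:
        return "1-sample Z-test" if sigma_known else "1-sample t-test"
    if samples == 2:
        # The roadmap also allows One way ANOVA here ("2 or more samples");
        # this sketch prefers the t-tests for exactly two samples.
        return "Paired t-test" if paired else "2-sample t-test"
    return "One way ANOVA"


print(choose_means_test(factors=1, samples=1, sigma_known=True))
print(choose_means_test(factors=1, samples=2, paired=True))
print(choose_means_test(factors=1, samples=4))
```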

Page 24: 3-1 Review of Analyze Phase

Means Hypothesis Testing Objectives

By the end of this module participants should be able to:
• Choose the appropriate test for a given problem regarding population mean
• Perform hypothesis tests of mean
• Design and apply hypothesis tests of mean on projects

Page 25: 3-1 Review of Analyze Phase

Comparison of Variance: 3 Scenarios

1. Single Variance Comparison
• One population standard deviation compared to a target value (σ vs. target value)

2. Two Sample Comparison
• Variances of two independent populations compared to each other (σ1² vs. σ2²)

Page 26: 3-1 Review of Analyze Phase

Comparison of Variance: 3 Scenarios

3. More than Two Sample Comparison
• Variances of more than two independent populations compared to each other (σ1² vs. σ2² vs. σ3²)

Page 27: 3-1 Review of Analyze Phase

Hypothesis Testing of Variation - Roadmap

Comparing Variances
• 1 Sample: 1 Variance Test (Descriptive Statistics)
• 2 Sample: 2 Variance Test (F-Test, Levene’s Test)
• More Than 2 Samples: Test for Equal Variance (Bartlett’s Test, Levene’s Test)

Page 28: 3-1 Review of Analyze Phase

Variation Hypothesis Testing Objectives

By the end of this module participants should be able to:
• Choose the appropriate test of variance for a given problem
• Perform hypothesis tests of variance
• Design and apply hypothesis tests of variance on projects

Page 29: 3-1 Review of Analyze Phase

Comparison of Proportion: 2 Scenarios

1. Single Proportion Comparison
• One population proportion (P) compared to a target value

2. Two Sample Comparison
• Proportions of two independent populations compared to each other (P1 vs. P2)

Page 30: 3-1 Review of Analyze Phase

Hypothesis Testing of Proportion - Roadmap

Comparing Proportions
• 1 Sample: 1 Proportion Test
• 2 Sample: 2 Proportion Test
• More than 2 samples: Chi-Square Test
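For the two-sample branch, one common approach is a z-test on the difference of proportions using a pooled estimate and the normal approximation. The sketch below shows that approach only; it is not claimed to reproduce MINITAB's 2 Proportion output, and the defect counts are invented.

```python
import statistics


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of Ho: P1 = P2 using the pooled normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                               # common proportion under Ho
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5      # standard error of p1 - p2
    z = (p1 - p2) / se
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 30 defectives in 200 units on line 1, 15 in 200 on line 2.
z, p = two_proportion_z_test(30, 200, 15, 200)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

The normal approximation is usually considered reasonable only when each sample has enough events and non-events; small counts call for an exact method instead.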

Page 31: 3-1 Review of Analyze Phase

Proportion Hypothesis Testing Objectives

By the end of this module participants should be able to:
• Choose the appropriate test of proportion for a given problem
• Perform hypothesis tests of proportion
• Determine sample size for 1 proportion and 2 proportion hypothesis testing
• Design and apply hypothesis tests of proportion on projects

Page 32: 3-1 Review of Analyze Phase

What Are Chi-Square Tools?

• Chi-Square Goodness-of-Fit Test
  • To test if a particular distribution (model) is a good fit for a population

• Chi-Square Test for Association
  • To test if a relationship between two attribute variables exists

Chi-Square Statistic:

$\chi^2 = \sum_{j=1}^{g} \dfrac{(f_o - f_e)^2}{f_e}$

Both of these tools use the Chi-Square distribution, where $f_o$ and $f_e$ are the observed and expected frequencies, respectively.
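The statistic is a direct sum over the g cells, which a few lines of code make explicit. The observed and expected counts below are invented for a goodness-of-fit check against a uniform model, and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical goodness-of-fit data: 100 defects across 5 shifts, uniform model expected.
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]
chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f} with {len(observed) - 1} degrees of freedom")
```

Here the statistic is 2.9 on 4 degrees of freedom, well below the 5% critical value of 9.488, so this made-up data gives no evidence against the uniform model.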

Page 33: 3-1 Review of Analyze Phase

The Chi-Square Distribution

• Measure of difference between observed counts and expected counts

• Observations must be independent

• Works best with 5 or more observations in each cell

• Cells may be combined to pool observations

[Figure: Chi-square distribution for various degrees of freedom (2, 4, 6 and 10); x-axis: χ² from 0 to 20, y-axis: value of the χ² distribution from 0 to 0.5]

Page 34: 3-1 Review of Analyze Phase

Chi Square Hypothesis Testing Objectives

By the end of this module the participants should be able to:
• Formulate appropriate hypotheses for Chi-Square tests
• Apply Chi-Square Goodness-of-Fit Test to practical problems
• Apply Chi-Square Test for Association to practical problems

Page 35: 3-1 Review of Analyze Phase

What is ANOVA?

• Hypothesis Test for MEANS
• Uses two components of variance
  • within variance (no change)
  • between variance (after a change)
• Uses the F-distribution to test the variance components
• Comprehensive test for significance
• Backbone test statistic for subsequent complex analysis

Page 36: 3-1 Review of Analyze Phase

When to Use ANOVA

Variables Road Map (Variables Data)
• 1 Mean (1 Sample): 1 Sample t-test
• 2 Means (2 Samples): 2 Sample t-test, Paired Comparisons, Tukey's Quick Test
• 2+ Means (2 or more samples): ANOVA

ANOVA is used to test two or more means

Page 37: 3-1 Review of Analyze Phase

Working With the ANOVA Data

• ANOVA data analysis will determine
  • Total process variance
  • Within factor variance
    • Variation due to noise
    • Technology focus
  • Between factor variance
    • Variation due to factor change
    • Process focus
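The split into within and between variance can be computed by hand for a one-way layout. This is a minimal sketch with three invented groups; `one_way_anova_f` is a hypothetical helper that returns the F ratio together with the two mean squares.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, between mean square, within mean square) for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)
    # Between factor variation: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within factor variation (noise): observations around their own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_between, ms_within


# Hypothetical responses at three factor settings.
groups = [[10, 12, 11], [14, 15, 16], [20, 19, 21]]
f_stat, ms_between, ms_within = one_way_anova_f(groups)
print(f"MS between = {ms_between:.1f}, MS within = {ms_within:.1f}, F = {f_stat:.1f}")
```

A large F means the variation due to the factor change dominates the noise; the F-distribution with (k - 1, N - k) degrees of freedom supplies the p-value.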

Page 38: 3-1 Review of Analyze Phase

ANOVA Objectives

By the end of this module, the participant should be able to:
• Explain how ANOVA works
• Interpret an ANOVA table
• Determine significant effects
• Perform a residual analysis
• Determine if data is normal
• Test groups of data for equal variances
• Run main effects plots

Page 39: 3-1 Review of Analyze Phase

What is a Variation Component Study?

• A variation component study combines techniques from familiar areas:
  • Shewhart control chart model
  • Rational sub-grouping
  • Measurement systems analysis
  • Graphical, Multi-Variate charts
  • Analysis of variance (ANOVA) methods

• This type of study partitions potential sources of variation within a process so the researcher will know where to work first

Page 40: 3-1 Review of Analyze Phase

Crossed Versus Nested Studies

Crossed Study: Subjects are not unique to one group
• Group 1: Subject 1, Subject 2, Subject 3
• Group 2: Subject 1, Subject 2, Subject 3
• ...
• Group k: Subject 1, Subject 2, Subject 3

Nested Study: Subjects are unique to one group
• Group 1: Subject 1, Subject 2, Subject 3
• Group 2: Subject 4, Subject 5, Subject 6
• ...
• Group k: Subject 16, Subject 17, Subject 18

Page 41: 3-1 Review of Analyze Phase

Variation Component Studies Objectives

By the end of this module participants should be able to:
• Design appropriate sampling plans for variation component studies
• Recognize whether data is crossed, nested or both and model the scenarios using ANOVA
• Analyze studies
  • Graphically
  • With control charts
  • Using ANOVA methods
• Provide estimates of variation components (quantify)
• Provide guidance/direction for process improvement

Page 42: 3-1 Review of Analyze Phase

Correlation Coefficient

[Figure: three scatter plots of Y versus X showing perfect negative correlation (r = -1.0), perfect positive correlation (r = +1.0), and no correlation (r = 0.0)]

Page 43: 3-1 Review of Analyze Phase

Correlation and Regression

• Correlation tells how much linear association exists between two variables

• Regression provides an equation describing the nature of relationship

Correlations: Shelf Space, Sales

Pearson correlation of Shelf Space and Sales = 0.978

p-value = 0.000

Regression Analysis: Sales versus Shelf Space

The regression equation is Sales = - 4711 + 10.1 Shelf Space
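The Pearson correlation reported in the output is the co-variation of x and y divided by the product of their individual variations. The sketch below computes it from first principles on invented shelf-space and sales figures; these are not the data behind the MINITAB output above, so the result will not match 0.978.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data for illustration only.
shelf_space = [550, 575, 600, 625, 650]
sales = [900, 1150, 1300, 1650, 1800]
r = pearson_r(shelf_space, sales)
print(f"Pearson correlation of Shelf Space and Sales = {r:.3f}")
```

A value near +1 or -1 indicates strong linear association; it says nothing about the slope, which is what the regression equation supplies.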

Page 44: 3-1 Review of Analyze Phase

Types of Regression

• Simple Linear Regression
  • Single regressor (x) variable such as x1 and model linear with respect to coefficients

• Multiple Linear Regression
  • Multiple regressor (x) variables such as x1, x2, x3 and model linear with respect to coefficients

• Simple Non-Linear Regression
  • Single regressor (x) variable such as x1 and model non-linear with respect to coefficients

• Multiple Non-Linear Regression
  • Multiple regressor (x) variables such as x1, x2, x3 and model non-linear with respect to coefficients

Page 45: 3-1 Review of Analyze Phase

Method of Least Squares

Objective:

• Find a line that will minimize sum of squares of residuals

Residual = Y - Ŷ

Residuals are the error of prediction

[Figure: Regression Plot of Sales versus Shelf Space showing the Regression Line (Ŷ), an observed value Y, and the residual between them]
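For a single predictor the least-squares line has a closed form, so the residuals and their sum of squares can be inspected directly. This is a sketch on invented shelf-space and sales values; `least_squares_line` is a hypothetical helper name.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (Y - Y-hat)^2 for the line Y-hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only.
xs = [550, 575, 600, 625, 650]
ys = [900, 1150, 1300, 1650, 1800]
intercept, slope = least_squares_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum_squared_residuals(xs, ys, intercept, slope)
print(f"Y-hat = {intercept:.0f} + {slope:.1f} x, sum of squared residuals = {sse:.0f}")
```

Any other line through the same data, for example one with a slightly different slope, gives a larger sum of squared residuals.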

Page 46: 3-1 Review of Analyze Phase

Correlation and Simple Regression Objectives

By the end of this module the participant should be able to:
• Measure the strength of correlation between two variables
• Determine if a correlation coefficient is statistically significant
• Perform simple linear regression including polynomial regression
• Perform model diagnostics and validate assumptions
• Use a regression model to predict the value of a response variable for a given value of predictor

Page 47: 3-1 Review of Analyze Phase

What is Multiple Regression?

• Procedure of establishing relationship between a continuous type response variable and two or more independent variables

• Multiple regression equation can be used to predict a response based on values of predictor variables

• Multiple regression equation takes the form

Y = f (x1, x2, x3, ….)
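For the linear case, Y = b0 + b1·x1 + b2·x2 + ..., the coefficients can be found by solving the normal equations. The sketch below does this in plain Python on synthetic data generated from known coefficients; it is an ordinary least-squares illustration with no diagnostics, not a description of MINITAB's procedure, and `fit_multiple_linear` is a hypothetical name.

```python
def fit_multiple_linear(rows, ys):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    X = [[1.0, *row] for row in rows]          # leading 1.0 column for the intercept
    p = len(X[0])
    # Augmented normal equations: (X^T X | X^T y).
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(X, ys))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (no noise), for illustration only.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b0, b1, b2 = fit_multiple_linear(rows, ys)
print(f"Y = {b0:.2f} + {b1:.2f} x1 + {b2:.2f} x2")
```

With noisy real data the fitted coefficients would only approximate the underlying relationship, which is why model diagnostics are part of the module objectives.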

Page 48: 3-1 Review of Analyze Phase

Types of Multiple Regression

• Multiple Linear Regression

• Multiple regressor (x) variables such as x1, x2, x3 and model linear with respect to coefficients

• Multiple Non-Linear Regression

• Multiple regressor (x) variables such as x1, x2, x3 and model non-linear with respect to coefficients

This module focuses on multiple linear regression applying general least squares method

Page 49: 3-1 Review of Analyze Phase

Predictor Variable Selection

• What combination of predictor variables is best for the regression model?

• Three options in MINITAB™:
  • Stepwise: procedure to add and remove variables to the regression model to produce a useful subset of predictors
  • Best Subsets: procedure to give best fitting regression model that can be constructed with one variable, two variable, three variable, etc. models
  • Regression: once the best model is selected, use Regression to get more detailed diagnostics

Page 50: 3-1 Review of Analyze Phase

Multiple Regression Objectives

By the end of this module participants should be able to:
• Determine, for a given response variable, the key process input variables from a set of multiple input variables
• Perform multiple linear regression for a given set of response variables using several input variables
• Perform model diagnostics and validate assumptions
• Use a regression model to predict the value of a response variable for given values of predictor variables

Page 51: 3-1 Review of Analyze Phase

Analyze Phase Deliverables

Prepare and deliver a 10-minute presentation that discusses the following project status items:

• Week 1 Deliverables summarized and updated

• Revised problem statement reflecting an increased understanding of the problem

• Detailed Process Map revised

• Additional sources of variation quantified and prioritized

• Use and display data to identify and verify the “vital few” factors

• Sampling plan

• Graphical analysis and interpretation of data

• Correlation and Regression Analysis

• Confidence interval for Y metric(s)

• Hypothesis statement(s), null hypothesis and alternative hypothesis

• MINITAB hypothesis test output, p-value and interpretation

• Project management report (Gantt chart, timelines, milestones, critical path)

• Any red flags with project or project scope and recommendations to resolve

• Next steps

• Signed approval of report out by Project Champion

Page 52: 3-1 Review of Analyze Phase

Appendix

Page 53: 3-1 Review of Analyze Phase

Non-Parametric Hypothesis Testing Roadmap

Non-Parametric Tests
• Binomial (Dichotomous)
  • Independent: Mann-Whitney U (t-test analog)
  • Dependent: Wilcoxon Sign (Paired t-test analog)
• 3 or more Levels
  • Independent: Kruskal-Wallis H (One way ANOVA analog)
  • Dependent: Friedman Two way ANOVA (Repeated measure ANOVA)

Page 54: 3-1 Review of Analyze Phase

Trademarks and Service Marks

Six Sigma is a federally registered trademark of Motorola, Inc.

Breakthrough Strategy is a federally registered trademark of Six Sigma Academy.

VISION. FOR A MORE PERFECT WORLD is a federally registered trademark of Six Sigma Academy.

ESSENTEQ is a trademark of Six Sigma Academy.

FASTART is a trademark of Six Sigma Academy.

Breakthrough Design is a trademark of Six Sigma Academy.

Breakthrough Lean is a trademark of Six Sigma Academy.

Design with the Power of Six Sigma is a trademark of Six Sigma Academy.

Legal Lean is a trademark of Six Sigma Academy.

SSA Navigator is a trademark of Six Sigma Academy.

SigmaCALC is a trademark of Six Sigma Academy.

iGrafx is a trademark of Micrografx, Inc.

SigmaTRAC is a trademark of DuPont.

MINITAB is a trademark of Minitab, Inc.