2.830j / 6.780j / esd.63j control of manufacturing ... · sampling and estimation, cont. •a...

49
MIT OpenCourseWare ____________ http://ocw.mit.edu 2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008 For information about citing these materials or our Terms of Use, visit: ________________ http://ocw.mit.edu/terms.

Upload: others

Post on 19-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

MIT OpenCourseWare ____________http://ocw.mit.edu

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303)Spring 2008

For information about citing these materials or our Terms of Use, visit: ________________http://ocw.mit.edu/terms.

Page 2: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

1Manufacturing

Control of Manufacturing Processes

Subject 2.830/6.780/ESD.63Spring 2008Lecture #6

Sampling Distributions andStatistical Hypotheses

February 26, 2008

Page 3: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

2Manufacturing

Statistics

The field of statistics is about reasoning in the face of uncertainty, based on evidence from

observed data

• Beliefs:– Probability Distribution or Probabilistic model form– Distribution/model parameters

• Evidence:– Finite set of observations or data drawn from a

population (experimental measurements/observations)

• Models:– Seek to explain data wrt a model of their probability

Page 4: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

3Manufacturing

Topics• Sampling Distributions (χ2 and Student’s-t)

– Uncertainty of Parameter Estimates– Effect of Sample Size– Examples of Inference

• Inferences from Distributions– Statistical Hypothesis Testing– Confidence Intervals

• Hypothesis Testing

• The Shewhart Hypothesis and Basic SPC– Test statistics - xbar and S

Page 5: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

4Manufacturing

Sampling to Determine Parent Probability Distribution

• Assume Process Under Study has a Parent Distribution p(x)• Take “n” Samples From the Process Output (xi)

• Look at Sample Statistics (e.g. sample mean and sample variance)

• Relationship to Parent• Both are Random Variables• Both Have Their Own Probability Distributions

• Inferences about Process via Inferences about the Parent Distribution

Page 6: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

5Manufacturing

Moments of the Population vs. Sample Statistics

• Mean

• Variance

• StandardDeviation

• Covariance

• CorrelationCoefficient

Underlying model or Sample StatisticsPopulation Probability

Page 7: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

6Manufacturing

Sampling and Estimation

• Sampling: act of making observations from populations

• Random sampling: when each observation is identically and independently distributed (IID)

• Statistic: a function of sample data; a value that can be computed from data (contains no unknowns) – Average, median, standard deviation– Statistics are by definition also random variables

Page 8: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

7Manufacturing

Population vs. Sampling Distribution

Population(“true”)probability density function)

Sample Mean(statistic)

n = 20

n = 10

n = 2

Sample Mean Distribution(sampling distribution)

n = 1

Page 9: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

8Manufacturing

Sampling and Estimation, cont.

• A statistic is a random variable, which itself has a sampling (probability) distribution– I.e., if we take multiple random samples, the value for the

statistic will be different for each set of samples, but will be governed by the same sampling distribution

• If we know the appropriate sampling distribution, we can reason about the underlying population based on the observed value of a statistic– e.g. we calculate a sample mean from a random sample;

in what range do we think the actual (population) mean really sits?

Page 10: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

9Manufacturing

Estimation and Confidence Intervals• Point Estimation:

– Find best values for parameters of a distribution– Should be

• Unbiased: expected value of estimate should be true value• Minimum variance: should be estimator with smallest variance

• Interval Estimation: – Give bounds that contain actual value with a given

probability– Must know sampling distribution!

Page 11: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

10Manufacturing

• Suppose we know

• First question: What is distribution of the mean of T =

Sampling and Estimation – An Example

that the thickness of a part is normally distributed with std. dev. of 10:

• Second question: can we use knowledge of distribution to reason about the actual (population) mean μ given observed (sample) mean?

• We sample n = 50 random parts and compute the mean part thickness:

Page 12: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

11Manufacturing

Confidence Intervals: Variance Known

• We know σ, e.g. from historical data• Estimate mean in some interval to (1-α)100% confidence

1

0.9

0.5

0.1

Remember the unit normal percentage pointsApply to the sampling distribution for the sample mean

Page 13: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

12Manufacturing

Example, Cont’d

• Second question: can we use knowledge of distribution to reason about the actual (population) mean μ given observed (sample) mean?

95% confidence interval, α = 0.05

n = 50

~95% of distributionlies within +/- 2σ of mean

Page 14: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

13Manufacturing

Reasoning & Sampling Distributions

• Example shows that we need to know our sampling distribution in order to reason about the sample and population parameters

• Other important sampling distributions:– Student’s-t

• Use instead of normal distribution when we don’t know actual variation or σ

– Chi-square• Use when we are asking about variances

– F• Use when we are asking about ratios of variances

Page 15: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

14Manufacturing

Sampling: The Chi-Square Distribution

• Typical use: find distribution of variance when mean is known

• Ex:

So if we calculate s2, we can use knowledge of chi-square distribution to put bounds on where we believe the actual (population) variance sits

0 5 10 15 20 25 300

0.010.020.030.040.050.060.070.080.090.1

Page 16: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

15Manufacturing

Sampling: The Chi-Square Distribution

0 5 10 15 20 25 300

0.05

0.1

0.15

0.2

0.25

n=3

n=10n=20

Page 17: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

16Manufacturing

Sampling: The Student’s-t Distribution

• Typical use: Find distribution of average when σ is NOT known

• Consider x ~ N(μ, σ2) . Then

• This is just the “normalized” distance from mean (normalized to our estimate of the sample variance)

x

Page 18: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

17Manufacturing

Sampling: The Student-t Distribution

-5 -4 -3 -2 -1 0 1 2 3 4 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

N=3

N=100

Standard Normal

N=10

Page 19: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

18Manufacturing

Back to Our Example

• Suppose we do not know either the variance or the mean in our parts population:

• We take our sample of size n = 50, and calculate

• Best estimate of population mean and variance (std.dev.)?

• If had to pick a range where μ would be 95% of time?

Have to use the appropriate sampling distribution:In this case – the t-distribution (rather than normaldistribution)

Page 20: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

19Manufacturing

Confidence Intervals: Variance Unknown

• Case where we don’t know variance a priori• Now we have to estimate not only the mean based on our data,

but also estimate the variance• Our estimate of the mean to some interval with

(1-α)100% confidence becomes

Note that the t distribution is slightly wider than the normaldistribution, so that our confidence interval on the true mean is not as tight as when we know the variance.

Page 21: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

20Manufacturing

Example, Cont’d

• Third question: can we use knowledge of distribution to reason about the actual (population) mean μ given observed (sample) mean – even though we weren’t told σ?

n = 50t distribution is 95% confidence intervalslightly wider than

gaussian distribution

Page 22: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

21Manufacturing

Once More to Our Example

• Fourth question: how about a confidence interval on our estimate of the variance of the thickness of our parts, based on our 50 observations?

Page 23: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

22Manufacturing

Confidence Intervals: Estimate of Variance

The appropriate sampling distribution is the Chi-square. Because χ2 is asymmetric, c.i. bounds not symmetric.

Page 24: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

23Manufacturing

0 10 20 30 40 50 60 70 80 90 1000

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

Example, Cont’d

• Fourth question: for our example (where we observed s 2

T = 102.3) with n = 50 samples, what is the 95% confidence interval for the population variance?

Page 25: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

24Manufacturing

Sampling: The F Distribution

• Typical use: compare the spread of two populations• Example:

– x ~ N(μx, σ2x) from which we sample x1, x2, …, xn

– y ~ N(μy, σ2y) from which we sample y1, y2, …, ym

– Then

or

Page 26: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

25Manufacturing

Concept of the F Distribution

• Assume we have a normally distributed population

• We generate two different random samples from the population

• In each case, we calculate a sample variance s 2

i

• What range will the ratio of these two variances take?

F distribution• Purely by chance (due to

sampling) we get a range of ratios even though drawing from same population

Example:

• Assume x ~ N(0,1)

• Take 2 samples setss of size n = 20

• Calculate s 21 and s 2

2 and take ratio

• 95% confidence interval on ratio

Large range in ratio!

Page 27: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

26Manufacturing

Use of the F Distribution

2.060I.M. Hold Time Change

2.055

2.050

2.045

2.040

2.035

2.030

2.025

0.5 1 1.5 2 2.5

3 2.0363 2.0363 2.0374 2.0424 2.0454 2.0434 2.0454 2.0504 2.0484 2.0394 2.0414 2.0404 2.0434 2.0394 2.0564 2.0424 2.0394 2.042

IM Run data: Velocity and Hold Changes

2.060

2.055

2.050

2.045

2.040

2.035

2.030

2.025

2.020

2.015

1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

Levels changes

Low Hold Time High Hold Time

Page 28: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

27Manufacturing

Agenda• Models for Random Processes

– Probability Distributions & Random Variables• Estimating Model Parameters with Sampling• Key distributions arising in sampling

– Chi-square, t, and F distributions• Estimation: Reasoning about the population based on a sample

• Some basic confidence intervals– Estimate of mean with variance known– Estimate of mean with variance not known– Estimate of variance

• Next: Hypothesis tests

Page 29: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

28Manufacturing

Statistical Inference and the Shewhart Hypothesis

• Statistical Hypotheses– Confidence of Predictions based on known or

estimated pdf• Relationship to Manufacturing Processes

Page 30: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

29Manufacturing

Statistical Hypothesis

• e.g. hypothesize that mean has specific value– Based on Assumed Model (Distribution)

• Accept or reject hypothesis based on data and statistical model– i.e. based on degree of acceptable uncertainty or

probability of error

Page 31: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

30Manufacturing

Hypothesis Testing

• Given the hypothesis for the statistic φ (e.g. the mean)

H 0: φ = φ0⎧HH : φ φ 0: φ = μ0 ⎫

1 ≠ 0 ⎨ ⎬⎩H1: φ ≠ μ0 ⎭

• No single sampled value will equal φ̂ φφ̂ 0

• How do we test the hypothesis given ? p ˆ – What is (φ ) ? (Sample Distribution?)

– How well do we want to test Ho?• Significance• Confidence

Page 32: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

31Manufacturing

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

-4 -3 -2 -1 0 1 2 3 4

μ=φ0

Region of Acceptance

area = 1-α

Region of Rejection

Region of Rejection

area = α/2

The Test

• Assume a Distribution (e.g. is Normal) p ˆ (φ )

α is the significance of the test

Page 33: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

32Manufacturing

Errors

• Ho is rejected when it is in fact true (Type I) (Significance)– p = ?

p=α for two sided and α/2 for one sided tests

• Ho is accepted when it is in fact false (Type II)– p = ?

= β … What is β ?

Page 34: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

33Manufacturing

Type II Errors

• Assume a shift in the true distribution of p ˆ (φ ) δ

• Assess the probability that we fall in the acceptance region after a shift of δ occurred

Page 35: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

34Manufacturing

Type II Errors

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

-4 -3 -2 -1 0 1 2 3 4

μ=φ0

Region of Acceptance

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

-4 -3 -2 -1 0 1 2 3 4

μ=φ0+ δ

area = β

Page 36: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

35Manufacturing

Calculating β

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

-4 -3 -2 -1 0 1 2 3 4

μ=φ0

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

-4 -3 -2 -1 0 1 2 3 4

μ=φ0+ δZ1−α/2Zα/2

Assuming a normally distributed Statistic

Normalized deviation

Page 37: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

36Manufacturing

Applications• Tests on the Mean

– Is the mean of “new” data the same as prior data (I.e from the same distribution?)

Or– Did a significant change occur?

• Variances of a Population– Is the variance of “new” data the same as prior data (i.e from

the same distribution?)• What are the “parent distributions” if we only have

sample data? – Sample distributions

Page 38: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

37Manufacturing

Example: Average Weight

• Hypothesize that average weight of a population is 68 kg and σ=3.6– H0: μ = 68– H1: μ ≠ 68– Assume an acceptance region of +- 1 kg– What is α or significance of test?

• Probability of a type I error

– What is β• Probability of type II error

Page 39: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

38Manufacturing

Significance from Interval

Take a sample of 36 weights for xbarThus σxbar = 3.6/6 = 0.6

xbar67 68 69(67 − 68)

z1 =0.6

(69 − 68)= −1.67 z2 =

0.6= +1.67

α = P(Z < −1.67) + P(Z > 1.76) = 2P(Z < −1.67) = 0.0959.5% chance of rejecting H0 even if true

Effect of increasing range?Effect of increasing n?

Page 40: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

39Manufacturing

β Error

Assume we must reject H0 if μ<66 or μ >70

67 68 69 xbar70 71

i.e. β = P(67 ≤ xbar ≤ 69) when μ = 70

β = P(−6.67 ≤ Z ≤ −2.22) = 0.0132

1.3% chance of accepting Ho when it is false

From symmetry - same result for μ = 66

Effect of increasing range?Effect of increasing n?

Page 41: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

40Manufacturing

Operating Characteristic CurveDependence of α and β

Note that the expression for β: depends on α, n and δ

10075

5040

30 2015 10

87

65

43

2

1

00

0.2

0.4

0.6

0.8

1.0

0.5 1 1.5 2 2.5 3 4 4.5 53.5

n Curves for α = 0.05

β

d = δ/σ

d

From Montgomery “Introduction to Statistical Quality Control, 4th ed. 2000

Figure by MIT OpenCourseWare.

Page 42: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

41Manufacturing

Some Typical Hypothesis

• Inference about Variance from Samples– Test Statistic?– Which Distribution to Use ?

• Inference about Mean– Knowing σ– Not knowing σ

Page 43: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

42Manufacturing

Summary

• Pick Significance Level α• Determine an acceptable β

– (1-β) is call the “power” of the test

• What is the effect of the number of samples (n)?

Page 44: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

43Manufacturing

On to Process Control

• How does all this relate to our problem?• What assumptions must we make?• What statistical tests should we use?• What are the best procedures to use in a

production environment?

Page 45: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

44Manufacturing

“In-Control”

i+1

i+2

...

Each Sample is from Same ParentTim

e

i

Page 46: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

45Manufacturing

“Not In-Control”

i

i+1

i+2

...

The Parent Distribution is NotThe Same at Each Sample

Time

Page 47: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

46Manufacturing

“Not In-Control”

...

Time

Mean Shift

Mean Shift + Variance Change

Bi-Modal

Page 48: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

47Manufacturing

Xbar and S Charts

• Shewhart:– Plot sequential average of process

• Xbar chart• Distribution?

– Plot sequential sample standard deviation• S chart• Distribution?

Page 49: 2.830J / 6.780J / ESD.63J Control of Manufacturing ... · Sampling and Estimation, cont. •A statistic is a random variable, which itself has a sampling (probability) distribution

48Manufacturing

Conclusions

• Hypothesis Testing• Use knowledge of PDFs to evaluate hypotheses• Quantify the degree of certainty (α and β)• Evaluate effect of sampling and sample size

• Shewhart Charts• Application of Statistics to Production• Plot Evolution of Sample Statistics and S• Look for Deviations from Model

x