lecturer: prof. duane s. boning - dspace@mit home · 2019. 9. 13. · 2 agenda 1. review:...

38
1 Statistical Inference Lecturer: Prof. Duane S. Boning

Upload: others

Post on 25-Feb-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

1

Statistical Inference

Lecturer: Prof. Duane S. Boning

Page 2: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

2

Agenda

1. Review: Probability Distributions & Random Variables

2. Sampling: Key distributions arising in sampling

• Chi-square, t, and F distributions

3. Estimation:

Reasoning about the population based on a sample

4. Some basic confidence intervals

• Estimate of mean with variance known

• Estimate of mean with variance not known

• Estimate of variance

5. Hypothesis tests

Page 3: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

3

Discrete Distribution: Bernoulli

• Bernoulli trial: an experiment with two outcomes

• Probability mass function (pmf):

f(x)

x

¾ p

1 - p¼

0 1

Page 4: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

4

Discrete Distribution: Binomial

• Repeated random Bernoulli trials

• n is the number of trials

• p is the probability of “success” on any one trial

• x is the number of successes in n trials

Page 5: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

5

Binomial Distribution

Binomial Distribution

0

0.05

0.1

0.15

0.2

0.25

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Number of "Successes"

Pro

bab

ilit

y

0

0 .2

0 .4

0 .6

0 .8

1

1 .2

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Series1

Page 6: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

6

Discrete Distribution: Poisson

• Poisson is a good approximation to Binomial when n is large and p is small (< 0.1)

• Example applications:

– # misprints on page(s) of a book

– # transistors which fail on first day of operation

• Mean:

• Variance:

Page 7: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

7

Poisson Distribution

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41

Events per unit

Pro

ba

bil

ity

c

Poisson Distribution

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41

Events per unit

Pro

ba

bil

ity

c

Poisson Distribution

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41

Events per unit

Pro

ba

bil

ity

=5

=20

=30

Poisson Distributions

Page 8: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

8

Continuous Distributions

• Uniform Distribution

• Normal Distribution

– Unit (Standard) Normal Distribution

Page 9: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

9

Continuous Distribution: Uniform

xa

1

b

xa b

• cumulative distribution function* (cdf)

• probability density function (pdf)†

†also sometimes called a probability distribution function

*also sometimes called a cumulative density function

Page 10: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

10

Standard Questions You Should Be Able To Answer

(For a Known cdf or pdf)

xa

1

b

xa b• Probability x sits within

some range

• Probability x less than or

equal to some value

Page 11: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

11

Continuous Distribution: Normal (Gaussian)

0

0.16

0.5

0.84

1

0.977

0.99865

0.0227.00135

• pdf

• cdf

0

Page 12: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

12

Continuous Distribution: Unit Normal

• Normalization

• cdf

• Mean

• Variance

• pdf

Page 13: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

13

0.1

0.5

0.9

1

Using the Unit Normal pdf and cdf

• We often want to talk about “percentage points” of the distribution – portion in the tails

Page 14: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

14

Philosophy

The field of statistics is about reasoning in the face of

uncertainty, based on evidence from observed data

• Beliefs:– Distribution or model form

– Distribution/model parameters

• Evidence:– Finite set of observations or data drawn from a population

• Models:– Seek to explain data

Page 15: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

15

Moments of the Population

vs. Sample Statistics

• Mean

• Variance

• Standard

Deviation

• Covariance

• Correlation

Coefficient

Population Sample

Page 16: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

16

Sampling and Estimation

• Sampling: act of making observations from populations

• Random sampling: when each observation is identically

and independently distributed (IID)

• Statistic: a function of sample data; a value that can be

computed from data (contains no unknowns)

– average, median, standard deviation

Page 17: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

17

http://stat-www.berkeley.edu/~stark/SticiGui

Sampling Demo

SticiGui: Statistics Tools for Internet and Classroom

Instruction with a Graphical User Interface

Page 18: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

18

Population vs. Sampling Distribution

Population

(probability density function)

n = 20

Sample Mean

(statistic) n = 10

n = 2

Sample Mean

(sampling distribution)

n = 1

Page 19: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

19

Sampling and Estimation, cont.

• Sampling

• Random sampling

• Statistic

• A statistic is a random variable, which itself has a sampling distribution– I.e., if we take multiple random samples, the value for the statistic

will be different for each set of samples, but will be governed by the same sampling distribution

• If we know the appropriate sampling distribution, we can reason about the population based on the observed value of a statistic– E.g. we calculate a sample mean from a random sample; in what

range do we think the actual (population) mean really sits?

Page 20: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

20

Sampling and Estimation – An Example

• Suppose we know that the thickness of a part is normally distributed with std. dev. of 10:

• We sample n = 50 random parts and compute the mean part thickness:

• First question: What is distribution of

• Second question: can we use knowledge of distribution to reason about the actual (population) mean given observed (sample) mean?

Page 21: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

21

Estimation and Confidence Intervals

• Point Estimation:

– Find best values for parameters of a distribution

– Should be

• Unbiased: expected value of estimate should be true value

• Minimum variance: should be estimator with smallest variance

• Interval Estimation:

– Give bounds that contain actual value with a

given probability

– Must know sampling distribution!

Confidence Interval Demo

Page 22: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

22

Confidence Intervals: Variance Known

• We know , e.g. from historical data

• Estimate mean in some interval to (1- )100% confidence

0.1

0.5

0.9

1

• Remember the unit normal

percentage points

• Apply to the sampling

distribution for the

sample mean

Page 23: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

23

95% confidence interval, = 0.05

n = 50

~95% of distribution

lies within +/- 2 of mean

Example, Cont’d

• Second question: can we use knowledge of distribution to

reason about the actual (population) mean given observed

(sample) mean?

Page 24: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

24

Reasoning & Sampling Distributions

• Example shows that we need to know our sampling

distribution in order to reason about the sample and

population parameters

• Other important sampling distributions:

– Student-t

• Use instead of normal distribution when we don’t know actual

variation or

– Chi-square

• Use when we are asking about variances

– F

• Use when we are asking about ratios of variances

Page 25: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

25

Sampling: The Chi-Square Distribution

• Typical use: find distribution

of variance when mean is

known

• Ex:

So if we calculate s2, we can use

knowledge of chi-square distribution

s

0 5 10 15 20 25 300

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

to put bounds on where we believe

the actual (population) variance sit

Page 26: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

26

Sampling: The Student-t Distribution

• Typical use: Find distribution of average when is NOT known

• For k ! 1, tk ! N(0,1)

• Consider xi ~ N( , 2) . Then

• This is just the “normalized” distance from mean (normalized

to our estimate of the sample variance)

Page 27: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

27

Back to our Example

• Suppose we do not know either the variance or the mean in

our parts population:

• We take our sample of size n = 50, and calculate

• Best estimate of population mean and variance (std.dev.)?

• If had to pick a range where would be 95% of time?

Have to use the appropriate sampling distribution:

In this case – the t-distribution (rather than normal

distribution)

Page 28: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

28

Confidence Intervals: Variance Unknown

• Case where we don’t know variance a priori

• Now we have to estimate not only the mean based on

our data, but also estimate the variance

• Our estimate of the mean to some interval with

(1- )100% confidence becomes

Note that the t distribution is slightly wider than the normal

distribution, so that our confidence interval on the true mean is

not as tight as when we know the variance.

Page 29: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

29

Example, Cont’d

• Third question: can we use knowledge of distribution to

reason about the actual (population) mean given observed

(sample) mean – even though we weren’t told ?

n = 50

95% confidence intervalt distribution is

slightly wider than

gaussian distribution

Page 30: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

30

Once More to Our Example

• Fourth question: how about a confidence interval on our

estimate of the variance of the thickness of our parts, based on

our 50 observations?

Page 31: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

31

Confidence Intervals: Estimate of Variance

• The appropriate sampling distribution is the Chi-square

• Because 2 is asymmetric, c.i. bounds not symmetric.

Page 32: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

32

0 10 20 30 40 50 60 70 80 90 1000

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

Example, Cont’d

• Fourth question: for our example (where we observed

sT2 = 102.3) with n = 50 samples, what is the 95%

confidence interval for the population variance?

Page 33: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

33

Sampling: The F Distribution

• Typical use: compare the spread of two populations

• Example:

– x ~ N( x, 2

x) from which we sample x1, x2, …, xn

– y ~ N( y, 2

y) from which we sample y1, y2, …, ym

– Then

or

Page 34: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

34

Concept of the F Distribution

• Assume we have a normally

distributed population

• We generate two different

random samples from the

population

• In each case, we calculate a

sample variance si2

• What range will the ratio of

these two variances take?

) F distribution

• Purely by chance (due to

sampling) we get a range of

ratios even though drawing

from same population

Example:

• Assume x ~ N(0,1)

• Take samples of size n = 20

• Calculate s12 and s2

2 and take ratio

• 95% confidence interval on ratio

Large range in ratio!

Page 35: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

35

Hypothesis Testing

• A statistical hypothesis is a statement about the parameters of a probability distribution

• H0 is the “null hypothesis”

– E.g.

– Would indicate that the machine is working correctly

• H1 is the “alternative hypothesis”

– E.g.

– Indicates an undesirable change (mean shift) in the machine operation (perhaps a worn tool)

• In general, we formulate our hypothesis, generate a random sample, compute a statistic, and then seek to reject H0 or fail to reject (accept) H0 based on probabilities associated with the statistic and level of confidence we select

Page 36: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

36

Which Population is Sample x From?

• Two error probabilities in decision:

– Type I error: “false alarm”

– Type II error: “miss”

– Power of test (“correct alarm”)

Consider H1 an

“alarm” condition

Set decision point (and

sample size) based on

acceptable ,

Consider H0 the

“normal” condition

risks• Control charts are hypothesis tests:

– Is my process “in control” or has a significant change occurred?

Page 37: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

37

Summary1. Review: Probability Distributions & Random Variables

2. Sampling: Key distributions arising in sampling• Chi-square, t, and F distributions

3. Estimation: Reasoning about the population based on a sample

4. Some basic confidence intervals• Estimate of mean with variance known

• Estimate of mean with variance not known

• Estimate of variance

5. Hypothesis tests

Next Time:

1. Are effects (one or more variables) significant?) ANOVA (Analysis of Variance)

2. How do we model the effect of some variable(s)?) Regression modeling

Page 38: Lecturer: Prof. Duane S. Boning - DSpace@MIT Home · 2019. 9. 13. · 2 Agenda 1. Review: Probability Distributions & Random Variables 2. Sampling: Key distributions arising in sampling

MIT OpenCourseWarehttp://ocw.mit.edu

2.854 / 2.853 Introduction to Manufacturing Systems

Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.