7 statistical(intervals - university of colorado boulder

7 Statistical Intervals

Chapter 7 Stat 4570/5570

Material from Devore’s book (Ed 8), and Cengage

2

Confidence IntervalsThe CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and standard deviation

Standardizing X by first subtracting its expected valueand then dividing by its standard deviation yields thestandard normal variable

How big does our sample need to be if the underlying population is normally distributed?

3

Basic Properties of Confidence Intervals

Because the area under the standard normal curve between –1.96 and 1.96 is .95, we know:

This is equivalent to:

which can be interpreted as the probability that the interval

includes the true mean µ is 95%.

4


The interval

is thus called the 95% confidence interval for the mean.

This interval varies from sample to sample, as the sample mean varies.

So the interval itself is a random interval.

5


The CI interval is centered at the sample mean X and extends 1.96 to each side of X.

The interval’s width is 2 (1.96) , which is not random;; only the location of the interval (its midpoint X) is random.

6


For a given sample, the CI can be expressed either as

or as

A concise expression for the interval is

where the left endpoint is the lower limit and the right endpoint is the upper limit.

x± 1.96 · /pn

7

Interpreting a Confidence LevelWe started with an event (that the random interval captures the true value of µ) whose probability was .95

It is tempting to say that µ lies within this fixed interval with probability 0.95.

µ is a constant (unfortunately unknown to us). It is therefore incorrect to write the statement

P(µ lies in (a, b)) = 0.95

-- since µ either is in (a,b) or isn’t.

Basically, µ is not random (it’s a constant), so it can’t have a probability associated with its behavior.

8

Interpreting a Confidence Level• Instead, a correct interpretation of “95% confidence” relies on the long-run relative frequency interpretation of probability.

• To say that an event A has probability .95 is to say that if the same experiment is performed over and over again, in the long run A will occur 95% of the time.

• So the right interpretation is to say that in repeated sampling, 95% of the confidence intervals obtained from all samples will actually contain µ . The other 5% of the intervals will not.

• The confidence level is not a statement about any particular interval instead it pertains to what would happen if a very large number of like intervals were to be constructed using the same CI formula.

9

Other Levels of ConfidenceProbability of 1 – α is achieved by using zα/2 in place of z.025 = 1.96

P (z↵/2 Z z↵/2) = 1 ↵ where Z =X µ

/pn

10

Other Levels of ConfidenceA 100(1 – α)% confidence interval for the mean µ when the value of σ is known is given by

or, equivalently, by

The formula for the CI can also be expressed in words as Point estimate ± (z critical value) (standard error).

11

Example A sample of 40 units is selected and diameter measured for each one. The sample mean diameter is 5.426 mm, and the standard deviation of measurements is 0.1mm.

Let’s calculate a confidence interval for true average hole diameter using a confidence level of 90%. What is the width of the interval?

What about the 99% confidence interval?

What are the advantages and disadvantages to a wider confidence interval?

12

Sample size computation For each desired confidence level and interval width, we can determine the necessary sample size.

Example: A response time is Normally distributed with standard deviation 25 milliseconds. A new system has been installed, and we wish to estimate the true average response time µ for the new environment.

Assuming that response times are still normally distributed with σ = 25, what sample size is necessary to ensure that the resulting 95% CI has a width of (at most) 10?

13

Unknown mean and varianceWe now know that a CI for the mean µ of a normal distribution and a large-sample CI for µ for any distribution with a confidence level of 100(1 – α)% is:

A practical difficulty is the value of σ, which will rarely be known. Instead we work with the standardized random variable

Where the sample standard deviation s has replaced σ.

14

Unknown mean and variancePreviously, there was randomness only in the numerator of Z by virtue of , the estimator.

In the new standardized variable, both and S vary in value from one sample to another.

Thus the distribution of this new variable should be wider than the Normal to reflect the extra uncertainty. This is indeed true when n is small.

However, for large n the subsititution of S for σ adds little extra variability, so this variable also has approximately a standard normal distribution.

15

A Large-Sample Interval for µPropositionIf n is sufficiently large (n>40), the standardized random variable

has approximately a standard normal distribution. This implies that

is a large-sample confidence interval for µ with confidence level approximately 100(1 – α)%.

(This formula is valid regardless of the population distribution for sufficiently large n.)

16

A Large-Sample Interval for µGenerally speaking, n > 40 will be sufficient to justify the use of this interval.

This is somewhat more conservative than the rule of thumb for the CLT because of the additional variability introduced by using S in place of σ.

17

Small sample intervals for the mean

•The CI for µ presented in earlier section is valid provided that n is large• Rule of thumb: n > 40• The resulting interval can be used whatever the nature of the population distribution.

• The CLT cannot be invoked, however, when n is small• Need to do something else when n < 40

• When n < 40 and the underlying distribution is unknown, we have to • make a specific assumption about the form of the population distribution

• derive a CI based on that assumption. • For example: develop a CI for µ when the population is described by a gamma distribution, another interval for the case of a Weibulldistribution, and so on.

18

t DistributionThe results on which large sample inferences are based introduces a new family of probability distributions called t distributions.

When is the mean of a random sample of size n from a normal distribution with mean µ, the random variable

has a probability distribution called a t Distribution with n–1 degrees of freedom (df).

19

Properties of t DistributionsFigure below illustrates some members of the t-family

20

Properties of t DistributionsProperties of t DistributionsLet tν denote the t distribution with ν df.

1. Each tν curve is bell-shaped and centered at 0.

2. Each tν curve is more spread out than the standard normal (z) curve.

3. As ν increases, the spread of the corresponding tν curve decreases.

4. As ν → , the sequence of tν curves approaches the standard normal curve (so the z curve is the t curve withdf = ).

21

Properties of t DistributionsLet tα,ν = the number on the measurement axis for which the area under the t curve with ν df to the right of tα,ν is α;;tα,ν is called a t critical value.

For example, t.05,6 is the t critical value that captures an upper-tail area of .05 under the t curve with 6 df

22

Tables of t DistributionsThe probabilities of t curves are found in a similar way as the normal curve.

Example: obtain t.05,15

23

The One-Sample t Confidence Interval

Let and s be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean µ. Then a 100(1 – α)% t-confidence interval for the mean µ is

or, more compactly

So when the true variance is not known and the sample size is small (n ≤ 30) and the underlying population is believed to be normally distributed, then we use the t-distribution CI.

24

Example A dataset on the modulus of material rupture (psi):

6807.99 7637.06 6663.28 6165.03 6991.41 6992.236981.46 7569.75 7437.88 6872.39 7663.18 6032.286906.04 6617.17 6984.12 7093.71 7659.50 7378.617295.54 6702.76 7440.17 8053.26 8284.75 7347.957422.69 7886.87 6316.67 7713.65 7503.33 7674.99

There are 30 observations. The sample mean is 7203.191 The sample standard deviation is 543.5400.

cont’d

25

Comparison of t vs z in R

26

Intervals Based on Nonnormal Population Distributions

The one-sample t CI for µ is robust to small or even moderate departures from normality unless n is very small.

By this we mean that if a critical value for 95% confidence, for example, is used in calculating the interval, the actual confidence level will be reasonably close to the nominal 95% level.

If, however, n is small and the population distribution is non-normal, then the actual confidence level may be considerably different from the one you think you are using when you obtain a particular critical value from the t table.

27

Summary of Confidence Intervals so far…

• If population variance σ 2 is known and the underlying population is known (or assumed to be) normally distributedor if n > 30 then we use a confidence interval based on the normal dist. (i.e. z-scores).

• If population variance σ 2 is unknown and if n > 40 then we use a CI based on z-scores with σ ≈ s (there is no assumption made here about the distribution).

• If population variance σ 2 is unknown and the underlying population is known (or assumed to be) normally distributedand if n ≤ 30 then we use a CI based on the t-dist. with σ ≈ s and if n > 30 use a CI based on z-scores with σ ≈ s.

28

A Confidence Interval for a Population Proportion

Let p denote the proportion of “successes” in a population, where success identifies an individual or object that has a specified property (e.g., individuals who graduated from college, computers that do not need warranty service, etc.).

A random sample of n individuals is to be selected, and X is the number of successes in the sample. X can be thought of as a sum of all Xi’s, where 1 is added for every success that occurs and a 0 for every failure, so X1 + . . . + Xn = X).

Thus, X can be regarded as a Binomial rv with mean np and .

Furthermore, if both np ≥ 10 and n(1-p) ≥ 10, X has approximately a normal distribution.

29

A Confidence Interval for a Population Proportion

The natural estimator of p is = X / n, fraction of successes. Since is the sample mean, (X1 + . . . + Xn)/ n has approximately a normal distribution. As we know

that, E( ) = p (unbiasedness) and .

The standard deviation involves the unknown parameter p. Standardizing by subtracting p and dividing by then implies that

And the CI is

30

Confidence Intervals for the Variance of a Normal Population

Let X1, X2, … , Xn be a random sample from a normal distribution with parameters µ and σ2. Then the r.v.

has a chi-squared ( ) probability distribution with n – 1 df.

We know that the chi-squared distribution is a continuous probability distribution with a single parameter v, called the number of degrees of freedom, with possible values 1, 2, 3, . . . .

2

31


Let X1, X2, … , Xn be a random sample from a normal distribution with parameters µ and σ2. Then

has a chi-squared ( 2) probability distribution with n – 1 df.

We know that the chi-squared distribution is a continuous probability distribution with a single parameter v, called the number of degrees of freedom, with possible values 1, 2, 3, . . . .

32


The graphs of several Chi-square probability density functions are

33


The chi-squared distribution is not symmetric, so Appendix Table A.7 contains values of both for α near 0 and 1

34


As a consequence

Or equivalently

Thus we have a confidence interval for the variance σ 2 .

Taking square roots gives a CI for the standard deviation σ.

35


A 100(1 – α)% confidence interval for the variance σ2 of a normal population has lower limit

and upper limit

A confidence interval for σ has lower and upper limits that are the square roots of the corresponding limits in the interval for σ2.

36

ExampleThe data on breakdown voltage of electrically stressed circuits are:

breakdown voltage is approximately normally distributed.

s2 = 137,324.3 n = 17

7 statistical(intervals - university of colorado boulder

Documents