MM207 Statistics: Welcome to the Unit 6 Seminar, Wednesday, March 7, 2012, 8 to 9 PM ET

Upload: aldous-harvey

Post on 29-Jan-2016


Page 1

MM207 Statistics

Welcome to the Unit 6 Seminar

Wednesday, March 7, 2012

8 to 9 PM ET

Page 2

The Normal Shape

• This is a histogram for a distribution of 300 natural births. The left vertical axis shows the number of births for each 4-day bin*. The right vertical axis shows relative frequencies.

* A bin is a group or class.

Page 3

The Normal Shape

The distribution of the birth data has a fairly distinctive shape, which is easier to see if we overlay the histogram with a smooth curve.

Page 4

Characteristics of the Normal Curve

• The distribution is single-peaked. Its mode, or most common birth date, is the due date.

• The distribution is symmetric around its single peak; therefore, its median and mean are the same as its mode. The median is the due date because equal numbers of births occur before and after this date. The mean is also the due date because, for every birth before the due date, there is a birth the same number of days after the due date.

• The distribution is spread out in a way that makes it resemble the shape of a bell, so we call it a “bell-shaped” distribution.

• The total area under the curve is equal to 1.00.

• The curve approaches the horizontal axis but never touches it.

Page 5

Variation in Distributions

Both distributions are normal and have the same mean of 75, but the distribution on the left has a larger standard deviation.

Page 6

When Can We Expect a Normal Distribution?

A data set that satisfies the following four criteria is likely to be normally distributed:

1. Most data values are clustered near the mean, giving the distribution a well-defined single peak.

2. Data values are spread evenly around the mean, making the distribution symmetric.

3. Larger deviations from the mean become increasingly rare, producing the tapering tails of the distribution.

4. Individual data values result from a combination of many different factors, such as genetic and environmental factors.

Page 7

An Example of a Normal Distribution

Consider a Consumer Reports survey in which participants were asked how long they owned their last TV set before they replaced it. The variable of interest in this survey is replacement time for television sets.

• Based on the survey, the distribution of replacement times has a mean of about 8.2 years, which we denote as µ (the Greek letter mu).

• The standard deviation of the distribution is about 1.1 years, which we denote as σ (the Greek letter sigma).

Page 8

Television Replacement Distribution

Making the reasonable assumption that the distribution of TV replacement times is approximately normal, we can picture it as a normal curve with:

“mu” = µ = 8.2 years

“sigma” = σ = 1.1 years

Page 9

68-95-99.7 Rule or Empirical Rule

This rule gives guidelines for the percentage of data values that will lie within 1, 2, and 3 standard deviations of the mean for any normal distribution. For the TV replacement times (µ = 8.2 years, σ = 1.1 years):

• About 68% of values lie within 1 standard deviation of the mean. That is from 7.1 years to 9.3 years.

• About 95% of values lie within 2 standard deviations of the mean. That is from 6.0 years to 10.4 years.

• About 99.7% of values lie within 3 standard deviations of the mean. That is from 4.9 years to 11.5 years.
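The three ranges for the TV replacement example can be reproduced with a few lines of Python. This is only a sketch: the values of µ and σ come from the slides, while the variable names and the rounding to one decimal place are choices made here for illustration.

```python
# Empirical-rule intervals for the TV replacement example.
mu = 8.2      # mean replacement time in years (from the slides)
sigma = 1.1   # standard deviation in years (from the slides)

# Approximate share of values within k standard deviations of the mean.
rule = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {}
for k, share in rule.items():
    low = round(mu - k * sigma, 1)    # k standard deviations below the mean
    high = round(mu + k * sigma, 1)   # k standard deviations above the mean
    intervals[k] = (low, high)
    print(f"About {share} of TV sets are replaced between {low} and {high} years")
```

Changing `mu` and `sigma` gives the matching ranges for any other approximately normal variable.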

Page 10

Finding a Percentile

On a visit to the doctor’s office, your fourth-grade daughter is told that her height is 1 standard deviation above the mean for her age and sex. What is her percentile for height? Assume that heights of fourth-grade girls are normally distributed.

• Recall that a data value lies in the nth percentile of a distribution if n% of the data values are less than or equal to it (see Section 4.3).

• According to the 68-95-99.7 rule, 68% of the heights are within 1 standard deviation of the mean.

• Therefore, 34% of the heights (half of 68%) are between 0 and 1 standard deviation above the mean.

• We also know that, because the distribution is symmetric, 50% of all heights are below the mean.

• Therefore, 50% + 34% = 84% of all heights are less than 1 standard deviation above the mean (Figure 5.21). Your daughter is in the 84th percentile for heights among fourth-grade girls.
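As a rough check on the reasoning above, the rule-of-thumb percentile can be compared with the value given by the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the slides do not use Python, so this is an illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Percentile from the 68-95-99.7 rule: 50% below the mean plus half of 68%.
rule_percentile = 50 + 68 / 2

# Percentile from the normal curve: area to the left of z = +1, as a percentage.
exact_percentile = 100 * standard_normal.cdf(1.0)

print(rule_percentile)              # percentile from the rule of thumb
print(round(exact_percentile, 2))   # percentile from the normal curve
```

The two answers agree to the nearest percent, which is why the empirical rule is good enough for this kind of question.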

Page 11

Finding a Percentile

Interpretation: Find the percentile for a height 1 standard deviation above the mean for her age and sex. Assume that heights of fourth-grade girls are normally distributed.

What would her percentile be if she were 1 standard deviation BELOW the mean?

Page 12

Introduction to Standard Scores

• Remember the Empirical Rule!

• Sample curve: μ = 500, σ = 100

• How many standard deviations away from the mean is each of these values?
  • 300
  • 800
  • 250
  • 500
  • 650

(Figure: normal curve with the horizontal axis marked in standard deviations from the mean: -3, -2, -1, 0, +1, +2, +3)

Page 13

Computing Standard Scores

• The number of standard deviations a data value lies above or below the mean is called its standard score (or z-score), defined by

z = standard score = (data value – mean) / standard deviation = (x – µ) / σ

• The standard score is positive for data values above the mean and negative for data values below the mean.
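The standard score formula is easy to try out in code. The sketch below applies z = (x – µ) / σ to the sample curve with μ = 500 and σ = 100 and the five values from the earlier slide; the function name `z_score` is a hypothetical helper chosen for this illustration.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 500, 100                 # sample curve from the slides
values = [300, 800, 250, 500, 650]   # data values from the slides

# Map each data value to its standard score.
scores = {x: z_score(x, mu, sigma) for x in values}

for x, z in scores.items():
    print(f"x = {x}: z = {z:+.1f}")
```

Values below 500 come out negative and values above 500 come out positive, matching the sign rule stated on the slide.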

Page 14

Getting More Precise

Page 15

Standard Scores and Percentiles

Once we know the standard score of a data value, the properties of the normal distribution allow us to find its percentile in the distribution. This is usually done with a standard score table. (In eText see “chapter BM” for Back matter to get Appendix A on pages 446-447)

Page 16

Example 1

Page 17

Example 2

A college admissions test is scaled so that scores have a mean of 500 and a standard deviation of 100. (You will use StatCrunch, but you must understand the theory.)

Page 18

Finding Z Scores from Percentiles (i.e., working backwards)

Example: Given a mean cholesterol level of 178 and a standard deviation of 41, what cholesterol level corresponds to the 90th percentile?

The 90th percentile is on the POSITIVE Z table, since it is larger than the 50th percentile. Go to that table and SCAN the body looking for the value closest to .9000 (the 90th percentile). Move your finger back to the left to get the x.y part of the z score. Move your finger up to see the .0w part of the score. Now add these values to make the score x.yw. All z scores in the table have 2 digits to the right of the decimal.

So moving to the left we get 1.2; moving up we see .08; adding these gives 1.2 + .08 = 1.28 as our z score.

Page 19

Finding Z Scores from Percentiles (i.e., working backwards)

Example: Given a mean cholesterol level of 178 and a standard deviation of 41, what cholesterol level corresponds to the 90th percentile?

Now z = 1.28. Thus, the 90th percentile is about 1.28 standard deviations above the mean.

Finally, give this z score in terms of the problem application, that is, as an x value. Use the formula z = (x – µ) / σ and solve for x. You can do the algebra or just trust me that it gives x = µ + z * σ. For our problem,

x = 178 + (1.28 * 41) = 230.48

Therefore, a cholesterol level of about 230.48, or roughly 230, corresponds to the 90th percentile.
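The "working backwards" steps can be checked in Python. This sketch first repeats the table-based calculation with z = 1.28, then asks `statistics.NormalDist.inv_cdf` for the unrounded z score. The unrounded result differs slightly from the table answer because the table only lists z to two decimals; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 178, 41   # cholesterol mean and standard deviation from the slides

# Table method: z read from the standard score table, then x = mu + z * sigma.
z_table = 1.28
x_table = mu + z_table * sigma

# Software method: inverse of the standard normal CDF at the 90th percentile.
z_exact = NormalDist().inv_cdf(0.90)
x_exact = mu + z_exact * sigma

print(round(x_table, 2))   # table-based cholesterol level
print(round(z_exact, 4))   # unrounded z score for the 90th percentile
print(round(x_exact, 2))   # cholesterol level from the unrounded z score
```

Either way the answer rounds to a cholesterol level of roughly 230 or 231, so the table method is accurate enough for this course.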

Page 20

The Central Limit Theorem

Suppose we take many random samples of size n for a variable with any distribution (not necessarily a normal distribution) and record the distribution of the means of each sample. Then,

1. The distribution of means will be approximately a normal distribution for large sample sizes. (A common rule of thumb is n > 30.)

2. The mean of the distribution of means approaches the population mean, µ, for large sample sizes.

3. The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is the standard deviation of the population.
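The three statements above can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not part of the seminar: it draws samples from an exponential distribution with mean 1 and standard deviation 1 (clearly not normal), and the sample size of 40, the 5000 repetitions, and the seed are arbitrary choices.

```python
import random
import statistics

def sample_means(n, num_samples, rng):
    """Draw num_samples random samples of size n and return the mean of each."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # skewed population
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 40                   # sample size, above the n > 30 rule of thumb
population_mean = 1.0    # mean of the exponential(1) population
population_sd = 1.0      # standard deviation of the exponential(1) population

means = sample_means(n, 5000, rng)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sd = population_sd / n ** 0.5   # sigma / sqrt(n) from statement 3

print(round(mean_of_means, 3), "vs population mean", population_mean)
print(round(sd_of_means, 3), "vs predicted", round(predicted_sd, 3))
```

A histogram of `means` would look close to a bell curve even though the individual values are strongly skewed.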

Page 21

The Interpretation of the Central Limit Theorem

If you have a group of size n instead of one individual selection (like the problems we did earlier), the only difference in working the problem is how you COMPUTE the z score.

Use the formula z = (given sample mean – µ) / [σ/√n]

Also, see Example 1 of Section 5.3 called Predicting Test Score. Be sure to notice the difference in part a and part b. In part a, you have ONE person and in part b you have a group of 100 people.
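The difference between one person and a group shows up only in the denominator of the z score. In the sketch below, the mean of 500 and standard deviation of 100 come from the admissions test on the Example 2 slide and the group size of 100 matches the group mentioned above, but the score of 510 is a hypothetical value chosen for illustration and is not taken from the textbook example.

```python
from math import sqrt

def z_individual(x, mu, sigma):
    """z score for ONE data value."""
    return (x - mu) / sigma

def z_sample_mean(x_bar, mu, sigma, n):
    """z score for the MEAN of a group of size n (Central Limit Theorem)."""
    return (x_bar - mu) / (sigma / sqrt(n))

mu, sigma = 500, 100   # test score scale from the slides
value = 510            # hypothetical score, for illustration only
n = 100                # group size

z_one = z_individual(value, mu, sigma)        # one person scoring 510
z_group = z_sample_mean(value, mu, sigma, n)  # a group of 100 averaging 510

print(z_one, z_group)
```

The same 10-point difference is ten times as many standard deviations for the group mean as for one individual, because the standard deviation of the means is σ/√n.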

Page 22

CLT Demonstrated (Figure 5.26)

Page 23

The Value of the Central Limit Theorem

• The Central Limit Theorem allows us to say something about the mean of a group if we know the mean, µ, and the standard deviation, σ, of the entire population. This can be useful, but it turns out that the opposite application is far more important.

• Two major activities of statistics are making estimates of population means and testing claims about population means. Is it possible to make a good estimate of the population mean knowing only the mean of a much smaller sample?

• As you can probably guess, being able to answer this type of question lies at the heart of statistical sampling, especially in polls and surveys. The Central Limit Theorem provides the key to answering such questions.

Page 24

Computing Probabilities in MSL: EASY!

Example 1

Note: the icon here is not data for the problem but standard scores for the specific distribution given in this problem.

Page 25

Example 1 - Part a: find the percentage of scores greater than 1866.

Choose Calculator -> Normal, then put in the mean, standard deviation, and the value in question, 1866. Be sure to choose => for “greater than or equal”. Click Compute to get the graph and the answer.

The answer is .15865, which as a percentage is 15.865% and rounds to 15.87% with two decimals, as asked for in the question.

Page 26

Example 1 - Part c: find the percentage of scores between 1389 and 2184.

There are 3 steps to this one. To get the area between two values, you must subtract the area to the left of the LEFTMOST value FROM the area to the left of the RIGHTMOST value:

1. Compute the percentage less than 2184.
2. Compute the percentage less than 1389.
3. Subtract.

The StatCrunch output gives .97724986 - .30853754 = .66871232, which is 66.871% and rounded to 2 places is 66.87%.
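The same calculator steps can be mimicked in Python. One important assumption: the mean and standard deviation for this problem appear only in the StatCrunch screenshots, not in this transcript. The values 1548 and 318 used below were inferred here by working backwards from the reported areas (2184 sits 2 standard deviations above the mean and 1389 sits 0.5 below), so treat them as a reconstruction rather than the stated problem data.

```python
from statistics import NormalDist

# ASSUMPTION: mean and standard deviation inferred from the reported answers,
# not stated in the transcript.
scores = NormalDist(mu=1548, sigma=318)

# Part a: percentage of scores greater than 1866 (area to the right).
p_greater = 1 - scores.cdf(1866)

# Part c: area left of the rightmost value minus area left of the leftmost value.
p_below_2184 = scores.cdf(2184)
p_below_1389 = scores.cdf(1389)
p_between = p_below_2184 - p_below_1389

print(f"{100 * p_greater:.2f}%")   # Part a as a percentage
print(f"{100 * p_between:.2f}%")   # Part c as a percentage
```

With these inferred parameters the results agree with the StatCrunch answers quoted on the slides.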

Page 27

Example 2 - Part c: find the probability the mean blood pressure is less than 111 for a sample of 280 women.

Since n > 1, this is a Central Limit Theorem problem. Be sure to compute the new standard deviation as sigma / sqrt(n) before plugging into StatCrunch.

Standard deviation = 13.22 / sqrt(280) = 13.22 / 16.73 = .79 ≈ .8
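The adjusted standard deviation can be verified in a couple of lines. The population mean for blood pressure is not given in this transcript, so the sketch below stops at the standard deviation of the sample means and only defines, without applying it to the problem, a hypothetical helper `prob_sample_mean_below` that would finish the calculation once the mean is known.

```python
from math import sqrt
from statistics import NormalDist

sigma = 13.22   # population standard deviation from the slide
n = 280         # sample size from the slide

# Standard deviation of the distribution of sample means: sigma / sqrt(n).
standard_error = sigma / sqrt(n)
print(round(standard_error, 2))

def prob_sample_mean_below(threshold, mu, sigma, n):
    """P(sample mean < threshold) for samples of size n, using the CLT.

    mu must come from the original problem; it is not in this transcript.
    """
    return NormalDist(mu, sigma / sqrt(n)).cdf(threshold)
```

In StatCrunch the equivalent step is to enter this smaller standard deviation, rather than 13.22, in the Normal calculator.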

Page 28

Questions?