© 2010 Pearson Education, Inc. All rights reserved

Data Analysis/Statistics: An Introduction

Chapter 10

Slide 10.3- 2 Copyright © 2010 Pearson Addison-Wesley. All rights reserved.

10-3 Measures of Central Tendency and Variation

Computing Means
Understanding the Mean as a Balance Point
Computing Medians
Finding Modes
Choosing the Most Appropriate Average
Measures of Spread
Box Plots

10-3 Measures of Central Tendency and Variation (continued)

Outliers
Comparing Sets of Data
Variation: Mean Absolute Deviation, Variance, and Standard Deviation
Mean Absolute Deviation
Normal Distributions
Applications of the Normal Curve
Percentiles

NCTM Standard: Data Analysis

In grades 3−5 all students should…

use measures of center, focusing on the median, and understand what each does and does not indicate about the data sets;

compare different representations of the same data and evaluate how well each representation shows important aspects of the data;…

NCTM Principles and Standards, p. 400.

NCTM Standard: Data Analysis

In grades 6−8 all students should…

find, use, and interpret measures of center and spread, including mean and interquartile range;…

NCTM Principles and Standards, p. 401.

Measures of Central Tendency

Two important aspects of data are its center and its spread.

The mean and median are measures of central tendency that describe where data are centered.

The range, interquartile range, variance, mean absolute deviation, and standard deviation describe the spread of data and should be used with measures of central tendency.

Computing Means

The number commonly used to characterize a set of data is the arithmetic mean, frequently called the average, or the mean.

The arithmetic mean of the numbers $x_1, x_2, \ldots, x_n$, denoted $\bar{x}$ and read “x bar,” is given by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

Understanding the Mean as a Balance Point

The mean of 5 is the balance point where the sum of the distances from the mean to the data points above the mean equals the sum of the distances from the mean to the data points below the mean.

Understanding the Mean as a Balance Point

The sum of the distances above the mean is 3 + 5 or 8. The sum of the distances below the mean is 1 + 2 + 2 + 3 or 8.

The data are centered about the mean, but the mean does not belong to the set of data.
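The balance-point idea can be checked numerically. The sketch below assumes the data set 2, 3, 3, 4, 8, 10, which is one set consistent with the distances quoted above; the slide's figure is not reproduced in this transcript, so these values are an assumption rather than the original data.

```python
# Hypothetical data chosen to match the distances quoted above
# (3 and 5 above the mean; 1, 2, 2, and 3 below it).
data = [2, 3, 3, 4, 8, 10]

mean = sum(data) / len(data)

# Total distance from the mean on each side of it.
above = sum(x - mean for x in data if x > mean)
below = sum(mean - x for x in data if x < mean)

print(mean, above, below)
```

With these values the two sums agree, and the mean itself is not one of the data values.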

Understanding the Mean as a Balance Point

It is possible to rearrange the data to have the same mean but be spread very differently.

Computing Medians

The value exactly in the middle of an ordered set of numbers is the median.

To find the median for a set of n numbers,

1. Arrange the numbers in order from least to greatest.

2. a. If n is odd, the median is the middle number.

b. If n is even, the median is the mean of the two middle numbers.
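The two steps above can be sketched as a short function. This is a minimal illustration that assumes a non-empty list of numbers; the function name is chosen here for illustration and is not part of the slides.

```python
def median(values):
    """Median by the two-step rule: order the data, then take the middle."""
    ordered = sorted(values)      # Step 1: least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # Step 2a: n is odd, take the middle number
        return ordered[mid]
    # Step 2b: n is even, take the mean of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2
```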

Computing Medians

A median is often reported with the interquartile range, a measure of spread that shows where the middle 50% of the scores lie with the median in that range. The two together form a much better pair to describe the data than the median alone.

Finding Modes

The mode of a set of data is sometimes reported as a measure of central tendency, but when it is reported in that form, it is frequently being misused.

The mode of a set of data is the number that appears most frequently, if there is one, but the mode does not have to be in any way a measure of central tendency.

A mode is frequently reported with categorical data.

Example 10-3

Find the (a) mean, (b) median, and (c) mode for the data:

60 60 70 95 95 100

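a. $\bar{x} = \dfrac{60 + 60 + 70 + 95 + 95 + 100}{6} = \dfrac{480}{6} = 80$

b. The data are arranged from least to greatest and there is an even number of data values, so the median is $\dfrac{70 + 95}{2} = 82.5$.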

Example 10-3 (continued)

c. The set of data is bimodal. Both 60 and 95 are modes.
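The three answers in this example can be checked with Python's standard `statistics` module. This is only a cross-check of the arithmetic; `multimode` (available in Python 3.8 and later) returns every value tied for most frequent.

```python
from statistics import mean, median, multimode

scores = [60, 60, 70, 95, 95, 100]

example_mean = mean(scores)        # sum of the scores divided by 6
example_median = median(scores)    # mean of the two middle scores
example_modes = multimode(scores)  # all values tied for most frequent

print(example_mean, example_median, example_modes)
```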

Choosing the Most Appropriate Average

Although the mean is the most commonly used “average” to describe a set of data, it may not always be the most appropriate choice.

Example 10-4

Suppose a company employs 20 people. The president of the company earns $200,000, the vice president earns $75,000, and 18 employees earn $10,000 each. Is the mean the best number to choose to represent the “average” salary for the company?

Example 10-4 (continued)

The mean salary is

$$\frac{\$200{,}000 + \$75{,}000 + 18(\$10{,}000)}{20} = \frac{\$455{,}000}{20} = \$22{,}750.$$

In this case, the mean salary of $22,750 is not representative. Either the median or mode, both of which are $10,000, would describe the typical salary better.

The mean is affected by extreme values. In most cases, the median is not affected by extreme values.
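A short sketch makes the effect of extreme values concrete. It uses the salaries from Example 10-4 and then, as a hypothetical change that is not in the original example, doubles the president's salary to see which measures move.

```python
from statistics import mean, median, mode

# Salaries from Example 10-4: president, vice president, 18 employees.
salaries = [200_000, 75_000] + [10_000] * 18

mean_salary = mean(salaries)      # pulled upward by the two large salaries
median_salary = median(salaries)  # middle of the ordered list
mode_salary = mode(salaries)      # most frequent salary

# Hypothetical change: double the president's salary and compare again.
raised = [400_000, 75_000] + [10_000] * 18
mean_after = mean(raised)
median_after = median(raised)

print(mean_salary, median_salary, mode_salary)
print(mean_after, median_after)
```

The mean rises from $22,750 to $32,750 while the median stays at $10,000.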

Example 10-5

Suppose nine students make the following scores on a test:

30 35 40 40 92 92 93 98 99

Is the median the best “average” to represent the set of scores?

The median score is 92.

Example 10-5 (continued)

From that score, one might infer that the students all scored very well, yet 92 is certainly not a typical score.

In this case, the mean of approximately 69 might be more appropriate than the median.

However, with the spread of the scores, neither is very appropriate for this distribution.

Example 10-6

Is the mode an appropriate “average” for the following test scores?

40 42 50 62 63 65 98 98

The mode is 98.

The score of 98 is not representative of the set of data because of the large spread of scores and the much lower mean (and median).

Measures of Spread

Consider the following data:

20, 22, 22, 25, 26, 27, 27, 28, 30, 35

Range = upper extreme – lower extreme = 35 – 20 = 15

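Measures of Spread

20, 22, 22, 25, 26, 27, 27, 28, 30, 35

The median of the lower half of the data (20, 22, 22, 25, 26) is 22, so Q1 = 22. The median of the upper half (27, 27, 28, 30, 35) is 28, so Q3 = 28.

Interquartile range (IQR) = Q3 − Q1 = 28 − 22 = 6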

When the interquartile range is reported along with the median, not only do we know the middle, we know how spread out the middle 50% of the data are.
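The range and interquartile range for this data set can be reproduced with a small sketch. It takes the quartiles as the medians of the lower and upper halves of the ordered data; leaving the middle value out of both halves when the count is odd is an assumption made here, since the slides only show an even-sized example. Other quartile conventions (including the default of `statistics.quantiles`) can give different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n the middle value is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [20, 22, 22, 25, 26, 27, 27, 28, 30, 35]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
data_range = max(data) - min(data)

print(q1, q2, q3, iqr, data_range)
```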

Box Plots

A box plot is a way to display data visually and draw informal conclusions. Box plots show only the visual representations of the five-number summary of the data: the median, the upper and lower quartiles, and the least and greatest values in the distribution.

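Box Plots

[Figure: a box plot drawn above a number line running from 15 to 45, with labels for the minimum data point, Q1, the median (Q2), Q3, and the maximum data point, and with regions marked “Bottom 25%” and “Top 25%.”]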

Example 10-7

What are the minimum and maximum values, the median, and the lower and upper quartiles of the box plot below?

Minimum: 0

Lower quartile: 10

Median: 20

Upper quartile: 35

Maximum: 70

Outliers

An outlier is a value that is widely separated from the rest of a group of data.

In the set of scores 91, 92, 92, 93, 93, 93, 94, all data are grouped close together and no values are widely separated.

In the set of scores 21, 92, 92, 93, 93, 93, 95, 150, both 21 and 150 are widely separated from the rest of the data. These values are potential outliers.

Outliers

An outlier is any value that is more than 1.5 times the interquartile range above the upper quartile or below the lower quartile.

Outliers are commonly indicated with an asterisk. Whiskers are then drawn to the extreme points that are not outliers.
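The 1.5 × IQR rule can be sketched in a few lines. The function names are illustrative, the quartiles are taken as medians of the lower and upper halves of the ordered data (an assumed convention), and the input is assumed to contain at least two values.

```python
from statistics import median

def fences(q1, q3):
    """Return (lower, upper) cutoffs: 1.5 * IQR beyond each quartile."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Assumption: quartiles are medians of the lower and upper halves,
    # with the middle value left out when the count is odd.
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    low, high = fences(q1, q3)
    return [x for x in ordered if x < low or x > high]

tight_scores = [91, 92, 92, 93, 93, 93, 94]
spread_scores = [21, 92, 92, 93, 93, 93, 95, 150]

print(outliers(tight_scores))
print(outliers(spread_scores))
```

Applied to the two sets of scores discussed earlier, the first set has no outliers, and in the second set both 21 and 150 fall outside the fences.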

Example 10-8

The table shows the final medal standings for the top 20 countries in the 2004 Summer Olympics. Draw a box plot of the data and identify possible outliers.

Example 10-8 (continued)

Extreme scores: 103, 12

Q1 = 18

Median: 28.5

Q3 = 42.5

IQR = Q3 − Q1 = 42.5 − 18 = 24.5

Outliers are scores greater than 42.5 + 1.5(24.5), or 79.25, or less than 18 − 1.5(24.5), or −18.75.

92 and 103 are the only outliers.

Comparing Sets of Data

Box plots are used primarily for large sets of data or for comparing several distributions.

The stem and leaf plot is usually a much clearer display for a single distribution.

Parallel box plots drawn using the same number line give us the easiest comparison of medians, extreme scores, and the quartiles for the sets of data.

Comparing Sets of Data

Although we cannot spot clusters or gaps in box plots as we can with stem and leaf or line plots, we can more easily compare data from different sets.

With box plots, we do not need to have sets of data that are approximately the same size, as we did for stem and leaf plots.

Comparing Sets of Data

To compare data from two or more sets using their box plots, first study the boxes to see if they are located in approximately the same places.

Next, consider the lengths of the boxes to see if the variability of the data is about the same.

Also check whether the median, the quartiles, and the extreme values in one set are greater than those in another set.

Comparing Sets of Data

From the box plot, we can see that the mean salaries for males have been higher than those for females, since the extreme values, median, and quartiles for the males are greater than those for females. Also, more than 50% of the mean salaries for males are greater than those for the mean salaries of females over the time period.

Variation: Mean Absolute Deviation, Variance, and Standard Deviation

A measure of spread is needed when data are summarized with a single number, such as the mean or median.

The simplest way is to find the range.

We can also use the interquartile range, the IQR.

The most sophisticated is the standard deviation.

Mean Absolute Deviation

The mean absolute deviation (MAD) makes use of the absolute value to find the distance each data point is away from the mean.

Then the mean of those distances is found to give an “average distance from the mean” for each of the points.

Mean Absolute Deviation

Compute the mean absolute deviation (MAD) of n numbers as follows:

1. Measure the distance from the mean by subtracting the mean from each data value.

2. Find the absolute value of each difference.

3. Sum those absolute values (the absolute deviation).

4. Find the mean absolute deviation (MAD) by dividing the sum by the number of scores.
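The four steps for the MAD translate directly into code. The sketch below is a minimal illustration that assumes a non-empty list of numbers; the function name is chosen here and does not come from the slides.

```python
def mean_absolute_deviation(values):
    """Average distance of the data values from their mean."""
    n = len(values)
    mean = sum(values) / n
    differences = [x - mean for x in values]   # Step 1: value minus mean
    distances = [abs(d) for d in differences]  # Step 2: absolute values
    total = sum(distances)                     # Step 3: sum the distances
    return total / n                           # Step 4: divide by the count
```

For example, `mean_absolute_deviation([0, 0, 0, 100, 100, 100])` evaluates to 50.0, because every score lies exactly 50 points from the mean of 50.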

Mean Absolute Deviation

$$\text{MAD} = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n}$$

Mean Absolute Deviation

The table shows a set of data along with the computation of the mean absolute deviation.

Mean Absolute Deviation

Pictures of the mean absolute deviation for the given set of test scores are shown in the figures.

Variance and Standard Deviation

The variance and the standard deviation are two commonly used measures of dispersion. These measures are also based on how far the scores are from the mean.

Variance and Standard Deviation

Compute the variance, v, of n numbers as follows:

1. Find the mean of the numbers.

2. Subtract the mean from each number.

3. Square each difference found in Step 2.

4. Find the sum of the squares in Step 3.

5. Divide the sum in Step 4 by n to obtain the variance, v.

Variance and Standard Deviation

The standard deviation, s, of n numbers is the square root of the variance, v.
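The variance steps and the square-root definition of the standard deviation can be sketched as follows. The function names are illustrative. Note that the slides divide by n, which corresponds to `statistics.pvariance` and `statistics.pstdev` in Python's standard library; the sample versions (`variance`, `stdev`) divide by n − 1 and give different values.

```python
import math

def variance(values):
    """Variance v: mean of the squared differences from the mean."""
    n = len(values)
    mean = sum(values) / n                       # Step 1
    squares = [(x - mean) ** 2 for x in values]  # Steps 2 and 3
    return sum(squares) / n                      # Steps 4 and 5

def standard_deviation(values):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(variance(values))
```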

Example 10-9

Professor Abel gave two group exams. Exam A had grades of 0, 0, 0, 100, 100, 100, and exam B had grades of 50, 50, 50, 50, 50, 50. Find the following for each exam:

a. Mean Exam A: 50; exam B: 50

b. Range Exam A: 100; exam B: 0

c. Mean absolute deviation Exam A: 50; exam B: 0

Example 10-9 (continued)

d. Standard deviation Exam A: 50; exam B: 0

e. Median Exam A: 50; exam B: 50

f. Interquartile range Exam A: 100; exam B: 0
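As a cross-check, the sketch below computes all six measures for both exams using Python's standard `statistics` module. The `summary` helper is hypothetical, and the quartiles are taken as medians of the lower and upper halves of the ordered grades, which is an assumed convention.

```python
from statistics import mean, median, pstdev

def summary(grades):
    """Return the six measures asked for in Example 10-9."""
    ordered = sorted(grades)
    half = len(ordered) // 2
    m = mean(ordered)
    return {
        "mean": m,
        "range": ordered[-1] - ordered[0],
        "mad": sum(abs(x - m) for x in ordered) / len(ordered),
        "std_dev": pstdev(ordered),  # divides by n, as in the slides
        "median": median(ordered),
        # Assumption: quartiles are medians of the lower and upper halves.
        "iqr": median(ordered[-half:]) - median(ordered[:half]),
    }

exam_a = summary([0, 0, 0, 100, 100, 100])
exam_b = summary([50, 50, 50, 50, 50, 50])

print(exam_a)
print(exam_b)
```

Both exams share a mean and median of 50, but every measure of spread is 0 for exam B, while exam A has a range of 100, a MAD of 50, and a standard deviation of 50.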

Normal Distributions

The graphs of normal distributions are the bell-shaped curves called normal curves.

A normal curve is a smooth, bell-shaped curve that depicts frequency values distributed symmetrically about the mean.

The mean, median, and mode all have the same value.

Normal Distributions

The curve extends infinitely in both directions and gets closer and closer to the x-axis but never reaches it.

The curve is symmetrical about its center point, but not all symmetrical distributions are normal.

Normal Distributions

On a normal curve, about 68% of the values lie within 1 standard deviation of the mean, about 95% lie within 2 standard deviations, and about 99.8% are within 3 standard deviations.

The percentages represent approximations of the total percent of area under the curve.

Example 10-10

When a standardized test was scored, there was a mean of 500 and a standard deviation of 100. Suppose that 10,000 students took the test and their scores had a bell-shaped distribution, making it possible to use a normal curve to approximate the distribution.

a. How many scored between 400 and 600?

Since 1 standard deviation on either side of the mean is from 400 to 600, about 68% of the scores fall in this interval. Thus, 0.68(10,000), or 6800, students scored between 400 and 600.

Example 10-10 (continued)

b. How many scored between 300 and 700?

About 95% of 10,000, or 9500, students scored between 300 and 700.

c. How many scored between 200 and 800?

About 99.8% of 10,000, or 9980, students scored between 200 and 800.

d. How many scored above 800?

About 0.1% of 10,000, or 10, students scored above 800.
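The answers above use rounded percentages. For comparison, the sketch below computes the same four counts from an exact normal model using `statistics.NormalDist` from Python's standard library (Python 3.8 and later). It assumes the scores follow a normal distribution with mean 500 and standard deviation 100 exactly, which the example treats only as an approximation.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)
students = 10_000

def count_between(low, high):
    """Expected number of students scoring between low and high."""
    return students * (scores.cdf(high) - scores.cdf(low))

between_400_600 = count_between(400, 600)  # within 1 standard deviation
between_300_700 = count_between(300, 700)  # within 2 standard deviations
between_200_800 = count_between(200, 800)  # within 3 standard deviations
above_800 = students * (1 - scores.cdf(800))

print(round(between_400_600), round(between_300_700),
      round(between_200_800), round(above_800))
```

Under this model the counts come out near 6827, 9545, 9973, and 13, close to the rounded answers of 6800, 9500, 9980, and 10.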

Application of the Normal Curve

Suppose that a group of 200 students asked their teacher to grade “on a curve.”

If the mean on the test was 71, with a standard deviation of 7, the graph shows how the grades could be assigned.

Application of the Normal Curve

Based on the normal curve, the table shows the range of grades that the teacher might assign if the grades are rounded.

Percentiles

When students take a standardized test such as the ACT or SAT, their scores are often reported in percentiles.

A percentile shows a person’s score relative to other scores.

Percentiles divide the set of data into 100 equal parts. Deciles are points that divide a distribution into 10 equally spaced sections.

The rth percentile is represented by $P_r$.

Example 10-11

The scores on a standardized test were distributed along a normal curve with a median (and mean) of 500 and a standard deviation of 100. The 16th percentile is 400 because 400 is 1 standard deviation below the median. Find $P_{50}$ and $P_{84}$.

Since 500 is the median, 50% of the distribution is less than 500. P50 = 500.

Since 600 is 1 standard deviation above the median, 84% of the distribution is less than 600. P84 = 600.
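These percentile values can be checked against an exact normal model. The sketch assumes the scores follow a normal distribution with mean 500 and standard deviation 100 and uses `statistics.NormalDist`, where `cdf(x)` gives the fraction of the distribution below `x` and `inv_cdf(p)` gives the score below which a fraction `p` lies.

```python
from statistics import NormalDist

test = NormalDist(mu=500, sigma=100)

p50 = test.inv_cdf(0.50)    # score at the 50th percentile
below_600 = test.cdf(600)   # fraction scoring below 600
below_400 = test.cdf(400)   # fraction scoring below 400

print(p50, round(100 * below_600), round(100 * below_400))
```

The fractions below 600 and below 400 round to 84% and 16%, matching the percentiles used in the example.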

Example 10-12

a. Ossie was ranked 25th in a class of 250. What was his percentile rank?

There were 250 − 25, or 225, students ranked below Ossie. Hence, $\frac{225}{250}$, or 90%, of the class ranked below him.

Therefore, Ossie ranked at the 90th percentile.

Example 10-12 (continued)

b. In a class of 50, Cathy has a percentile rank of 60. What is her class standing?

Since Cathy’s percentile rank is 60, 60% of the class ranks below her. Since 60% of 50 = 30, 30 students ranked below Cathy.

Therefore, Cathy is 20th in her class.
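Both parts of this example follow the same relationship between class rank and percentile rank, which can be sketched as a pair of functions. The function names are illustrative, and the convention (percentile rank = percent of the class ranked below the student) is the one used in the worked answers above.

```python
def percentile_rank(rank, class_size):
    """Percent of the class ranked below a student in position `rank`."""
    below = class_size - rank
    return 100 * below / class_size

def class_standing(percentile, class_size):
    """Position in the class for a student at the given percentile rank."""
    below = percentile * class_size / 100
    return class_size - below

ossie = percentile_rank(25, 250)  # part (a)
cathy = class_standing(60, 50)    # part (b)

print(ossie, cathy)
```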