Chapter Three - Describing Data: Numerical Measures. McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved.

Upload: shawn-underwood

Post on 18-Jan-2016


Page 1: Chapter Three McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved. Describing Data: Numerical Measures

Chapter

Three

McGraw-Hill/Irwin

© 2006 The McGraw-Hill Companies, Inc., All Rights Reserved.

Describing Data: Numerical Measures

Page 2:

In this chapter (Chapter 3), we learn to describe data using two numerical techniques:

1. Measures of Location

2. Measures of Dispersion

Page 3:

Measures of Location

Often, for a set of collected data, we want to know:

What is a representative or typical value?

OR

What is the center/average of the distribution?

E.g., US family income, price of a house in LA, rainfall in Seattle, batting scores.

Read the inset 'Statistics in Action' on page 58.

Page 4:

Three types of 'averages':

• Mean

• Median

• Mode

Page 5:

Mean = Average = Arithmetic Mean (synonyms)

The Population Mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where

µ ("mu") is the population mean,

N is the total number of observations (note the capital N),

X is a particular raw data value,

Σ ("sigma") indicates the operation of adding.

Page 6:

Two terms you should know:

i) Parameter - a measurable characteristic of a population.

Hence, the Population Mean μ is a Parameter.

ii) Statistic - a measurable characteristic of a sample.

Hence, the Sample Mean $\bar{X}$ is a Statistic.

Page 7:

Example 1

A Parameter is a measurable characteristic of a population.

The Kiers family owns four cars. The following is the current mileage on each of the four cars.

56,000

23,000

42,000

73,000

Find the mean mileage for the cars.

$$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$$
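The calculation above can be checked in a few lines of Python. This is only a minimal sketch and not part of the original slides; the names `mileages` and `population_mean` are illustrative assumptions.

```python
# Mileage of the four Kiers family cars. The four cars are the whole
# population here, so N = 4 and the result is a parameter (mu).
mileages = [56_000, 23_000, 42_000, 73_000]

# Population mean: sum of all values divided by the number of values.
population_mean = sum(mileages) / len(mileages)

print(population_mean)  # 48500.0
```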

Page 8:

A Statistic is a measurable characteristic of a sample.

The Sample Mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$

where n is the total number of values in the sample (note the small n), and $\bar{X}$ is read "X bar" (not μ!).

Page 9:

A sample of five executives received the following bonuses last year ($000):

14.0, 15.0, 17.0, 16.0, 15.0

$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$

The sample mean here is a Statistic (i.e., not a Parameter).

Page 10:

Properties of the Mean

• Every set of interval-level and ratio-level data has a mean.

• A set of data has a unique mean.

• The sum of the deviations of each value from the mean is zero.*

• All values are included in computing the mean (a good thing).**

• The mean is affected by unusually large or small outlier data values (a shortcoming).

* See next slide.

** Not true of the median or mode.

Page 11:

Example 3

Consider the set of values: 3, 8, and 4. The mean is 5. This illustrates the property that the sum of the deviations from the mean is zero:

$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$$
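The result is not special to these three numbers. As a short sketch of why it always holds, using the sample-mean definition $\bar{X} = \sum X / n$ given earlier (for a population, replace $\bar{X}$ with μ and n with N):

$$\sum (X - \bar{X}) = \sum X - n\bar{X} = \sum X - n \cdot \frac{\sum X}{n} = 0$$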

Page 12:

The Median is the value at the middle location after all the data have been ordered from the smallest to the largest.

For an odd number of values, the median is the middle number, found at location (n+1)/2 in the ordered data.

For an even number of values, location (n+1)/2 falls between two observations, and the median is the arithmetic average of those two middle numbers.

Page 13:

The ages for a sample of five college students are: 21, 25, 19, 20, 22.

• Arrange the data in ascending order: 19, 20, 21, 22, 25.

• The median is at location (5+1)/2 = 3, so the median is 21.

Question:

Calculate the median if the age of the 5th student is 60 years (and not 25).

Page 14:

The heights of four basketball players, in inches, are: 76, 73, 80, 75.

Arrange the data in ascending order:

73, 75, 76, 80

The median is around the (4+1)/2 = 2.5th location.

Take the mean of the 2nd and 3rd observations: (75 + 76)/2.

Thus the median is 75.5.
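The location rule for odd and even data sets can be written as one small function. The sketch below is an illustration rather than anything from the slides; the function name `median` and the variable names are assumptions, and the two data sets are the ages and heights from the examples above.

```python
def median(values):
    """Return the median using the (n + 1) / 2 location rule."""
    ordered = sorted(values)  # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: location (n + 1) / 2 is a whole number (0-based index `middle`).
        return ordered[middle]
    # Even n: location (n + 1) / 2 falls between two observations,
    # so average the values on either side of it.
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]

print(median(ages))     # 21
print(median(heights))  # 75.5
```

Calling `median` on the ages with 25 replaced by 60 is a quick way to check your answer to the question posed with the ages example.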

Page 15:

Properties of the Median

• Unique to each data set

• Not affected by extremely large or small values (avoids influence of outlier values)

• Can be computed for ordinal, interval and ratio level data

E.g., a good measure for housing prices

Page 16:

The Mode is another measure of location and represents the value of the observation that appears most frequently.

Example: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.

The mode can help you with making decisions!

Prof. Beatle gives out more "B"s than any other grade.

If you have excess production capacity, you may make more of the product that sells most.

Page 17:

Properties of the Mode

• Can be used for all levels of data (nominal, ordinal, interval and ratio).

• Not affected by extreme values

Problems:

• If every data value is unique, there is no mode!

• Several different values can occur equally often, giving multiple modes (e.g., bimodal, trimodal, etc.).
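Both problem cases (no mode, several modes) show up clearly when the mode is computed by counting. The following is a hedged sketch, not the textbook's procedure: the helper name `modes` is an assumption, and returning an empty list when every value is unique is a design choice made here to match the "no mode" convention above. (Python's built-in `statistics.multimode` makes a different choice and returns all the values in that case.)

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest frequency.

    Returns an empty list when no value repeats (the "no mode" case).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value is unique, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

print(modes(scores))           # [81]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([4, 5, 6]))        # []      (no mode)
```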

Page 18:

Practice

Problem #62

Pages 87-88

State how you would do (a) and (b).

What is the answer to (c)?

Page 19:

Dispersion - spread or variability in the data.

If you are told the river ahead has an average depth of 4 feet, would you begin crossing it?

The mean by itself is not reliable if dispersion is high.

Dispersion is useful in comparing two sets of data with the same mean value.

Page 20:

Measures of dispersion

• Range

• Variance

• Standard deviation

Page 21:

Range

The following represents the current year's Return on Equity of the 25 companies in an investor's portfolio.

-8.1   3.2   5.9   8.1  12.3
-5.1   4.1   6.3   9.2  13.3
-3.1   4.6   7.9   9.5  14.0
-1.4   4.8   7.9   9.7  15.0
 1.2   5.7   8.0  10.3  22.1

Highest value: 22.1

Lowest value: -8.1

Range = Highest value - Lowest value = 22.1 - (-8.1) = 30.2

Uses just 2 values!

Not useful if Range is wide
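The same range calculation, sketched in Python with the 25 Return on Equity values from this slide. The names `roe` and `data_range` are illustrative assumptions; the result is rounded only to hide floating-point noise.

```python
# Return on Equity for the 25 companies in the portfolio.
roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]

# Range = highest value - lowest value; only these two values matter.
data_range = max(roe) - min(roe)

print(round(data_range, 1))  # 30.2
```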

Page 22:

Variance:

- the average of the squared deviations from the mean.

- larger deviations are given higher weight when squared.

Standard deviation:

- the square root of the variance.

- brings the variance back to the same unit as the data.

Page 23:

Population Variance formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

X is the value of an observation in the population

μ is the arithmetic mean of the population

N is the number of observations in the population

Page 24:

Example – Page 78

Fill me in

σ is called the Population Standard Deviation (has the same unit of measure as the original data)

Page 25:

Sample variance (s²)

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Note: the divisor is (n - 1), NOT n.

Sample standard deviation (s)

$$s = \sqrt{s^2}$$

Guaranteed question in any stat test! Watch out for whether the data are a Population (σ) or a Sample (s).

Page 26:

The hourly wages earned by a sample of five students are:

$7, $5, $11, $8, $6.

Find the sample variance and standard deviation.

$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{4} = 5.30$$

$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$

Practice Time!
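To see the effect of dividing by n - 1 rather than N, the wage example can be worked both ways in Python. This is a sketch for checking hand calculations, not part of the slides; the variable names are assumptions, and the last lines cross-check against the standard library's `statistics.variance` (sample) and `statistics.pvariance` (population).

```python
import statistics

wages = [7, 5, 11, 8, 6]

n = len(wages)
mean = sum(wages) / n  # 37 / 5 = 7.4
squared_deviations = [(x - mean) ** 2 for x in wages]

# Sample: divide the sum of squared deviations by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = sample_variance ** 0.5

# If the same five values were an entire population, divide by N instead.
population_variance = sum(squared_deviations) / n

print(round(sample_variance, 2))      # 5.3
print(round(sample_sd, 2))            # 2.3
print(round(population_variance, 2))  # 4.24

# Cross-check against the standard library.
library_sample_variance = statistics.variance(wages)
library_population_variance = statistics.pvariance(wages)
```

The population figure is smaller (4.24 versus 5.30) because the same sum of squared deviations, 21.2, is divided by 5 rather than 4.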

Page 27:

Chebyshev's theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least

$$1 - \frac{1}{k^2}$$

where k is any constant greater than 1.

Say you conduct a study of the heights of all students in class. You compute the mean and standard deviation. Now you decide to compute:

how many students are within mean ± 1 s.d.

how many students are within mean ± 2 s.d.

how many students are within mean ± 3 s.d., … etc.
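The counting exercise described above can be sketched in Python. The slide gives no height data, so this example borrows the 25 Return on Equity values from the Range slide as stand-in data and treats them as a population; the function names `chebyshev_lower_bound` and `proportion_within` are assumptions made for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed share of values within k population s.d. of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return sum(abs(x - mean) <= k * sd for x in values) / n


# Stand-in data: Return on Equity values from the Range slide.
roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]

print(chebyshev_lower_bound(2))            # 0.75
print(round(chebyshev_lower_bound(3), 3))  # 0.889

for k in (2, 3):
    # The observed share can never fall below the Chebyshev bound.
    print(k, proportion_within(roe, k) >= chebyshev_lower_bound(k))
```

The bound is a guaranteed minimum for any data set, so the observed share is usually well above it.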

Page 28:

Practice!

Page 84

Problems: 49, 50

Page 29:

Empirical Rule: For any symmetrical, bell-shaped distribution:

About 68% of the observations will lie within 1 s.d. of the mean

About 95% of the observations will lie within 2 s.d. of the mean

Virtually all the observations will be within 3 s.d. of the mean

Page 30:

Empirical Rule

See Page 496: Z Column – Values 1, 2, 3
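The 68% / 95% / "virtually all" figures can be reproduced without the printed Z table. The sketch below assumes the bell-shaped distribution is exactly normal, which is the idealized case behind the Empirical Rule; it uses `statistics.NormalDist` from Python's standard library (available from Python 3.8), and the names `standard_normal` and `shares` are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of observations within k standard deviations of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(k, round(share, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```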