bs-2


Upload: imtiazbulbul

Post on 19-Dec-2015


Page 1: BS-2

1/27/2015

1

Lecture 2: Methods for Describing Data through Numerical Measures

North South University School of Business, Slide 1 of 76

Outline

• Measures of central tendency and dispersion

• Characteristics, uses, advantages, and disadvantages of each measure of location and dispersion

• Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations

• Quartiles, deciles, and percentiles

• Box plots

• Coefficient of skewness and coefficient of variation

• Scatter diagram

• Contingency table

Numerical Ways of Describing Data

• Measures of location

• Measures of dispersion

Parameter and Statistic

A parameter is a measurable characteristic of a population.

A statistic is a measurable characteristic of a sample.

Measures of Location and Dispersion

• Measures for Population Data

• Measures for Sample Data

• Measures for Ungrouped data

• Measures for grouped data

Measures of Location

• Mean (arithmetic, weighted, geometric)

• Median

• Mode

Arithmetic Mean

The arithmetic mean is the most widely used measure of location and shows the central value of the data.

It is calculated by summing the values and dividing by the number of values.

[Figure: Average Joe]

Population Mean

For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where µ is the population mean, N is the total number of observations, X is a particular value, and Σ indicates the operation of adding.

Example 1

The Kiers family owns four cars. The following is the current mileage on each of the four cars: 56,000, 42,000, 23,000, 73,000.

Find the mean mileage for the cars.

$$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$$
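The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable names are illustrative choices, not part of the lecture.

```python
# Mileage values from Example 1 (the Kiers family's four cars).
mileages = [56_000, 42_000, 23_000, 73_000]


def arithmetic_mean(values):
    """Sum the values and divide by the number of values."""
    return sum(values) / len(values)


mean_mileage = arithmetic_mean(mileages)
print(mean_mileage)  # 48500.0
```

The same function works for a sample mean, since the population and sample formulas differ only in notation.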

Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$

where n is the total number of values in the sample.

Example 2

A sample of five executives received the following bonus last year ($000): 14.0, 15.0, 17.0, 16.0, 15.0.

$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$

Properties of the Arithmetic Mean

Every set of interval-level and ratio-level data has a mean.

All the values are included in computing the mean.

A set of data has a unique mean.

The mean is affected by unusually large or small data values.

The arithmetic mean is the only measure of location where the sum of the deviations of each value from the mean is zero.

Example 3

Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:

$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$$

Weighted Mean

The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$

Example 4

During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.

$$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$$
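A short Python sketch of the weighted mean follows. It assumes the fourth price is $1.15, the figure used in the slide's own computation (the total of $44.50 and the result of $0.89 depend on it). The function and variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 4: drink prices and the number of drinks sold at each price.
# Assumption: the fourth price is $1.15, as in the slide's computation.
prices = [0.50, 0.75, 0.90, 1.15]
drinks_sold = [5, 15, 15, 15]

mean_price = weighted_mean(prices, drinks_sold)
print(round(mean_price, 2))  # 0.89
```

Here the weights are simple counts, so the weighted mean equals total revenue divided by total drinks served.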

The Median

The median is the midpoint of the values after they have been ordered from the smallest to the largest.

There are as many values above the median as below it in the data array.

For an even set of values, the median will be the arithmetic average of the two middle numbers and is found at the (n+1)/2 ranked observation.

The median (cont’d)

The ages for a sample of five college students are: 21, 25, 19, 20, 22.

Arranging the data in ascending order gives: 19, 20, 21, 22, 25.

Thus the median is 21.

Example 5

The heights of four basketball players, in inches, are: 76, 73, 80, 75.

Arranging the data in ascending order gives: 73, 75, 76, 80.

The median is found at the (n+1)/2 = (4+1)/2 = 2.5th data point.

Thus the median is 75.5.
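The two median examples (an odd and an even number of values) can be reproduced with a small Python sketch. The function name `median` is an illustrative choice; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [21, 25, 19, 20, 22]   # five college students
heights = [76, 73, 80, 75]    # Example 5, four basketball players

print(median(ages))     # 21
print(median(heights))  # 75.5
```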

Properties of the Median

There is a unique median for each data set.

It is not affected by extremely large or small values and is therefore a valuable measure of location when such values occur.

It can be computed for ratio-level, interval-level, and ordinal-level data.

The Mode

The mode is another measure of location and represents the value of the observation that appears most frequently.

Example 6

The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.

Data can have more than one mode. If it has two modes, it is referred to as bimodal, three modes, trimodal, and so on.
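A minimal Python sketch of finding the mode, including the case of more than one mode, is shown below. The helper name `modes` and the second, bimodal data set are hypothetical and only serve as illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Example 6: exam scores for ten students.
scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
print(modes(scores))  # [81]

# Hypothetical data with two modes (bimodal), not from the lecture.
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```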

The Relative Positions of the Mean, Median, and Mode

Symmetric distribution: A distribution having the same shape on either side of the center.

Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution. Can be positively or negatively skewed, or bimodal.

The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution

Zero skewness: Mean = Median = Mode

[Figure: symmetric curve with the mode, median, and mean at the same point]

The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution

• Positively skewed: Mean and median are to the right of the mode.

Mean > Median > Mode

[Figure: right-skewed curve marking the mode, median, and mean from left to right]

The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution

• Negatively skewed: Mean and median are to the left of the mode.

Mean < Median < Mode

[Figure: left-skewed curve marking the mean, median, and mode from left to right]

Geometric Mean

The geometric mean (GM) of a set of n positive numbers is defined as the nth root of the product of the n numbers. The formula is:

$$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$$

The geometric mean is used to average percents, indexes, and relatives.

Example 7

The interest rates on three bonds were 5, 21, and 4 percent.

The arithmetic mean is (5 + 21 + 4)/3 = 10.0.

The geometric mean is

$$GM = \sqrt[3]{(5)(21)(4)} = 7.49$$

The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.

Example 8

• The return on investment earned by Atkins Construction Company for four successive years was: 30%, 20%, -40%, and 200%. What is the geometric mean rate of return on investment?

Each return is first written as a growth factor (1 plus the rate), because the geometric mean requires positive numbers:

$$GM = \sqrt[n]{(X_1)(X_2)\cdots(X_n)} = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = 1.294$$

The average rate of return is 29.4%.
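The geometric mean calculations in Examples 7 and 8 can be sketched in Python as follows. The function name is illustrative, and `math.prod` assumes Python 3.8 or later. For the returns, each percentage is converted to a growth factor (1 + rate) before taking the root, matching the factors 1.3, 1.2, 0.6, and 3.0 on the slide.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))


# Example 7: interest rates on three bonds, in percent.
bond_rates = [5, 21, 4]
print(round(geometric_mean(bond_rates), 2))  # 7.49

# Example 8: yearly returns expressed as growth factors (1 + rate).
returns = [0.30, 0.20, -0.40, 2.00]
growth_factors = [1 + r for r in returns]
gm_factor = geometric_mean(growth_factors)
print(round(gm_factor, 3))            # 1.294
print(round((gm_factor - 1) * 100, 1))  # 29.4 (percent per year)
```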

Geometric Mean (cont’d)

Another use of the geometric mean is to determine the percent increase in sales, production, or other business or economic series from one time period to another.

$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$

[Figure: Growth in Sales 1999-2004; x-axis: Year; y-axis: Sales in Millions ($)]

Example 9

The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.

$$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = 0.0127$$

The value 0.0127 indicates that the average annual growth over the 8-year period was 1.27%.
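The average growth rate in Example 9 can be verified with a short Python sketch. The function name and parameter names are illustrative assumptions.

```python
def average_growth_rate(start_value, end_value, periods):
    """Geometric mean rate of increase from start_value to end_value over n periods."""
    return (end_value / start_value) ** (1 / periods) - 1


# Example 9: female college enrollment, 1992 to 2000 (8 yearly periods).
rate = average_growth_rate(755_000, 835_000, 8)
print(round(rate, 4))  # 0.0127
```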

Measures of Dispersion

Dispersion refers to the spread or variability in the data.

Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.

Range = Largest value – Smallest value

Example 10

The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio.

-8.1    3.2    5.9    8.1   12.3
-5.1    4.1    6.3    9.2   13.3
-3.1    4.6    7.9    9.5   14.0
-1.4    4.8    7.9    9.7   15.0
 1.2    5.7    8.0   10.3   22.1

Highest value: 22.1   Lowest value: -8.1

Range = Highest value – Lowest value = 22.1 - (-8.1) = 30.2

Mean Absolute Deviation (MAD)

MAD: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.

$$MAD = \frac{\sum |X - \bar{X}|}{n}$$

The main features:

All values are used in the calculation.

It is not unduly influenced by large or small values.

The absolute values are difficult to manipulate.

Example 11

The weights of a sample of crates containing books for the bookstore (in pounds) are: 103, 97, 101, 106, 103. Find the mean deviation.

$$\bar{X} = 102$$

The mean deviation is:

$$MAD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$$
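A minimal Python sketch of the mean absolute deviation, applied to the crate weights of Example 11, is given below. The function name is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Example 11: weights of five crates, in pounds.
crate_weights = [103, 97, 101, 106, 103]
mad = mean_absolute_deviation(crate_weights)
print(mad)  # 2.4
```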

Variance and Standard Deviation

Variance: the arithmetic mean of the squared deviations from the mean.

Standard deviation: The square root of the variance.

Population Variance

The major characteristics:

All values are used in the calculation.

Not unduly influenced by extreme values, in contrast to the range, which depends only on the two extremes.

The units are awkward: the square of the original units.

Variance and standard deviation

Population variance formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

X is the value of an observation in the population

µ is the arithmetic mean of the population

N is the number of observations in the population

Population standard deviation formula:

$$\sigma = \sqrt{\sigma^2}$$

Example 10 (revisited)

In Example 10, the mean is µ = 6.62, and the variance and standard deviation are:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227$$

$$\sigma = \sqrt{42.227} = 6.498$$
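The range, population variance, and population standard deviation for Example 10 can be reproduced with the sketch below. The 25 Return on Equity values are read from the slide's data table; treat that reading as an assumption, although it does reproduce the stated mean of 6.62. Function and variable names are illustrative.

```python
import math

# Return on Equity for the 25 companies in Example 10
# (assumed reading of the slide's data table).
roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]


def population_variance(values):
    """Mean of the squared deviations from the population mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


value_range = max(roe) - min(roe)
mu = sum(roe) / len(roe)
variance = population_variance(roe)
std_dev = math.sqrt(variance)

print(round(value_range, 1))  # 30.2
print(round(mu, 2))           # 6.62
print(round(variance, 3))     # 42.227
print(round(std_dev, 3))      # 6.498
```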

Sample variance and standard deviation

Sample variance (s²):

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Sample standard deviation (s):

$$s = \sqrt{s^2}$$

Example 12

The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6.

Find the sample variance and standard deviation.

$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$$

$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$
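The sample variance differs from the population variance only in its divisor, n - 1 instead of N. A minimal Python sketch for Example 12 follows; the names are illustrative, and the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Example 12: hourly wages of five students, in dollars.
wages = [7, 5, 11, 8, 6]
s2 = sample_variance(wages)
s = math.sqrt(s2)

print(round(s2, 2))  # 5.3
print(round(s, 2))   # 2.3
```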

Chebyshev’s theorem

Chebyshev’s theorem: For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least:

$$1 - \frac{1}{k^2}$$

where k is any constant greater than 1.

Chebyshev’s theorem (cont’d)

The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company’s profit-sharing plan was $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?

$$1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92$$

About 92%.
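Chebyshev's lower bound is simple enough to evaluate directly. The sketch below uses an illustrative function name and rejects k values of 1 or less, for which the theorem as stated does not apply.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


# Dupree Paint example: k = 3.5 standard deviations.
bound = chebyshev_lower_bound(3.5)
print(round(bound, 2))  # 0.92
```

For comparison, k = 2 gives a bound of 0.75, so at least 75% of any data set lies within two standard deviations of its mean.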

Interpretation and Uses of the Standard Deviation

Empirical Rule: For any symmetrical, bell-shaped distribution:

About 68% of the observations will lie within ±1s of the mean

About 95% of the observations will lie within ±2s of the mean

Virtually all (99.7%) the observations will be within ±3s of the mean
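The three percentages of the Empirical Rule can be recovered numerically under one added assumption: that the bell-shaped distribution is exactly normal. In that case the share of observations within k standard deviations of the mean equals erf(k / √2). The function name below is illustrative.

```python
import math


def normal_share_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean.

    Assumption: the distribution is exactly normal, which is stricter than
    the 'symmetrical, bell-shaped' wording of the Empirical Rule.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```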

Interpretation and Uses of the Standard Deviation

[Figure: Bell-shaped curve showing the relationship between µ and σ, with bands marked 68%, 95%, and 99.7%]

The Mean of Grouped Data

The mean of a sample of data organized in a frequency distribution is computed by the following formula:

$$\bar{X} = \frac{\sum fM}{n}$$

where M is the midpoint of each class, f is the class frequency, and n is the total number of observations.

Example 13

A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

Movies showing    Frequency f    Class midpoint M    (f)(M)
1 up to 3         1              2                   2
3 up to 5         2              4                   8
5 up to 7         3              6                   18
7 up to 9         1              8                   8
9 up to 11        3              10                  30
Total             10                                 66

$$\bar{X} = \frac{\sum fM}{n} = \frac{66}{10} = 6.6$$
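The grouped-mean calculation of Example 13 can be sketched in Python. The representation of each class as a ((lower, upper), frequency) pair and the function name are illustrative choices, not part of the lecture.

```python
# Example 13: ((lower limit, upper limit), frequency) for each class.
classes = [((1, 3), 1), ((3, 5), 2), ((5, 7), 3), ((7, 9), 1), ((9, 11), 3)]


def grouped_mean(classes):
    """Sum of frequency * class midpoint, divided by the total frequency."""
    n = sum(f for _, f in classes)
    total = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total / n


print(grouped_mean(classes))  # 6.6
```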

The Median of Grouped Data

The median of a sample of data organized in a frequency distribution is computed by:

$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i)$$

where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.

Finding the Median Class

• Construct a cumulative frequency distribution.

• Decide the class that contains the median. The median class is the first class whose cumulative frequency is at least n/2.

Example 13 (revisited)

Movies showing    Frequency    Cumulative frequency
1 up to 3         1            1
3 up to 5         2            3
5 up to 7         3            6
7 up to 9         1            7
9 up to 11        3            10

Example 13 (cont’d)

From the table, L = 5, n = 10, f = 3, i = 2, CF = 3.

$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33$$
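The two steps (find the median class, then interpolate inside it) can be sketched as follows. The class representation and function name are illustrative assumptions; the median class is taken as the first class whose cumulative frequency reaches n/2, as described in the lecture.

```python
# Example 13: ((lower limit, upper limit), frequency) for each class.
classes = [((1, 3), 1), ((3, 5), 2), ((5, 7), 3), ((7, 9), 1), ((9, 11), 3)]


def grouped_median(classes):
    """Median = L + ((n/2 - CF) / f) * i, using the first class with cumulative frequency >= n/2."""
    n = sum(f for _, f in classes)
    cumulative = 0  # CF: cumulative frequency before the current class
    for (lower, upper), f in classes:
        if cumulative + f >= n / 2:
            width = upper - lower  # i: the class interval
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency distribution is empty")


print(round(grouped_median(classes), 2))  # 6.33
```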

The Mode of Grouped Data

The mode for grouped data is approximated by the midpoint of the class with the largest class frequency.

Movies showing    Frequency f    Class midpoint M
1 up to 3         1              2
3 up to 5         2              4
5 up to 7         3              6
7 up to 9         1              8
9 up to 11        3              10

The modes in Example 13 are 6 and 10, so the distribution is bimodal.

The Standard Deviation of Grouped Data

The standard deviation of a sample of data organized in a frequency distribution is computed by the following formula:

$$s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}}$$

Example 13 (revisited)

A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the standard deviation of movies showing.

Movies showing    Frequency f    Class midpoint M    M - X̄    f(M - X̄)²
1 up to 3         1              2                   -4.6      21.16
3 up to 5         2              4                   -2.6      13.52
5 up to 7         3              6                   -0.6      1.08
7 up to 9         1              8                   1.4       1.96
9 up to 11        3              10                  3.4       34.68
Total             10                                           72.40

$$s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}} = \sqrt{\frac{72.40}{10 - 1}} = 2.8363$$
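The grouped standard deviation for Example 13 can be checked with the sketch below. As before, the ((lower, upper), frequency) representation and the function name are illustrative.

```python
import math

# Example 13: ((lower limit, upper limit), frequency) for each class.
classes = [((1, 3), 1), ((3, 5), 2), ((5, 7), 3), ((7, 9), 1), ((9, 11), 3)]


def grouped_std_dev(classes):
    """Sample standard deviation from class midpoints and frequencies (divisor n - 1)."""
    n = sum(f for _, f in classes)
    mean = sum(f * (lo + hi) / 2 for (lo, hi), f in classes) / n
    squared = sum(f * ((lo + hi) / 2 - mean) ** 2 for (lo, hi), f in classes)
    return math.sqrt(squared / (n - 1))


print(round(grouped_std_dev(classes), 4))  # 2.8363
```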

Other Measures of Dispersion

• Quartiles divide a set of observations into four equal parts

• Deciles divide a set of observations into 10 equal parts

• Percentiles divide a set of observations into 100 equal parts

Quartiles

Locate the median (50th percentile), the first quartile (25th percentile), and the third quartile (75th percentile).

Location of a Percentile

$$L_p = (n + 1)\frac{P}{100}$$

where P is the desired percentile.

Example 14

Stock prices on twelve consecutive days for a major publicly traded company:

86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85.

[Figure: line chart of the twelve daily stock prices]

Example 14 (cont’d)

Using the twelve stock prices, we can find the median, 25th, and 75th percentiles as follows:

Quartile 1: $L_{25} = (12 + 1)\frac{25}{100}$ = 3.25th observation

Median: $L_{50} = (12 + 1)\frac{50}{100}$ = 6.50th observation

Quartile 3: $L_{75} = (12 + 1)\frac{75}{100}$ = 9.75th observation

Example 14 (cont’d)

To locate the values, the first step is to organize the data in increasing order:

Position    1    2    3    4    5    6    7    8    9    10   11   12
Price       69   78   79   82   83   84   85   86   88   91   92   96

25th percentile (Q1): Price at 3.25th observation = 79 + .25(82 - 79) = 79.75

50th percentile (Q2, median): Price at 6.50th observation = 84 + .5(85 - 84) = 84.50

75th percentile (Q3): Price at 9.75th observation = 88 + .75(91 - 88) = 90.25
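The location formula Lp = (n + 1)P/100 and the interpolation between neighboring ranked values can be sketched in Python. The function name is illustrative, and the handling of locations that fall before the first or after the last observation (clamping to the smallest or largest value) is an assumption the lecture does not address. Note that library routines such as `numpy.percentile` default to a different interpolation rule and may return slightly different values.

```python
def percentile(values, p):
    """Value at location (n + 1) * p / 100 in the sorted data, with linear interpolation."""
    ordered = sorted(values)
    location = (len(ordered) + 1) * p / 100
    whole = int(location)          # rank of the observation just below the location
    fraction = location - whole    # how far to move toward the next observation
    if whole < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if whole >= len(ordered):      # assumption: clamp to the largest value
        return ordered[-1]
    lower = ordered[whole - 1]
    upper = ordered[whole]
    return lower + fraction * (upper - lower)


# Example 14: twelve daily stock prices.
prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]

q1 = percentile(prices, 25)
q2 = percentile(prices, 50)
q3 = percentile(prices, 75)
print(q1, q2, q3)  # 79.75 84.5 90.25
print(q3 - q1)     # 10.5
```

The last line is the distance between the third and first quartiles for these prices.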

Interquartile Range

The interquartile range is the distance between the third quartile Q3 and the first quartile Q1.

This distance will include the middle 50 percent of the observations.

Interquartile range = Q3 - Q1

Example 15

For a set of observations the third quartile is 24 and the first quartile is 10. What is the interquartile range?

The interquartile range is 24 - 10 = 14. Fifty percent of the observations will occur between 10 and 24.

Box Plots

A box plot is a graphical display, based on quartiles, that helps to picture a set of data.

Five pieces of data are needed to construct a box plot: the Minimum Value, the First Quartile, the Median, the Third Quartile, and the Maximum Value.

Example 16

Based on a sample of 20 deliveries, Buddy’s Pizza determined the following information. The minimum delivery time was 13 minutes and the maximum 30 minutes. The first quartile was 15 minutes, the median 18 minutes, and the third quartile 22 minutes. Develop a box plot for the delivery times.

Example 16 (cont’d)

[Figure: box plot of the delivery times, labeled Min, Q1, Median, Q3, and Max, on an axis running from 12 to 32]

Coefficient of Variation

Relative dispersion

The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$CV = \frac{s}{\bar{X}}(100\%)$$

Skewness

Skewness is the measurement of the lack of symmetry of the distribution.

The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

A value of 0 indicates a symmetric distribution.

Some software packages use a different formula which results in a wider range for the coefficient.

Example 14 revisited

86 79 92 84 69 88 91 83 96 78 82 85

Using the twelve stock prices, we find the mean to be 84.42, standard deviation, 7.18, median, 84.5.

Coefficient of variation:

$$CV = \frac{s}{\bar{X}}(100\%) = 8.5\%$$

Coefficient of skewness:

$$sk = \frac{3(\bar{X} - \text{Median})}{s} = -.035$$
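Both coefficients can be computed from the twelve stock prices with Python's standard `statistics` module, as in the sketch below. The variable names are illustrative. The code works with unrounded intermediate values; plugging the rounded figures 84.42 and 7.18 into the skewness formula gives a slightly different third decimal.

```python
import statistics

# Example 14: twelve daily stock prices.
prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]

mean = statistics.mean(prices)
s = statistics.stdev(prices)       # sample standard deviation (divisor n - 1)
median = statistics.median(prices)

cv = s / mean * 100                # coefficient of variation, in percent
sk = 3 * (mean - median) / s       # coefficient of skewness used in the lecture

print(round(mean, 2), round(s, 2), median)  # 84.42 7.18 84.5
print(round(cv, 1))                         # 8.5
print(round(sk, 3))                         # -0.035
```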

Relationship Between Two Variables

• Univariate Data (Single Variable)

• Bivariate Data (Two Variables)
– Scatter diagram
– Contingency table

Scatter diagram

Scatter diagram: A technique used to show the relationship between variables.

Variables must be at least interval scaled.

Relationship can be positive (direct) or negative (inverse).

Example 14 revisited

The twelve days of stock prices and the overall market index on each day are given as follows:

Price    Index (000s)
69       5.1
78       6.2
79       6.2
82       7.0
83       7.1
84       7.1
85       7.2
86       7.2
88       7.3
91       7.5
92       7.5
96       8.0

[Figure: Relationship between Market Index and Stock Price; x-axis: Index; y-axis: Price]

Contingency table

A contingency table is used to classify observations according to two identifiable characteristics.

A contingency table is a cross tabulation that simultaneously summarizes two variables of interest.

Contingency tables are used when one or both variables are nominally scaled.

Example 17

Weight Loss

45 adults, all 60 pounds overweight, are randomly assigned to three weight loss programs. Twenty weeks into the program, a researcher gathers data on weight loss and divides the loss into three categories: less than 20 pounds, 20 up to 40 pounds, 40 or more pounds. Here are the results.

Example 17 (cont’d)

Weight loss plan    Less than 20 pounds    20 up to 40 pounds    40 pounds or more
Plan 1              4                      8                     3
Plan 2              2                      12                    1
Plan 3              12                     2                     1

Compare the weight loss under the three plans.
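One way to compare the plans is to convert each row of the contingency table to percentages, so that every plan is judged on the share of its 15 participants in each weight loss category. The sketch below is only one possible approach; the dictionary layout and the function name `row_percentages` are illustrative assumptions.

```python
# Example 17: counts per plan for each weight loss category.
categories = ["Less than 20 pounds", "20 up to 40 pounds", "40 pounds or more"]
table = {
    "Plan 1": [4, 8, 3],
    "Plan 2": [2, 12, 1],
    "Plan 3": [12, 2, 1],
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    return {
        plan: [100 * count / sum(counts) for count in counts]
        for plan, counts in table.items()
    }


shares = row_percentages(table)
for plan, row in shares.items():
    print(plan, [round(p, 1) for p in row])
# Plan 1 [26.7, 53.3, 20.0]
# Plan 2 [13.3, 80.0, 6.7]
# Plan 3 [80.0, 13.3, 6.7]
```

Read this way, 80% of Plan 3 participants lost less than 20 pounds, 80% of Plan 2 participants lost 20 up to 40 pounds, and Plan 1 has the largest share (20%) in the 40 pounds or more category.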

Practice Problems

• Problem 11 (Page 62)

(Problem 13)

• Problem 21 (Page 68)

(Problem 25 (Page 69))

• Problem 27 (Page 70)

(Problem 31 (Page 71))

• Problem 42 (Page 76)

(Problem 46 (Page 79))

• Problem 47 (Page 79)

(Problem 51 (Page 82))

• Problem 49 (Page 81)

(Problem 53 (Page 84))

Assignment-2

• Problem 55 (Page 84)

(Problem 59 (Page 88))

• Problems 11, 13 (Page 108)

(Problems 11, 13 (Page 110))

• Problem 15 (Page 111)

(Problem 15 (Page 113))

• Problem 20 (Page 113)

• Problem 25 (Page 117)

(Problem 21 (Page 118))