Descriptive Statistics (Part 1), Chapter 4: Numerical Description, Central Tendency, Dispersion. McGraw-Hill/Irwin. Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

Upload: winifred-blankenship

Post on 30-Dec-2015


Page 1: Descriptive Statistics (Part 1) Chapter44 Numerical Description Central Tendency Dispersion McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies,

Descriptive Statistics (Part 1)

Chapter 4

Numerical Description

Central Tendency

Dispersion

McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

Page 2: Numerical Description

• Statistics are descriptive measures derived from a sample (n items).

• Parameters are descriptive measures derived from a population (N items).

Page 3: Numerical Description

• Three key characteristics of numerical data:

Characteristic: Interpretation

Central Tendency: Where are the data values concentrated? What seem to be typical or middle data values?

Dispersion: How much variation is there in the data? How spread out are the data values? Are there unusual values?

Shape: Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?

Page 4: Numerical Description

Example: Vehicle Quality

• Consider the data set of vehicle defect rates from J. D. Power and Associates.

• Defect rate = (total no. defects / no. inspected) x 100

• Numerical statistics can be used to summarize this random sample of brands.

• Must allow for sampling error since the analysis is based on sampling.

Page 5: Numerical Description

• Number of defects per 100 vehicles, 2006 models.

Page 6: Numerical Description

• To begin, sort the data in Excel.

Page 7: Numerical Description

• Sorted data provides insight into central tendency and dispersion.

Page 8: Numerical Description

Visual Displays

• The dot plot offers a visual impression of the data.

Page 9: Numerical Description

Visual Displays

• Histograms with 5 bins (suggested by Sturges' Rule) and 10 bins are shown below.

• Both are symmetric with no extreme values and show a modal class toward the low end.

Page 10: Central Tendency

• The central tendency is the middle or typical values of a distribution.

• Central tendency can be assessed using a dot plot, a histogram, or more precisely with numerical statistics.

Page 11: Central Tendency

Six Measures of Central Tendency

Mean
Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Excel formula: =AVERAGE(Data)
Pro: Familiar and uses all the sample information.
Con: Influenced by extreme values.

Median
Formula: Middle value in sorted array
Excel formula: =MEDIAN(Data)
Pro: Robust when extreme data values exist.
Con: Ignores extremes and can be affected by gaps in data values.

Page 12: Central Tendency

Six Measures of Central Tendency

Mode
Formula: Most frequently occurring data value
Excel formula: =MODE(Data)
Pro: Useful for attribute data or discrete data with a small range.
Con: May not be unique, and is not helpful for continuous data.

Midrange
Formula: $\frac{x_{\min} + x_{\max}}{2}$
Excel formula: =0.5*(MIN(Data)+MAX(Data))
Pro: Easy to understand and calculate.
Con: Influenced by extreme values and ignores most data values.

Page 13: Central Tendency

Six Measures of Central Tendency

Geometric mean (G)
Formula: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$
Excel formula: =GEOMEAN(Data)
Pro: Useful for growth rates and mitigates high extremes.
Con: Less familiar and requires positive data.

Trimmed mean
Formula: Same as the mean except omit highest and lowest k% of data values (e.g., 5%)
Excel formula: =TRIMMEAN(Data, Percent)
Pro: Mitigates effects of extreme values.
Con: Excludes some data values that could be relevant.

Page 14: Central Tendency

Mean

• A familiar measure of central tendency.

Population formula: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Sample formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

• In Excel, use function =AVERAGE(Data) where Data is an array of data values.

Page 15: Central Tendency

Mean

• For the sample of n = 37 car brands:

Page 16: Central Tendency

Characteristics of the Mean

• Arithmetic mean is the most familiar average.

• Affected by every sample item.

• The balancing point or fulcrum for the data.

Page 17: Central Tendency

Characteristics of the Mean

• Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero:

$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

• Consider the following asymmetric distribution of quiz scores whose mean = 65.

$\sum_{i=1}^{n} (x_i - \bar{x})$ = (42 - 65) + (60 - 65) + (70 - 65) + (75 - 65) + (78 - 65) = (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
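The zero-sum property can be checked numerically. The following is a minimal sketch using the five quiz scores from the slide; the variable names are illustrative only. It also shows that the absolute deviations do not cancel, which is why the property is stated for signed deviations.

```python
# Quiz scores from the slide; their mean is 65.
scores = [42, 60, 70, 75, 78]
mean = sum(scores) / len(scores)

# Signed deviations from the mean cancel out exactly.
deviations = [x - mean for x in scores]
signed_total = sum(deviations)

# Absolute deviations do not cancel; their sum is zero only when
# every observation equals the mean.
absolute_total = sum(abs(d) for d in deviations)

print(mean, deviations, signed_total, absolute_total)
```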

Page 18: Central Tendency

Median

• The median (M) is the 50th percentile or midpoint of the sorted sample data.

• M separates the upper and lower half of the sorted observations.

• If n is odd, the median is the middle observation in the data array.

• If n is even, the median is the average of the middle two observations in the data array.

Page 19: Central Tendency

Median

• Consider the following n = 6 data values: 11 12 15 17 21 32

• What is the median?

For even n, Median = $\frac{x_{n/2} + x_{(n/2+1)}}{2}$

n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4

M = (x3 + x4)/2 = (15 + 17)/2 = 16

11 12 15 [16] 17 21 32

Page 20: Central Tendency

Median (Figure 4.6)

• For n = 8, the median is between the fourth and fifth observations in the data array.

Page 21: Central Tendency

Median

• For n = 9, the median is the fifth observation in the data array.

Page 22: Central Tendency

Median

• Consider the following n = 7 data values: 12 23 23 25 27 34 41

• What is the median?

For odd n, Median = $x_{(n+1)/2}$

(n+1)/2 = (7+1)/2 = 8/2 = 4

M = x4 = 25

12 23 23 [25] 27 34 41
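The odd and even position rules can be sketched in a few lines of Python. The function name `median_by_position` is hypothetical and exists only for this illustration; in practice `statistics.median` (or Excel's =MEDIAN) does the same job.

```python
def median_by_position(values):
    """Median via the position rule; positions are 1-based on the sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: the observation at position (n + 1) / 2.
        return data[(n + 1) // 2 - 1]
    # Even n: average the observations at positions n/2 and n/2 + 1.
    return (data[n // 2 - 1] + data[n // 2]) / 2


even_example = median_by_position([11, 12, 15, 17, 21, 32])     # n = 6
odd_example = median_by_position([12, 23, 23, 25, 27, 34, 41])  # n = 7
print(even_example, odd_example)
```

The two calls reuse the n = 6 and n = 7 data sets from the slides, so the results can be compared with the hand calculations (16 and 25).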

Page 23: Central Tendency

Median

• Use Excel's function =MEDIAN(Data) where Data is an array of data values.

• For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19.

• So, the median is $x_{19}$ = 121.

• When there are several duplicate data values, the median does not provide a clean "50-50" split in the data.

Page 24: Central Tendency

Characteristics of the Median

• The median is insensitive to extreme data values.

• For example, consider the following quiz scores for 3 students:

Tom's scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285)

Jake's scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380)

Mary's scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350)

• What does the median for each student tell you?

Page 25: Central Tendency

Mode

• The most frequently occurring data value.

• Similar to mean and median if data values occur often near the center of sorted data.

• May have multiple modes or no mode.

Page 26: Central Tendency

Mode

• For example, consider the following quiz scores for 4 students:

Lee's scores: 60, 70, 70, 70, 80 (Mean = 70, Median = 70, Mode = 70)

Pat's scores: 45, 45, 70, 90, 100 (Mean = 70, Median = 70, Mode = 45)

Sam's scores: 50, 60, 70, 80, 90 (Mean = 70, Median = 70, Mode = none)

Xiao's scores: 50, 50, 70, 90, 90 (Mean = 70, Median = 70, Modes = 50, 90)

• What does the mode for each student tell you?
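The unique-mode, no-mode and two-mode cases in the quiz scores can be reproduced with Python's standard library. This is only a sketch: the helper `modes` is a hypothetical name, and treating "every value occurs once" as "no mode" is a convention chosen here to match the slide (by itself, `statistics.multimode` would return all five of Sam's scores).

```python
from statistics import multimode


def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    # If every value is distinct, follow the slide's convention: no mode.
    if len(set(values)) == len(values):
        return []
    # multimode lists every value tied for the highest count,
    # in the order first encountered.
    return multimode(values)


quiz_scores = {
    "Lee": [60, 70, 70, 70, 80],
    "Pat": [45, 45, 70, 90, 100],
    "Sam": [50, 60, 70, 80, 90],
    "Xiao": [50, 50, 70, 90, 90],
}
result = {name: modes(scores) for name, scores in quiz_scores.items()}
print(result)
```

All four students share a mean and median of 70, yet the modes differ, which illustrates why the mode can be far from the middle of the data.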

Page 27: Central Tendency

Mode

• Easy to define, not easy to calculate in large samples.

• Use Excel's function =MODE(Array)
- will return #N/A if there is no mode.
- will return first mode found if multimodal.

• May be far from the middle of the distribution and not at all typical.

Page 28: Central Tendency

Mode

• Generally isn't useful for continuous data since data values rarely repeat.

• Best for attribute data or a discrete variable with a small range (e.g., Likert scale).

Page 29: Central Tendency

Example: Price/Earnings Ratios and Mode

• Consider the following P/E ratios for a random sample of 68 Standard & Poor's 500 stocks.

7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14

14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19

19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26

26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91

• What is the mode?

Page 30: Central Tendency

Example: Price/Earnings Ratios and Mode

• Excel's descriptive statistics results are:

Mean 22.7206

Median 19

Mode 13

Range 84

Minimum 7

Maximum 91

Sum 1545

Count 68

• The mode 13 occurs 7 times, but what does the dot plot show?

Page 31: Central Tendency

Example: Price/Earnings Ratios and Mode

• The dot plot shows local modes (a peak with valleys on either side) at 10, 13, 15, 19, 23, 26, 29.

• These multiple modes suggest that the mode is not a stable measure of central tendency.

Page 32: Central Tendency

Example: Rose Bowl Winners' Points

• Points scored by the winning NCAA football team tend to have modes in multiples of 7 because each touchdown typically yields 7 points (including the extra point).

• Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games.

• What is the mode?

Page 33: Central Tendency

Mode

• A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.

• Occurs when dissimilar populations are combined in one sample. For example,

Page 34: Central Tendency

Skewness

• Compare the mean and median, or look at the histogram, to determine the degree of skewness.

Page 35: Central Tendency

Symptoms of Skewness

Skewed left (negative skewness)
Histogram appearance: Long tail of histogram points left (a few low values but most data on right)
Statistics: Mean < Median

Symmetric
Histogram appearance: Tails of histogram are balanced (low/high values offset)
Statistics: Mean ≈ Median

Skewed right (positive skewness)
Histogram appearance: Long tail of histogram points right (most data on left but a few high values)
Statistics: Mean > Median
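The mean-versus-median comparison can be turned into a rough diagnostic. The sketch below is illustrative only: the function name `skew_symptom` and the tolerance parameter `tol` are assumptions, not part of the source, and the rule is a symptom rather than a formal test of skewness. It is applied to Tom's, Jake's and Mary's quiz scores from the median example.

```python
from statistics import mean, median


def skew_symptom(values, tol=0.0):
    """Classify data by comparing mean and median (a symptom, not a test)."""
    gap = mean(values) - median(values)
    if gap < -tol:
        return "skewed left"   # mean < median
    if gap > tol:
        return "skewed right"  # mean > median
    return "symmetric"         # mean approximately equals median


tom = skew_symptom([20, 40, 70, 75, 80])   # mean 57, median 70
jake = skew_symptom([60, 65, 70, 90, 95])  # mean 76, median 70
mary = skew_symptom([50, 65, 70, 75, 90])  # mean 70, median 70
print(tom, jake, mary)
```

With real data the mean and median are rarely exactly equal, so a small positive `tol` (in the data's units) would usually be needed before calling a sample symmetric; a histogram remains the more reliable check.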

Page 36: Central Tendency

Skewness

• For the sample of spending per customer at 74 Noodles & Company restaurants, the mean ($7.04) exceeds the median ($7.00). What does this suggest?

Page 37: Central Tendency

Geometric Mean

• The geometric mean (G) is a multiplicative average.

$G = \sqrt[n]{x_1 x_2 \cdots x_n}$

• For the J. D. Power quality data (n = 37):

$G = \sqrt[37]{(87)(93)(98)\cdots(164)(173)} = \sqrt[37]{2.37667 \times 10^{77}} = 123.38$

• In Excel use =GEOMEAN(Array)

• The geometric mean tends to mitigate the effects of high outliers.

Page 38: Central Tendency

Growth Rates

• A variation on the geometric mean used to find the average growth rate for a time series.

$G = \sqrt[n-1]{\frac{x_n}{x_1}} - 1$

• For example, from 2002 to 2006, JetBlue Airlines revenues are:

Year Revenue (mil)

2002 635

2003 998

2004 1265

2005 1701

2006 2363

Page 39: Central Tendency

Growth Rates

• The average growth rate is given by taking the geometric mean of the ratios of each year's revenue to the preceding year.

• Due to cancellations, only the first and last years are relevant:

$G = \sqrt[4]{\left(\frac{998}{635}\right)\left(\frac{1265}{998}\right)\left(\frac{1701}{1265}\right)\left(\frac{2363}{1701}\right)} - 1 = \sqrt[4]{\frac{2363}{635}} - 1$

= 1.3889 - 1 = 0.3889, or 38.9% per year

• In Excel use =(2363/635)^(1/4)-1
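The cancellation argument can be verified numerically. This is a minimal sketch using the JetBlue revenue figures from the slides; the variable names are illustrative. It computes the average growth rate once from the year-over-year ratios and once from the first and last years only.

```python
import math

# JetBlue revenue in millions, 2002 through 2006 (from the slide).
revenue = [635, 998, 1265, 1701, 2363]

# Geometric mean of the year-over-year ratios, minus 1.
ratios = [later / earlier for earlier, later in zip(revenue, revenue[1:])]
growth_from_ratios = math.prod(ratios) ** (1 / len(ratios)) - 1

# Shortcut: intermediate years cancel, leaving only the endpoints.
periods = len(revenue) - 1
growth_from_endpoints = (revenue[-1] / revenue[0]) ** (1 / periods) - 1

print(f"{growth_from_ratios:.4f} {growth_from_endpoints:.4f}")
```

Both routes agree up to floating-point rounding, matching the Excel expression =(2363/635)^(1/4)-1.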

Page 40: Central Tendency

Midrange

• The midrange is the point halfway between the lowest and highest values of X.

Midrange = $\frac{x_{\min} + x_{\max}}{2}$

• Easy to use but sensitive to extreme data values.

• For the J. D. Power quality data (n = 37):

Midrange = $\frac{x_{\min} + x_{\max}}{2} = \frac{91 + 204}{2} = 147.5$

• Here, the midrange (147.5) is higher than the mean (134.51) or median (132).

Page 41: Central Tendency

Trimmed Mean

• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.

• For example, for the n = 68 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05).

• To determine how many observations to trim, multiply k x n = 0.05 x 68 = 3.4, which is rounded down to 3 observations.

• So, we would remove the three smallest and three largest observations before averaging the remaining values.

Page 42: Central Tendency

Trimmed Mean

• Here is a summary of all the measures of central tendency for the n = 68 P/E values.

Mean: 22.72 =AVERAGE(PERatio)

Median: 19.00 =MEDIAN(PERatio)

Mode: 13.00 =MODE(PERatio)

Geometric Mean: 19.85 =GEOMEAN(PERatio)

Midrange: 49.00 =(MIN(PERatio)+MAX(PERatio))/2

5% Trim Mean: 21.10 =TRIMMEAN(PERatio,0.1)

• Excel's Percent argument is the total fraction trimmed, split equally between the two ends, so 0.1 removes 5% from each end.

• The trimmed mean mitigates the effects of very high values, but still exceeds the median.
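The summary values can be reproduced outside Excel. The sketch below uses Python's standard `statistics` module on the 68 P/E ratios listed earlier in the deck; `trimmed_mean` is a hypothetical helper written for this illustration, and truncating k x n to a whole number is an assumption that mirrors the "3.4 or 3 observations" step in the slides.

```python
from statistics import geometric_mean, mean, median, mode

# The n = 68 P/E ratios from the Price/Earnings example.
pe_ratios = [
    7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
    14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
    19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
    26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91,
]


def trimmed_mean(values, k):
    """Mean after dropping the lowest and highest k fraction of the data."""
    data = sorted(values)
    cut = int(k * len(data))  # 0.05 * 68 = 3.4, truncated to 3 per tail
    return mean(data[cut:len(data) - cut])


summary = {
    "mean": mean(pe_ratios),
    "median": median(pe_ratios),
    "mode": mode(pe_ratios),
    "geometric_mean": geometric_mean(pe_ratios),
    "midrange": (min(pe_ratios) + max(pe_ratios)) / 2,
    "trimmed_mean_5pct": trimmed_mean(pe_ratios, 0.05),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that `trimmed_mean(pe_ratios, 0.05)` takes the fraction trimmed from each tail, whereas Excel's =TRIMMEAN(PERatio,0.1) takes the total fraction across both tails.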

Page 43: Central Tendency

Trimmed Mean

• The Federal Reserve uses a 16% trimmed mean to mitigate the effects of extremes in its analysis of the Consumer Price Index.

Page 44: Dispersion

Measures of Variation

• Variation is the "spread" of data points about the center of the distribution in a sample. Consider the following measures of dispersion:

Range
Formula: $x_{\max} - x_{\min}$
Excel: =MAX(Data)-MIN(Data)
Pro: Easy to calculate.
Con: Sensitive to extreme data values.

Variance ($s^2$)
Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Excel: =VAR(Data)
Pro: Plays a key role in mathematical statistics.
Con: Non-intuitive meaning.

Page 45: Dispersion

Measures of Variation

Standard deviation (s)
Formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Excel: =STDEV(Data)
Pro: Most common measure. Uses same units as the raw data ($, £, ¥, etc.).
Con: Non-intuitive meaning.

Coefficient of variation (CV)
Formula: $CV = 100 \times \frac{s}{\bar{x}}$
Excel: None
Pro: Measures relative variation in percent so can compare data sets.
Con: Requires non-negative data.

Page 46: Dispersion

Measures of Variation

Mean absolute deviation (MAD)
Formula: $MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Excel: =AVEDEV(Data)
Pro: Easy to understand.
Con: Lacks "nice" theoretical properties.
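The five measures of variation can be computed side by side. The following sketch is illustrative, not part of the original deck: it reuses Pat's quiz scores from the mode example as sample data, and the variable names are assumptions. `statistics.variance` and `statistics.stdev` divide by n - 1, in line with the sample formulas above.

```python
from statistics import mean, stdev, variance

# Pat's quiz scores from the mode example, reused as illustrative data.
scores = [45, 45, 70, 90, 100]

x_bar = mean(scores)
data_range = max(scores) - min(scores)      # like =MAX(Data)-MIN(Data)
s2 = variance(scores)                       # sample variance, like =VAR(Data)
s = stdev(scores)                           # sample std. deviation, like =STDEV(Data)
cv = 100 * s / x_bar                        # coefficient of variation, in percent
mad = sum(abs(x - x_bar) for x in scores) / len(scores)  # like =AVEDEV(Data)

print(f"range={data_range} s2={s2} s={s:.2f} CV={cv:.1f}% MAD={mad}")
```

Because the CV divides by the mean, it is only meaningful when the mean is positive, which is the practical reason for the non-negative data requirement.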

Page 47: Dispersion

Range

• The difference between the largest and smallest observation.

Range = $x_{\max} - x_{\min}$

• For example, for the n = 68 P/E ratios,

Range = 91 - 7 = 84

Page 48: Dispersion

Variance

• The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean divided by the population size.

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

• For the sample variance ($s^2$), we divide by n - 1 instead of n, otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$.

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Page 49: Dispersion

Standard Deviation

• The square root of the variance.

• Units of measure are the same as X.

Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

• Explains how individual values in a data set vary from the mean.

Page 50: Dispersion

Standard Deviation

• Excel's built-in functions are:

Statistic             Excel population formula    Excel sample formula
Variance              =VARP(Array)                =VAR(Array)
Standard deviation    =STDEVP(Array)              =STDEV(Array)
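For readers working outside Excel, Python's standard `statistics` module offers the same population/sample distinction. The sketch below pairs each call with the Excel function it corresponds to; the data set is hypothetical.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data

pop_var = statistics.pvariance(data)   # counterpart of =VARP(Array)
samp_var = statistics.variance(data)   # counterpart of =VAR(Array)
pop_sd = statistics.pstdev(data)       # counterpart of =STDEVP(Array)
samp_sd = statistics.stdev(data)       # counterpart of =STDEV(Array)

print(pop_var, pop_sd)                         # 4.0 2.0
print(round(samp_var, 3), round(samp_sd, 3))   # 4.571 2.138
```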

Page 51: Dispersion

Calculating a Standard Deviation

• Consider the following five quiz scores for Stephanie (Table 4.12).

Page 52: Dispersion

Calculating a Standard Deviation

• Now, calculate the sample standard deviation:

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{2380}{5 - 1}} = \sqrt{595} = 24.39$

• Somewhat easier, the two-sum formula can also be used:

$s = \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}} = \sqrt{\dfrac{28300 - \dfrac{(360)^2}{5}}{5 - 1}} = \sqrt{\dfrac{28300 - 25920}{5 - 1}} = \sqrt{595} = 24.39$
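The two routes to s can be checked numerically. Table 4.12 is not reproduced in this transcript, so the five scores below are a hypothetical stand-in, chosen only because they reproduce the sums shown on the slide (a total of 360, a sum of squares of 28,300, and a sum of squared deviations of 2,380). They should not be read as the original table.

```python
import math

# Hypothetical stand-in for Table 4.12: consistent with the slide's sums,
# but not necessarily the original quiz scores.
scores = [40, 55, 75, 95, 95]
n = len(scores)

# Definitional formula: squared deviations around the sample mean.
x_bar = sum(scores) / n                               # 72.0
ss_dev = sum((x - x_bar) ** 2 for x in scores)        # 2380.0
s_definitional = math.sqrt(ss_dev / (n - 1))

# Two-sum formula: needs only the sum and the sum of squares.
sum_x = sum(scores)                                   # 360
sum_x2 = sum(x * x for x in scores)                   # 28300
s_two_sum = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(round(s_definitional, 2), round(s_two_sum, 2))  # 24.39 24.39
```

The two-sum form is convenient by hand, but in floating-point arithmetic it subtracts two large, nearly equal numbers when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.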

Page 53: Dispersion

Calculating a Standard Deviation

• The standard deviation is nonnegative because deviations around the mean are squared.

• When every observation is exactly equal to the mean, the standard deviation is zero.

• Standard deviations can be large or small, depending on the units of measure.

• Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.

Page 54: Dispersion

Coefficient of Variation

• The coefficient of variation (CV) is useful for comparing variables measured in different units or with different means.

• A unit-free measure of dispersion.

• Expressed as a percent of the mean.

• Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

$CV = 100 \times \dfrac{s}{\bar{x}}$
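To illustrate what "unit-free" means, the sketch below computes the CV for the same hypothetical measurements expressed in millimeters and in centimeters. The standard deviation changes with the units, while the CV does not. The function name and the data are assumptions made for this example.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV as a percent: 100 * s / x_bar (sample standard deviation)."""
    x_bar = statistics.mean(values)
    if x_bar <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * statistics.stdev(values) / x_bar


lengths_mm = [48.0, 50.0, 52.0, 49.0, 51.0]   # hypothetical measurements
lengths_cm = [x / 10 for x in lengths_mm]     # same data, different units

sd_mm = statistics.stdev(lengths_mm)
sd_cm = statistics.stdev(lengths_cm)
cv_mm = coefficient_of_variation(lengths_mm)
cv_cm = coefficient_of_variation(lengths_cm)

print(round(sd_mm, 3), round(sd_cm, 3))  # 1.581 0.158 (depends on units)
print(round(cv_mm, 2), round(cv_cm, 2))  # 3.16 3.16   (unit-free)
```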

Page 55: Dispersion

Coefficient of Variation

$CV = 100 \times \dfrac{s}{\bar{x}}$

• For example:

Defect rates (n = 37): s = 22.89, $\bar{x}$ = 125.38 gives CV = 100 × (22.89)/(125.38) = 18%

ATM deposits (n = 100): s = 280.80, $\bar{x}$ = 233.89 gives CV = 100 × (280.80)/(233.89) = 120%

P/E ratios (n = 68): s = 14.08, $\bar{x}$ = 22.72 gives CV = 100 × (14.08)/(22.72) = 62%
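The three percentages can be reproduced from the summary statistics alone. This short sketch uses the standard deviations and means that appear inside the CV calculations on the slide; the helper name `cv_percent` is hypothetical.

```python
def cv_percent(s, x_bar):
    """Coefficient of variation as a percent of the mean."""
    return 100 * s / x_bar


# (standard deviation, mean) pairs as used in the slide's CV calculations
examples = {
    "Defect rates": (22.89, 125.38),
    "ATM deposits": (280.80, 233.89),
    "P/E ratios": (14.08, 22.72),
}

for name, (s, x_bar) in examples.items():
    print(f"{name}: CV = {cv_percent(s, x_bar):.0f}%")
# Defect rates: CV = 18%
# ATM deposits: CV = 120%
# P/E ratios: CV = 62%
```

Although the three variables are measured in different units, the CV places them on a common scale: ATM deposits show the most relative variation and defect rates the least.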

Page 56: Dispersion

Mean Absolute Deviation

• The Mean Absolute Deviation (MAD) reveals the average distance from an individual data point to the mean (center of the distribution).

• Uses absolute values of the deviations around the mean.

• Excel's function is =AVEDEV(Array)

$MAD = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
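A minimal Python sketch of the MAD calculation follows. The function name and the five scores are hypothetical and serve only to show the arithmetic.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


scores = [40, 55, 75, 95, 95]  # hypothetical data, mean = 72

# Absolute deviations are 32, 17, 3, 23, 23, which sum to 98.
print(mean_absolute_deviation(scores))  # 19.6
```

Unlike the sample standard deviation, the MAD divides by n and does not square the deviations, so large deviations are not weighted more heavily than small ones.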

Page 57: Dispersion

Central Tendency vs. Dispersion: Manufacturing

• Consider the histograms of hole diameters drilled in a steel plate during manufacturing.

• The desired distribution is outlined in red.

[Figure: histograms of hole diameters for Machine A and Machine B]

Page 58: Dispersion

Central Tendency vs. Dispersion: Manufacturing

Machine A: Desired mean (5 mm) but too much variation.

Machine B: Acceptable variation but mean is less than 5 mm.

• Take frequent samples to monitor quality.

Page 59: Dispersion

Central Tendency vs. Dispersion: Job Performance

• Consider student ratings of four professors on eight teaching attributes (10-point scale).

Page 60: Dispersion

Central Tendency vs. Dispersion: Job Performance

• Jones and Wu have identical means but different standard deviations.

Page 61: Dispersion

Central Tendency vs. Dispersion: Job Performance

• Smith and Gopal have different means but identical standard deviations.

Page 62: Dispersion

Central Tendency vs. Dispersion: Job Performance

• A high mean (better rating) and a low standard deviation (more consistency) are preferred. Which professor do you think is best?

Page 63: Applied Statistics in Business & Economics

End of Chapter 4A