descriptive statistics (part 1) chapter44 numerical description central tendency dispersion...

Post on 30-Dec-2015


Descriptive Statistics (Part 1)

Chapter 4

Numerical Description

Central Tendency

Dispersion

McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

Numerical Description

• Statistics are descriptive measures derived from a sample (n items).

• Parameters are descriptive measures derived from a population (N items).

• Three key characteristics of numerical data:

Characteristic | Interpretation

Central Tendency | Where are the data values concentrated? What seem to be typical or middle data values?

Dispersion | How much variation is there in the data? How spread out are the data values? Are there unusual values?

Shape | Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?

Example: Vehicle Quality

• Consider the data set of vehicle defect rates from J. D. Power and Associates.

• Numerical statistics can be used to summarize this random sample of brands.

• Defect rate = (total no. defects / no. inspected) × 100

• Must allow for sampling error since the analysis is based on sampling.

• Number of defects per 100 vehicles, 2006 models.

To begin, sort the data in Excel.

• Sorted data provides insight into central tendency and dispersion.

Visual Displays

• The dot plot offers a visual impression of the data.

• Histograms with 5 bins (suggested by Sturges’ Rule) and 10 bins are shown below.

• Both are symmetric with no extreme values and show a modal class toward the low end.

Central Tendency

• The central tendency is the middle or typical values of a distribution.

• Central tendency can be assessed using a dot plot, histogram, or more precisely with numerical statistics.

Six Measures of Central Tendency

Statistic | Formula | Excel Formula | Pro | Con

Mean | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | =AVERAGE(Data) | Familiar and uses all the sample information. | Influenced by extreme values.

Median | Middle value in sorted array | =MEDIAN(Data) | Robust when extreme data values exist. | Ignores extremes and can be affected by gaps in data values.

Mode | Most frequently occurring data value | =MODE(Data) | Useful for attribute data or discrete data with a small range. | May not be unique, and is not helpful for continuous data.

Midrange | $\frac{x_{\min} + x_{\max}}{2}$ | =0.5*(MIN(Data)+MAX(Data)) | Easy to understand and calculate. | Influenced by extreme values and ignores most data values.

Geometric mean (G) | $G = \sqrt[n]{x_1 x_2 \cdots x_n}$ | =GEOMEAN(Data) | Useful for growth rates and mitigates high extremes. | Less familiar and requires positive data.

Trimmed mean | Same as the mean except omit highest and lowest k% of data values (e.g., 5%) | =TRIMMEAN(Data, Percent) | Mitigates effects of extreme values. | Excludes some data values that could be relevant.

Mean

• A familiar measure of central tendency.

• In Excel, use function =AVERAGE(Data) where Data is an array of data values.

Population formula: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• For the sample of n = 37 car brands:

Characteristics of the Mean

• Arithmetic mean is the most familiar average.

• Affected by every sample item.

• The balancing point or fulcrum for the data.

• Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero (the absolute distances do not):

$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

• Consider the following asymmetric distribution of quiz scores whose mean = 65:

$\sum_{i=1}^{n} (x_i - \bar{x})$ = (42 – 65) + (60 – 65) + (70 – 65) + (75 – 65) + (78 – 65) = (–23) + (–5) + (5) + (10) + (13) = –28 + 28 = 0
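The cancellation of deviations can be checked numerically. The following is a minimal sketch using the five quiz scores from the slide; the variable names are illustrative only.

```python
# Quiz scores from the slide; their mean is 65.
scores = [42, 60, 70, 75, 78]
mean = sum(scores) / len(scores)

# Signed deviations from the mean cancel out.
signed = [x - mean for x in scores]

# Absolute deviations do not cancel; they measure spread instead.
absolute = [abs(d) for d in signed]

print(mean, sum(signed), sum(absolute))
```

The signed deviations total zero because the negative side (–28) offsets the positive side (+28), which is why the mean acts as the balancing point of the data.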

Median

• The median (M) is the 50th percentile or midpoint of the sorted sample data.

• M separates the upper and lower half of the sorted observations.

• If n is odd, the median is the middle observation in the data array.

• If n is even, the median is the average of the middle two observations in the data array.

• Consider the following n = 6 data values: 11 12 15 17 21 32

• What is the median?

For even n, Median = $\frac{x_{n/2} + x_{(n/2+1)}}{2}$

n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4

M = (x3 + x4)/2 = (15 + 17)/2 = 16

11 12 15 [16] 17 21 32

Median (Figure 4.6)

• For n = 8, the median is between the fourth and fifth observations in the data array.

• For n = 9, the median is the fifth observation in the data array.

• Consider the following n = 7 data values: 12 23 23 25 27 34 41

• What is the median?

For odd n, Median = $x_{(n+1)/2}$

(n+1)/2 = (7+1)/2 = 8/2 = 4

M = x4 = 25

12 23 23 [25] 27 34 41
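The odd and even cases can be written as one small function. This is a sketch of the positional rule described on the slides, not Excel's implementation; the function name `median` is chosen here for illustration.

```python
def median(values):
    """Return the median using the positional rule from the slides."""
    data = sorted(values)          # the rule applies to the sorted array
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 is 0-based index n//2.
        return data[mid]
    # Even n: average the observations at 1-based positions n/2 and n/2 + 1.
    return (data[mid - 1] + data[mid]) / 2


print(median([11, 12, 15, 17, 21, 32]))      # even n = 6
print(median([12, 23, 23, 25, 27, 34, 41]))  # odd n = 7
```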

• Use Excel’s function =MEDIAN(Data) where Data is an array of data values.

• For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19.

• So, the median is x19 = 121.

• When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.

Characteristics of the Median

• The median is insensitive to extreme data values.

• For example, consider the following quiz scores for 3 students:

Tom’s scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285)

Jake’s scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380)

Mary’s scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350)

• What does the median for each student tell you?
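The three students' figures can be reproduced with Python's standard `statistics` module. This is a sketch for checking the slide's numbers; the dictionary layout is an illustrative choice.

```python
from statistics import mean, median

# Quiz scores for the three students on the slide.
scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# Map each student to (mean, median).
summary = {name: (mean(s), median(s)) for name, s in scores.items()}

for name, (m, med) in summary.items():
    print(f"{name}: mean = {m}, median = {med}, total = {sum(scores[name])}")
```

All three medians are 70 even though the means range from 57 to 76, which illustrates that the median ignores how far the extreme scores lie from the middle.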

Mode

• The most frequently occurring data value.

• Similar to mean and median if data values occur often near the center of sorted data.

• May have multiple modes or no mode.

• For example, consider the following quiz scores for 4 students:

Lee’s scores: 60, 70, 70, 70, 80 (Mean = 70, Median = 70, Mode = 70)

Pat’s scores: 45, 45, 70, 90, 100 (Mean = 70, Median = 70, Mode = 45)

Sam’s scores: 50, 60, 70, 80, 90 (Mean = 70, Median = 70, Mode = none)

Xiao’s scores: 50, 50, 70, 90, 90 (Mean = 70, Median = 70, Modes = 50, 90)

• What does the mode for each student tell you?
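A short sketch can reproduce the "one mode, no mode, several modes" cases. The helper `modes` is hypothetical and follows the slide's convention that data with no repeated value has no mode (Python's own `statistics.multimode` would instead return every value in that case).

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats: treated as "no mode", as on the slide
    return sorted(v for v, c in counts.items() if c == top)


print(modes([60, 70, 70, 70, 80]))   # Lee
print(modes([45, 45, 70, 90, 100]))  # Pat
print(modes([50, 60, 70, 80, 90]))   # Sam
print(modes([50, 50, 70, 90, 90]))   # Xiao
```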

• Easy to define, not easy to calculate in large samples.

• Use Excel’s function =MODE(Array)
  - will return #N/A if there is no mode.
  - will return first mode found if multimodal.

• May be far from the middle of the distribution and not at all typical.

• Generally isn’t useful for continuous data since data values rarely repeat.

• Best for attribute data or a discrete variable with a small range (e.g., Likert scale).

Example: Price/Earnings Ratios and Mode

• Consider the following P/E ratios for a random sample of 68 Standard & Poor’s 500 stocks.

• What is the mode?

7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14

14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19

19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26

26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91

• Excel’s descriptive statistics results are:

Mean 22.7206

Median 19

Mode 13

Range 84

Minimum 7

Maximum 91

Sum 1545

Count 68

• The mode 13 occurs 7 times, but what does the dot plot show?
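The Excel summary can be cross-checked from the 68 listed P/E ratios. This is a sketch using Python's standard `statistics` module rather than Excel; the variable names are illustrative.

```python
from statistics import mean, median, mode

# The 68 P/E ratios listed on the slide (already sorted).
pe = [
    7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
    14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
    19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
    26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91,
]

stats = {
    "Mean": round(mean(pe), 4),
    "Median": median(pe),
    "Mode": mode(pe),
    "Range": max(pe) - min(pe),
    "Minimum": min(pe),
    "Maximum": max(pe),
    "Sum": sum(pe),
    "Count": len(pe),
}

for label, value in stats.items():
    print(label, value)
```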

• The dot plot shows local modes (a peak with valleys on either side) at 10, 13, 15, 19, 23, 26, 29.

• These multiple modes suggest that the mode is not a stable measure of central tendency.

Example: Rose Bowl Winners’ Points

• Points scored by the winning NCAA football team tend to have modes at multiples of 7 because each touchdown (with the extra point) yields 7 points.

• Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games.

• What is the mode?

• A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.

• Occurs when dissimilar populations are combined in one sample. For example,

Skewness

• Compare mean and median or look at histogram to determine degree of skewness.

Symptoms of Skewness

Distribution’s Shape | Histogram Appearance | Statistics

Skewed left (negative skewness) | Long tail of histogram points left (a few low values but most data on right) | Mean < Median

Symmetric | Tails of histogram are balanced (low/high values offset) | Mean ≈ Median

Skewed right (positive skewness) | Long tail of histogram points right (most data on left but a few high values) | Mean > Median

• For the sample of spending per customer at 74 Noodles &, the mean ($7.04) exceeds the median ($7.00). What does this suggest?
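The mean-versus-median comparison can be expressed as a tiny helper. This is only a rough rule-of-thumb sketch: the function name `skew_hint` is hypothetical, and a small gap such as $7.04 versus $7.00 is weak evidence that should be confirmed with a histogram.

```python
def skew_hint(mean, median):
    """Suggest a skew direction from the sign of (mean - median)."""
    if mean > median:
        return "skewed right"
    if mean < median:
        return "skewed left"
    return "symmetric"


print(skew_hint(7.04, 7.00))   # spending per customer
print(skew_hint(22.72, 19.0))  # P/E ratios
print(skew_hint(57, 70))       # Tom's quiz scores
```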

Geometric Mean

• The geometric mean (G) is a multiplicative average:

$G = \sqrt[n]{x_1 x_2 \cdots x_n}$

• For the J. D. Power quality data (n = 37):

$G = \sqrt[37]{(87)(93)(98)\cdots(164)(173)} = \sqrt[37]{2.37667 \times 10^{77}} = 123.38$

• In Excel use =GEOMEAN(Array)

• The geometric mean tends to mitigate the effects of high outliers.

Growth Rates

• A variation on the geometric mean used to find the average growth rate for a time series:

$G = \sqrt[n-1]{\frac{x_n}{x_1}} - 1$

• For example, from 2002 to 2006, JetBlue Airlines revenues are:

Year | Revenue (mil)

2002 | 635

2003 | 998

2004 | 1265

2005 | 1701

2006 | 2363

• The average growth rate is given by taking the geometric mean of the ratios of each year’s revenue to the preceding year.

• Due to cancellations, only the first and last years are relevant:

$G = \sqrt[4]{\frac{998}{635} \cdot \frac{1265}{998} \cdot \frac{1701}{1265} \cdot \frac{2363}{1701}} - 1 = \sqrt[4]{\frac{2363}{635}} - 1 = 1.389 - 1 = .389$ or 38.9% per year

• In Excel use =(2363/635)^(1/4)-1
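The cancellation argument can be verified numerically: the geometric mean of the year-over-year ratios equals the fourth root of the last-to-first ratio. The sketch below uses the JetBlue revenues from the slide and `statistics.geometric_mean` (available in Python 3.8 and later); the variable names are illustrative.

```python
from statistics import geometric_mean

# JetBlue revenues (millions) from the slide.
revenue = {2002: 635, 2003: 998, 2004: 1265, 2005: 1701, 2006: 2363}
values = [revenue[year] for year in sorted(revenue)]

# Ratio of each year's revenue to the preceding year's.
ratios = [later / earlier for earlier, later in zip(values, values[1:])]

# Geometric mean of the n - 1 = 4 ratios.
g_from_ratios = geometric_mean(ratios)

# Same quantity from the endpoints only, since the middle terms cancel.
g_from_endpoints = (values[-1] / values[0]) ** (1 / (len(values) - 1))

growth_rate = g_from_endpoints - 1
print(round(growth_rate, 3))
```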

Midrange

• The midrange is the point halfway between the lowest and highest values of X.

• Easy to use but sensitive to extreme data values.

Midrange = $\frac{x_{\min} + x_{\max}}{2}$

• For the J. D. Power quality data (n = 37):

Midrange = $\frac{x_{\min} + x_{\max}}{2} = \frac{91 + 204}{2} = 147.5$

• Here, the midrange (147.5) is higher than the mean (134.51) or median (132).

Trimmed Mean

• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.

• For example, for the n = 68 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05).

• To determine how many observations to trim, multiply k × n = 0.05 × 68 = 3.4, which is rounded down to 3 observations.

• So, we would remove the three smallest and three largest observations before averaging the remaining values.
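The trimming steps above can be sketched in a few lines. The function `trimmed_mean` is a hypothetical helper that follows the slide's procedure (round k × n down, then drop that many values from each end); it is not Excel's TRIMMEAN, whose Percent argument counts both tails together.

```python
import math


def trimmed_mean(values, k):
    """Mean after dropping floor(k * n) values from each end of the sorted data."""
    data = sorted(values)
    cut = math.floor(k * len(data))      # observations trimmed per tail
    kept = data[cut:len(data) - cut]
    if not kept:
        raise ValueError("k is too large: no observations remain")
    return sum(kept) / len(kept)


# The 68 P/E ratios from the earlier slide.
pe = [
    7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
    14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
    19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
    26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91,
]

print(math.floor(0.05 * len(pe)))        # observations trimmed per tail
print(round(trimmed_mean(pe, 0.05), 2))  # 5% trimmed mean
```

Dropping 7, 8, 8 and 55, 68, 91 leaves 62 values, so the result sits below the ordinary mean of 22.72 but still above the median of 19.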

• Here is a summary of all the measures of central tendency for the n = 68 P/E values.

Mean: 22.72 =AVERAGE(PERatio)

Median: 19.00 =MEDIAN(PERatio)

Mode: 13.00 =MODE(PERatio)

Geometric Mean: 19.85 =GEOMEAN(PERatio)

Midrange: 49.00 =(MIN(PERatio)+MAX(PERatio))/2

5% Trim Mean: 21.10 =TRIMMEAN(PERatio,0.1) (Excel’s Percent argument is the total fraction trimmed, so 0.1 removes 5% from each tail.)

• The trimmed mean mitigates the effects of very high values, but still exceeds the median.

• The Federal Reserve uses a 16% trimmed mean to mitigate the effects of extremes in its analysis of the Consumer Price Index.

Dispersion

• Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion:

Measures of Variation

Statistic | Formula | Excel | Pro | Con

Range | $x_{\max} - x_{\min}$ | =MAX(Data)-MIN(Data) | Easy to calculate. | Sensitive to extreme data values.

Variance ($s^2$) | $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ | =VAR(Data) | Plays a key role in mathematical statistics. | Non-intuitive meaning.

Standard deviation (s) | $\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ | =STDEV(Data) | Most common measure. Uses same units as the raw data ($, £, ¥, etc.). | Non-intuitive meaning.

Coefficient of variation (CV) | $100 \times \frac{s}{\bar{x}}$ | None | Measures relative variation in percent so can compare data sets. | Requires non-negative data.

Mean absolute deviation (MAD) | $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$ | =AVEDEV(Data) | Easy to understand. | Lacks “nice” theoretical properties.

Range

• The difference between the largest and smallest observation.

Range = $x_{\max} - x_{\min}$

• For example, for the n = 68 P/E ratios,

Range = 91 – 7 = 84

Variance

• The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean divided by the population size:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

• For the sample variance ($s^2$), we divide by n – 1 instead of n; otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Standard Deviation

• The square root of the variance.

• Units of measure are the same as X.

Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

• Explains how individual values in a data set vary from the mean.

• Excel’s built-in functions are:

Statistic | Excel population formula | Excel sample formula

Variance | =VARP(Array) | =VAR(Array)

Standard deviation | =STDEVP(Array) | =STDEV(Array)
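For readers working outside Excel, the same population/sample distinction exists in Python's standard `statistics` module. The pairing below is an analogy drawn here, not something stated on the slides, and Tom's quiz scores from the median example are reused as sample data.

```python
from statistics import pstdev, pvariance, stdev, variance

# Tom's quiz scores from the earlier median example.
data = [20, 40, 70, 75, 80]

# Population versions divide by N (analogous to =VARP and =STDEVP).
pop_var = pvariance(data)
pop_sd = pstdev(data)

# Sample versions divide by n - 1 (analogous to =VAR and =STDEV).
samp_var = variance(data)
samp_sd = stdev(data)

print(pop_var, samp_var)
print(round(pop_sd, 2), round(samp_sd, 2))
```

The sum of squared deviations is 2680 in both cases; only the divisor changes, so the sample variance is always the larger of the two.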

Calculating a Standard Deviation

• Consider the following five quiz scores for Stephanie. (Table 4.12)

• Now, calculate the sample standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{2380}{5-1}} = \sqrt{595} = 24.39$

• Somewhat easier, the two-sum formula can also be used:

$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{28300 - \frac{(360)^2}{5}}{5-1}} = \sqrt{\frac{28300 - 25920}{5-1}} = \sqrt{595} = 24.39$
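Both formulas can be compared in code. The transcript does not include Table 4.12, so the five scores below are illustrative values assumed here because they reproduce the slide's totals (sum = 360, sum of squares = 28300); they should not be read as the original table.

```python
import math

# Assumed illustrative scores consistent with the slide's totals;
# the actual Table 4.12 values are not in the transcript.
scores = [40, 55, 75, 95, 95]
n = len(scores)
mean = sum(scores) / n

# Definitional formula: squared deviations around the mean.
ss_dev = sum((x - mean) ** 2 for x in scores)
s_definitional = math.sqrt(ss_dev / (n - 1))

# Two-sum formula: needs only the sum and the sum of squares.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
s_two_sum = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(sum_x, sum_x2, ss_dev)
print(round(s_definitional, 2), round(s_two_sum, 2))
```

The two-sum form avoids computing each deviation, though with large values it can lose precision in floating-point arithmetic, so the definitional form is usually the safer choice in code.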

• The standard deviation is nonnegative because deviations around the mean are squared.

• When every observation is exactly equal to the mean, the standard deviation is zero.

• Standard deviations can be large or small, depending on the units of measure.

• Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.

Coefficient of Variation

• Useful for comparing variables measured in different units or with different means.

• A unit-free measure of dispersion.

• Expressed as a percent of the mean.

• Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

$CV = 100 \times \frac{s}{\bar{x}}$

• For example:

Defect rates (n = 37): s = 22.89, $\bar{x}$ = 125.38 gives CV = 100 × (22.89)/(125.38) = 18%

ATM deposits (n = 100): s = 280.80, $\bar{x}$ = 233.89 gives CV = 100 × (280.80)/(233.89) = 120%

P/E ratios (n = 68): s = 14.08, $\bar{x}$ = 22.72 gives CV = 100 × (14.08)/(22.72) = 62%
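The three CV values can be reproduced with a one-line formula. The helper `coefficient_of_variation` is a hypothetical name; it rejects a zero or negative mean, following the slide's statement that the CV is undefined in that case.

```python
def coefficient_of_variation(s, mean):
    """Return the CV in percent: 100 * s / mean."""
    if mean <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * s / mean


# (standard deviation, mean) pairs from the slide.
examples = {
    "Defect rates": (22.89, 125.38),
    "ATM deposits": (280.80, 233.89),
    "P/E ratios": (14.08, 22.72),
}

for name, (s, mean) in examples.items():
    print(f"{name}: CV = {coefficient_of_variation(s, mean):.0f}%")
```

Because the CV is unit-free, the ATM deposits can be seen to vary far more, relative to their mean, than the defect rates, even though the two are measured in different units.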

Mean Absolute Deviation

• The Mean Absolute Deviation (MAD) reveals the average distance from an individual data point to the mean (center of the distribution).

• Uses absolute values of the deviations around the mean.

• Excel’s function is =AVEDEV(Array)

$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
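The MAD formula translates directly into code. The sketch below defines a hypothetical helper `mean_absolute_deviation` and applies it to Tom's quiz scores from the earlier median example.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("MAD of empty data is undefined")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# Tom's quiz scores (mean = 57).
tom = [20, 40, 70, 75, 80]
print(mean_absolute_deviation(tom))
```

Here the absolute deviations are 37, 17, 13, 18, and 23, so on average a score lies 21.6 points from the mean.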

Central Tendency vs. Dispersion: Manufacturing

• Consider the histograms of hole diameters drilled in a steel plate during manufacturing.

• The desired distribution is outlined in red.

Machine A: Desired mean (5 mm) but too much variation.

Machine B: Acceptable variation but mean is less than 5 mm.

• Take frequent samples to monitor quality.

Central Tendency vs. Dispersion: Job Performance

• Consider student ratings of four professors on eight teaching attributes (10-point scale).

• Jones and Wu have identical means but different standard deviations.

• Smith and Gopal have different means but identical standard deviations.

• A high mean (better rating) and low standard deviation (more consistency) is preferred. Which professor do you think is best?

Applied Statistics in Business & Economics

End of Chapter 4A