measures of dispersion or measures of variability
TRANSCRIPT
Measures of Dispersion Measures of Dispersion
oror
Measures of VariabilityMeasures of Variability
Measures of VariabilityMeasures of Variability
A single summary figure that describes A single summary figure that describes the spread of observations within a the spread of observations within a distribution.distribution.
MEASURES OF DESPERSIONMEASURES OF DESPERSION
RANGERANGE INTERQUARTILE RANGEINTERQUARTILE RANGE VARIANCEVARIANCE STANDARD DEVIATIONSTANDARD DEVIATION
Measures of VariabilityMeasures of Variability
RangeRange Difference between the smallest and largest Difference between the smallest and largest
observations.observations. Interquartile RangeInterquartile Range
Range of the middle half of scores.Range of the middle half of scores. VarianceVariance
Mean of all squared deviations from the mean.Mean of all squared deviations from the mean. Standard DeviationStandard Deviation
Measure of the average amount by which Measure of the average amount by which observations deviate from the mean. The square root observations deviate from the mean. The square root of the variance.of the variance.
Variability Example: Variability Example: RangeRange
Las Vegas Hotel RatesLas Vegas Hotel Rates
52, 76, 100, 136, 186, 196, 205, 150, 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891750, 791, 891
Range:Range: 891-52 = 839 891-52 = 839
Pros and Cons of the Pros and Cons of the RangeRange
ProsPros Very easy to Very easy to
compute.compute. Scores exist in the Scores exist in the
data set.data set.
ConsCons Value depends only Value depends only
on two scores.on two scores. Very sensitive to Very sensitive to
outliers.outliers. Influenced by sample Influenced by sample
size (the larger the size (the larger the sample, the larger the sample, the larger the range).range).
Inter quartile RangeInter quartile Range
The inter quartile range is QThe inter quartile range is Q33-Q-Q11
50% of the observations in the 50% of the observations in the distribution are in the inter quartile range.distribution are in the inter quartile range.
The following figure shows the interaction The following figure shows the interaction between the quartiles, the median and between the quartiles, the median and the inter quartile range.the inter quartile range.
Inter quartile RangeInter quartile Range
Quartiles: 1 = Q n+1
th4
2 = Q 2(n+1) n+1 = th
4 2
3 = Q 3(n+1) th
4Inter quartile : IQR = Q3 – Q1
Pros and Cons of the Pros and Cons of the Interquartile RangeInterquartile Range
ProsPros Fairly easy to Fairly easy to
compute.compute. Scores exist in the Scores exist in the
data set.data set. Eliminates influence Eliminates influence
of extreme scores.of extreme scores.
ConsCons Discards much of the Discards much of the
data.data.
Percentiles and QuartilesPercentiles and Quartiles
Maximum is 100th percentile: 100% of Maximum is 100th percentile: 100% of values lie at or below the maximumvalues lie at or below the maximum
Median is 50th percentile: 50% of values Median is 50th percentile: 50% of values lie at or below the medianlie at or below the median
Any percentile can be calculated. But the Any percentile can be calculated. But the most common are 25most common are 25thth (1 (1stst Quartile) and Quartile) and 7575thth (3 (3rdrd Quartile) Quartile)
Locating Percentiles in a Locating Percentiles in a Frequency DistributionFrequency Distribution
A percentile is a score below which a specific A percentile is a score below which a specific percentage of the distribution falls.percentage of the distribution falls.
The 75th percentile is a score below which 75% of The 75th percentile is a score below which 75% of the cases fall.the cases fall.
The median is the 50th percentile: 50% of the cases The median is the 50th percentile: 50% of the cases fall below itfall below it
Another type of percentile :The lower quartile is Another type of percentile :The lower quartile is 25th percentile and the upper quartile is the 75th 25th percentile and the upper quartile is the 75th percentilepercentile
NUMBER OF CHILDREN
260 26.6 26.6 26.6
161 16.4 16.5 43.1
260 26.6 26.6 69.7
155 15.8 15.9 85.6
70 7.2 7.2 92.7
31 3.2 3.2 95.9
21 2.1 2.1 98.1
11 1.1 1.1 99.2
8 .8 .8 100.0
977 99.8 100.0
2 .2
979 100.0
0
1
2
3
4
5
6
7
EIGHT OR MORE
Total
Valid
NAMissing
Total
Frequency PercentValid
PercentCumulative
Percent
50th percentile
80th percentile
50% included here
80% included
here
25th percentile
25% included here
Locating Percentiles in a Frequency Distribution
Five Number SummaryFive Number Summary
Minimum ValueMinimum Value 1st Quartile1st Quartile MedianMedian 3rd Quartile3rd Quartile Maximum ValueMaximum Value
VARIANCE:VARIANCE:
Deviations of each observation from Deviations of each observation from the mean, then averaging the sum of the mean, then averaging the sum of squares of these deviations.squares of these deviations.
STANDARD DEVIATION:STANDARD DEVIATION:
“ “ ROOT- MEANS-SQUARE-DEVIATIONS”ROOT- MEANS-SQUARE-DEVIATIONS”
VarianceVariance
The average amount that a score deviates from The average amount that a score deviates from the typical score.the typical score. Score – Mean = Difference ScoreScore – Mean = Difference Score Average of Difference Scores = 0Average of Difference Scores = 0 In order to make this number not 0, square the In order to make this number not 0, square the
difference scores ( negatives to become positives).difference scores ( negatives to become positives).
Variance: Variance: Computational FormulaComputational Formula
PopulationPopulation SampleSample
2
222
)(
N
XXN
2
222
)(
n
XXnS
VarianceVariance
Use the computational formula to calculate the variance.Use the computational formula to calculate the variance.
X X2
3 94 164 164 166 367 497 498 648 649 81
Sum: 60 Sum: 4002
222
)(
n
XXnS
0.4
100
3600400010
)60()400(10
2
2
2
22
S
S
S
Variability Example: Variability Example: VarianceVariance
Las Vegas Hotel RatesLas Vegas Hotel Rates
88.44760
1225
17918499623401707035
)13386()6686202(35
2
2
2
22
S
S
S
2
222
)(
n
XXnS
X X2
472 222784303 91809280 78400282 79524417 173889400 160000254 64516205 42025384 147456264 69696317 10048976 5776
643 413449480 230400136 18496250 62500100 10000732 535824317 100489264 69696384 147456750 562500402 161604422 178084373 139129325 105625313 97969749 561001791 625681196 38416891 793881283 8008952 2704
186 34596693 480249
Sum: 13386 Sum: 6686202
Pros and Cons of Pros and Cons of VarianceVariance
ProsPros Takes all data into Takes all data into
account.account. Lends itself to Lends itself to
computation of other computation of other stable measures (and stable measures (and is a prerequisite for is a prerequisite for many of them). many of them).
ConsCons Hard to interpret.Hard to interpret. Can be influenced by Can be influenced by
extreme scores.extreme scores.
Standard DeviationStandard Deviation
To “undo” the squaring of difference To “undo” the squaring of difference scores, take the square root of the scores, take the square root of the variance.variance.
Return to original units rather than Return to original units rather than squared units.squared units.
Quantifying UncertaintyQuantifying Uncertainty
Standard deviationStandard deviation: measures the : measures the variation of a variable in the sample.variation of a variable in the sample. Technically, Technically,
s x xN ii
N
1
12
1
( )
Standard DeviationStandard Deviation
PopulationPopulation SampleSample
Measure of the average amount by which observations deviate on either side of the mean. The square root of the variance.
2 s s2
(X )N
2 2)(
n
XXS
N X2 ( X )
N2
22
2
2 )(
n
XXnS
Variability Example: Variability Example: Standard DeviationStandard Deviation
Mean: 6
Standard Deviation: 20.2
0.4
100
36004000
10
)60()400(102
2
S
S
S
S
0.210
40
10
)69()68()68()67()67()66()64()64()64()63(
)(
2222222222
2
S
S
n
XXS
2
2
2 )(
n
XXnS
Variability Example: Variability Example: Standard DeviationStandard Deviation
Las Vegas Hotel Rates
0
1
2
3
4
5
6
7
8
9
0-99
100-199
200-299
300-399
400-499
500-599
600-699
700-799
800-899
Rates
Fre
quen
cy
hotel rates
Mean: $371.60
Standard Deviation: 57.211$88.44760
1225
179184996234017070
35
)13386()6686202(352
2
S
S
S
Pros and Cons of Pros and Cons of Standard DeviationStandard Deviation
ProsPros Lends itself to computation Lends itself to computation
of other stable measures of other stable measures (and is a prerequisite for (and is a prerequisite for many of them).many of them).
Average of deviations Average of deviations around the mean.around the mean.
Majority of data within one Majority of data within one standard deviation above or standard deviation above or below the mean.below the mean.
ConsCons Influenced by extreme Influenced by extreme
scores.scores.
Mean and Standard Mean and Standard DeviationDeviation
Using the mean and standard deviation Using the mean and standard deviation together:together: Is an efficient way to describe a distribution with just Is an efficient way to describe a distribution with just
two numbers.two numbers. Allows a direct comparison between distributions Allows a direct comparison between distributions
that are on different scales.that are on different scales.
A 100 samples were selected. Each of the sample contained 100 normal individuals. The mean Systolic BP of each sample is presented
110, 120, 130, 90, 100, 140, 160, 100, 120, 120, 110, 130, etc
Systolic BP level No. of samples90 - 5
100 - 10 110 - 20 120 - 34 130 - 20 140 - 10 150 - 5 160 - 2
Mean = 120Sd., = 10
•The curve describes probability of getting any range of values ie., P(x>120), P(x<100), P(110 <X<130) •Area under the curve = probability•Area under the whole curve =1•Probability of getting specific number =0, eg P(X=120) =0
Normal Distribution
120110100 130 140 15090
Mean = 120SD = 10
15 30 45 60 75
95% Confidence Interval for Mu
28 33 38 43
95% Confidence Interval for Median
Variable: Age
A-Squared:P-Value:
MeanStDevVarianceSkewnessKurtosisN
Minimum1st QuartileMedian3rd QuartileMaximum
32.3851
13.3380
28.0000
0.9620.014
36.450015.7356247.608
0.6796268.51E-02
60
11.000025.000031.500046.750079.0000
40.5149
19.1921
42.0000
Anderson-Darling Normality Test
95% Confidence Interval for Mu
95% Confidence Interval for Sigma
95% Confidence Interval for Median
Descriptive Statistics
WHICH MEASURE TO USE ?WHICH MEASURE TO USE ?
DISTRIBUTION OF DATA IS SYMMETRICDISTRIBUTION OF DATA IS SYMMETRIC
---- USE MEAN & S.D.,---- USE MEAN & S.D.,
DISTRIBUTION OF DATA IS SKEWEDDISTRIBUTION OF DATA IS SKEWED
---- USE MEDIAN & QUARTILES---- USE MEDIAN & QUARTILES
ANY QUESTIONSANY QUESTIONS
THANK YOUTHANK YOU