measures of variability · measures of central tendency alone cannot completely characterize a set...
TRANSCRIPT
![Page 1: MEASURES OF VARIABILITY · Measures of central tendency alone cannot completely characterize a set of data. Two very different data sets may have similar measures of central tendency](https://reader035.vdocuments.us/reader035/viewer/2022071012/5fca5e30153cc3516d28892f/html5/thumbnails/1.jpg)
MEASURES OF
VARIABILITY
Variance and Standard Deviation
Measures of central tendency alone cannot completely characterize a set of data. Two very different data sets may have similar measures of central tendency.
Measures of variability, or dispersion, describe the spread of a distribution.
Common measures of dispersion: range, variance, and standard deviation.
Measures of Variability or
Dispersion
Why Study Dispersion?
For example, if your nature guide told you that
the river ahead averaged 3 feet in depth,
would you want to wade across on foot without
additional information? Probably not. You
would want to know something about the
variation in the depth.
A second reason for studying the dispersion
in a set of data is to compare the spread in two
or more distributions.
Observe two hypothetical
data sets:
The average value provides
a good representation of the
observations in the data set.
Small variability
Larger variability
The same average value does not
provide as good a representation of the
observations in the data set as before.
Why Study Dispersion?
Samples of Dispersions
Measures of Dispersion
Range
Mean Deviation
Variance and
Standard Deviation
EXAMPLE – Range
The number of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the range for the number of cappuccinos sold.
Range = Largest – Smallest value
= 80 – 20 = 60
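The range calculation above can be sketched in a few lines of Python. The function name `value_range` is a hypothetical helper chosen for illustration; it is not part of the original slides.

```python
def value_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)


# Cappuccinos sold on the 5 sampled days (data from the example above).
cappuccinos = [20, 40, 50, 60, 80]
print(value_range(cappuccinos))  # 60
```

Only the two extreme values enter the calculation, which is why the range is easy to compute but says nothing about the values in between.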
Range: The difference in value between the highest-valued (H) and the lowest-valued (L) pieces of data:
$\text{range} = H - L$
Example: Consider the sample {12, 23, 17, 15, 18}. Find the range.
Solution:
$\text{range} = H - L = 23 - 12 = 11$
Range
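| Score (X) | Frequency (f) | fX |
|---|---|---|
| 50 | 1 | 50 |
| 58 | 1 | 58 |
| 63 | 2 | 126 |
| 65 | 3 | 195 |
| 67 | 2 | 134 |
| 74 | 5 | 370 |
| 78 | 2 | 156 |
| 80 | 2 | 160 |
| 86 | 1 | 86 |
| 89 | 1 | 89 |
| | N = Σf = 20 | ΣfX = 1424 |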
Ungrouped Data
Range for Ungrouped Data
R = H – L, where H = highest value and L = lowest value
R = 89 – 50 = 39
For Grouped Data

| Class Interval | Class Limit | Class Midpoint (m) | Frequency (f) | mf |
|---|---|---|---|---|
| 50 – 54 | 49.5 – 54.5 | 52 | 1 | 52 |
| 55 – 59 | 54.5 – 59.5 | 57 | 1 | 57 |
| 60 – 64 | 59.5 – 64.5 | 62 | 2 | 124 |
| 65 – 69 | 64.5 – 69.5 | 67 | 5 | 335 |
| 70 – 74 | 69.5 – 74.5 | 72 | 5 | 360 |
| 75 – 79 | 74.5 – 79.5 | 77 | 2 | 154 |
| 80 – 84 | 79.5 – 84.5 | 82 | 2 | 164 |
| 85 – 89 | 84.5 – 89.5 | 87 | 2 | 174 |
| | | | N = Σf = 20 | Σmf = 1420 |
Range for Grouped Data
For grouped data in a frequency distribution:
Range = UCL (upper class limit) of highest class – LCL (lower class limit) of lowest class
Range = 89.5 – 49.5 = 40
The range of a set of observations is the
difference between the largest and
smallest observations.
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to
provide information on the dispersion of the
observations between the two end points.
But how are the observations spread out between the smallest observation and the largest observation?
The range cannot assist in answering this question.
Range
Semi Inter-Quartile Range
There are three quartiles:
First Quartile, Q1, at the 25% frequency
Second Quartile, Q2, at the 50% frequency, equals the median
Third Quartile, Q3, at the 75% frequency
Semi Inter-Quartile Range, Q = ( Q3 - Q1 )/2
Semi Inter-Quartile Range
Semi Inter-Quartile Range
Semi Inter-Quartile Range, Q = (Q3 - Q1)/2
Q = (77.0 – 65.5)/2 = 5.75
• Other measures of dispersion are based on the following quantity.
Deviation from the Mean: A deviation from the mean, $x - \bar{x}$, is the difference between the value of $x$ and the mean $\bar{x}$.
Deviation from the Mean
Example: Consider the sample {12, 23, 17, 15, 18}. Find each deviation from the mean.
Solutions:
$\bar{x} = \frac{1}{5}(12 + 23 + 17 + 15 + 18) = 17$

| Data $x$ | Deviation from Mean $x - \bar{x}$ |
|---|---|
| 12 | -5 |
| 23 | 6 |
| 17 | 0 |
| 15 | -2 |
| 18 | 1 |
Deviation from the Mean
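As a quick check of the table above, the deviations can be computed directly. This is a minimal sketch; the helper name `deviations_from_mean` is an assumption made for illustration.

```python
from statistics import mean


def deviations_from_mean(values):
    """Return x - x_bar for every observation x."""
    x_bar = mean(values)
    return [x - x_bar for x in values]


sample = [12, 23, 17, 15, 18]
print(mean(sample))                  # 17
print(deviations_from_mean(sample))  # [-5, 6, 0, -2, 1]
```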
Why not use the sum of deviations from the mean?
Consider two small populations:
A: 8, 9, 10, 11, 12
B: 4, 7, 10, 13, 16
Deviations in A: 8 - 10 = -2, 9 - 10 = -1, 10 - 10 = 0, 11 - 10 = +1, 12 - 10 = +2. Sum = 0
Deviations in B: 4 - 10 = -6, 7 - 10 = -3, 10 - 10 = 0, 13 - 10 = +3, 16 - 10 = +6. Sum = 0
The mean of both populations is 10, but the measurements in B are more dispersed than those in A.
A measure of dispersion should agree with this observation.
Can the sum of deviations be a good measure of dispersion?
The sum of deviations is zero for both populations; therefore, it is not a good measure of dispersion.
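The point can be verified numerically. The sketch below uses the two populations from this slide; `sum_of_deviations` is a hypothetical helper name.

```python
def sum_of_deviations(values):
    """Sum of (x - mean) over all observations."""
    m = sum(values) / len(values)
    return sum(x - m for x in values)


population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

# Positive and negative deviations cancel, so both sums are zero
# even though B is clearly more spread out than A.
print(sum_of_deviations(population_a))  # 0.0
print(sum_of_deviations(population_b))  # 0.0
```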
Mean Deviation
The number of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the mean deviation for the number of cappuccinos sold.
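The slide does not show the solution. A minimal sketch follows, assuming the usual definition of the mean deviation as the average of the absolute deviations from the mean; the function name `mean_deviation` is chosen here for illustration.

```python
def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


cappuccinos = [20, 40, 50, 60, 80]
# Mean is 50; absolute deviations are 30, 10, 0, 10, 30.
print(mean_deviation(cappuccinos))  # 16.0
```

Taking absolute values keeps the positive and negative deviations from cancelling.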
This measure reflects the dispersion of all the observations.
The variance of a population of size $N$, $x_1, x_2, \ldots, x_N$, whose mean is $\mu$, is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
The variance of a sample of $n$ observations $x_1, x_2, \ldots, x_n$ whose mean is $\bar{x}$ is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
The Variance
Population Variance, $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$,
where $\mu$ = population mean
and $N$ = population size
Population Standard Deviation, $\sigma = \sqrt{\sigma^2}$
Population Variance
Sample Variance: The sample variance, $s^2$, is the mean of the squared deviations, calculated using $n - 1$ as the divisor:
$$s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2$$
where $n$ is the sample size.
Note: The numerator for the sample variance is called the sum of squares for x, denoted SS(x):
$$s^2 = \frac{\text{SS}(x)}{n - 1}$$
where
$$\text{SS}(x) = \sum (x - \bar{x})^2 = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2$$
Sample Variance
Let us calculate the variance of the two populations:
$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$
$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of variation instead?
After all, the sum of squared deviations increases in magnitude when the variation of a data set increases!
Population Variance – Example
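The two variances can be reproduced with a short sketch (helper names are hypothetical). It also hints at one answer to the question posed above: the sum of squared deviations grows with the number of observations, while the average does not. The doubled population `A + A` is an illustration added here, not part of the slides.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)


def population_variance(values):
    """Average squared deviation from the mean (divisor N)."""
    return sum_of_squares(values) / len(values)


population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]
print(population_variance(population_a))  # 2.0
print(population_variance(population_b))  # 18.0

# Repeating every value of A doubles the sum of squares (10 -> 20)
# but leaves the variance unchanged at 2.0.
doubled_a = population_a + population_a
print(sum_of_squares(population_a), sum_of_squares(doubled_a))
print(population_variance(doubled_a))
```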
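Example:
The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance.
Solution:
$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14 \text{ jobs}$$
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{6 - 1}\left[(17 - 14)^2 + (15 - 14)^2 + \ldots + (13 - 14)^2\right] = 33.2 \text{ jobs}^2$$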
Sample Variance – Example
The shortcut formula for the sample variance:
$$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1}$$
Sample Variance – Shortcut
Formula
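$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{6 - 1}\left[\left(17^2 + 15^2 + \ldots + 13^2\right) - \frac{(17 + 15 + \ldots + 13)^2}{6}\right] = 33.2 \text{ jobs}^2$$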
Sample Variance – Shortcut
Formula
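To see that the shortcut formula and the definition agree, both can be evaluated on the job-application sample. This is a sketch only; the two function names are assumptions introduced for illustration.

```python
def sample_variance_definition(values):
    """s^2 from the definition: squared deviations over n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """s^2 from the shortcut: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


jobs = [17, 15, 23, 7, 9, 13]
print(sample_variance_definition(jobs))  # 33.2
print(sample_variance_shortcut(jobs))    # 33.2
```

The shortcut needs only the running totals of x and x squared, so the mean does not have to be computed first.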
Sample Variance – Example 2
The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What is the sample variance?
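The worked solution is not included in the transcript. One way to check the answer is Python's standard `statistics.variance`, which uses the n - 1 divisor; the variable names below are illustrative.

```python
from statistics import mean, variance

# Hourly wages in dollars for the sampled part-time employees.
wages = [12, 20, 16, 18, 19]

wage_mean = mean(wages)          # 85 / 5 = 17
wage_variance = variance(wages)  # (25 + 9 + 1 + 1 + 4) / 4 = 10

print(wage_mean, wage_variance)
```

The result is in squared dollars, since the deviations are squared before averaging.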
The standard deviation of a set of observations is the positive square root of the variance.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
The unit of measure for the standard deviation is the same as the unit of measure for the data.
Standard Deviation
Example: Find the (1) variance and (2) standard deviation for the data {5, 7, 1, 3, 8}:
Solutions:
First: $\bar{x} = \frac{1}{5}(5 + 7 + 1 + 3 + 8) = 4.8$

| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 5 | 0.2 | 0.04 |
| 7 | 2.2 | 4.84 |
| 1 | -3.8 | 14.44 |
| 3 | -1.8 | 3.24 |
| 8 | 3.2 | 10.24 |
| Sum: 24 | 0 | 32.80 |

1) $s^2 = \frac{1}{4}(32.8) = 8.2$
2) $s = \sqrt{8.2} = 2.86$
Standard Deviation
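The same example can be checked with the standard library. This sketch uses `statistics.variance` and `statistics.stdev`, both of which apply the sample (n - 1) divisor.

```python
from statistics import stdev, variance

data = [5, 7, 1, 3, 8]

s2 = variance(data)  # sample variance, 32.8 / 4
s = stdev(data)      # positive square root of the sample variance

print(round(s2, 2))  # 8.2
print(round(s, 2))   # 2.86
```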
The number of traffic citations issued during the last
five months in Beaufort County, South Carolina, is 38,
26, 13, 41, and 22. What is the population variance?
EXAMPLE – Variance and
Standard Deviation
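The solution slide is missing from the transcript. Because the five months are treated as the whole population, a sketch of the calculation uses the N divisor, here through `statistics.pvariance` and `statistics.pstdev`.

```python
from statistics import mean, pstdev, pvariance

citations = [38, 26, 13, 41, 22]

mu = mean(citations)           # 140 / 5 = 28
sigma2 = pvariance(citations)  # (100 + 4 + 225 + 169 + 36) / 5
sigma = pstdev(citations)

print(mu)                # 28
print(round(sigma2, 1))  # 106.8
print(round(sigma, 2))   # 10.33
```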
EXAMPLE – Sample Variance
The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What is the sample variance?
If the data is given in the form of a
frequency distribution, we need to make
a few changes to the formulas for the
mean, variance, and standard deviation
Complete the extension table in order to
find these summary statistics
Calculation of Mean, Variance and
Standard Deviation
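Population variance, $\sigma^2 = \frac{\sum X^2 - \frac{\left(\sum X\right)^2}{N}}{N}$
Sample variance, $s^2 = \frac{\sum X^2 - \frac{\left(\sum X\right)^2}{n}}{n - 1}$
Population standard deviation, $\sigma = \sqrt{\sigma^2}$
Sample standard deviation, $s = \sqrt{s^2}$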
Variance and Standard Deviation
for Ungrouped Data
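| X | X² |
|---|---|
| 50 | 2 500 |
| 58 | 3 364 |
| 63 | 3 969 |
| 63 | 3 969 |
| 65 | 4 225 |
| 65 | 4 225 |
| 65 | 4 225 |
| 67 | 4 489 |
| 67 | 4 489 |
| 74 | 5 476 |
| 74 | 5 476 |
| 74 | 5 476 |
| 74 | 5 476 |
| 74 | 5 476 |
| 78 | 6 084 |
| 78 | 6 084 |
| 80 | 6 400 |
| 80 | 6 400 |
| 86 | 7 396 |
| 89 | 7 921 |
| ΣX = 1 424 | ΣX² = 103 120 |

ΣX² = 103 120; (ΣX)² = 2 027 776
$$\sigma^2 = \frac{103120 - \frac{2027776}{20}}{20} = 86.56$$
$$\sigma = \sqrt{86.56} = 9.30$$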
For Ungrouped Data
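The column totals and the resulting population variance can be reproduced as follows. This is a sketch; the dictionary `score_counts` simply restates the 20 scores from the table as score-to-frequency pairs.

```python
from math import sqrt

# Score -> frequency, restating the 20 ungrouped scores in the table.
score_counts = {50: 1, 58: 1, 63: 2, 65: 3, 67: 2,
                74: 5, 78: 2, 80: 2, 86: 1, 89: 1}
values = [x for x, f in score_counts.items() for _ in range(f)]

n = len(values)                      # 20
sum_x = sum(values)                  # 1424
sum_x2 = sum(x * x for x in values)  # 103120

# Population variance via the shortcut formula (divisor N).
pop_var = (sum_x2 - sum_x ** 2 / n) / n
pop_sd = sqrt(pop_var)

print(n, sum_x, sum_x2)
print(round(pop_var, 2), round(pop_sd, 2))  # 86.56 9.3
```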
Example: A survey of students in the first grade at a local school asked for the number of brothers and/or sisters for each child. The results are summarized in the table below. Find (1) the mean, (2) the variance, and (3) the standard deviation:
Solutions:
First:

| $x$ | $f$ | $xf$ | $x^2 f$ |
|---|---|---|---|
| 0 | 15 | 0 | 0 |
| 1 | 17 | 17 | 17 |
| 2 | 23 | 46 | 92 |
| 4 | 5 | 20 | 80 |
| 5 | 2 | 10 | 50 |
| Sum: | 62 | 93 | 239 |

1) $\bar{x} = 93/62 = 1.5$
2) $s^2 = \frac{239 - \frac{(93)^2}{62}}{62 - 1} = 1.63$
3) $s = \sqrt{1.63} = 1.28$
For a Frequency Distribution
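The extension-table calculation can be sketched in code. The function name `frequency_stats` and the list-of-pairs input format are assumptions made for this illustration.

```python
from math import sqrt


def frequency_stats(table):
    """Mean, sample variance and sample standard deviation
    from (value, frequency) pairs."""
    n = sum(f for _, f in table)
    sum_xf = sum(x * f for x, f in table)
    sum_x2f = sum(x * x * f for x, f in table)
    mean = sum_xf / n
    var = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
    return mean, var, sqrt(var)


# (number of siblings, number of children) from the table above.
siblings = [(0, 15), (1, 17), (2, 23), (4, 5), (5, 2)]
mean, var, sd = frequency_stats(siblings)
print(round(mean, 2), round(var, 2), round(sd, 2))  # 1.5 1.63 1.28
```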
Standard Deviation of Grouped
Data
Standard Deviation of Grouped Data - Example
Refer to the frequency distribution for the Whitner Autoplex data used earlier. Compute the standard deviation of the vehicle selling prices.
The standard deviation can be used to
• compare the variability of several distributions
• make a statement about the general shape of a distribution.
The empirical rule: If a sample of observations has a mound-shaped distribution, the interval
$(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
$(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
$(\bar{x} - 3s, \bar{x} + 3s)$ contains approximately 99.7% of the measurements
Interpreting Standard Deviation
Standard deviation is a measure of
variability, or spread
Two rules for describing data rely on the
standard deviation:
Empirical rule: applies to a variable that is
normally distributed
Chebyshev’s theorem: applies to any
distribution
Interpreting Standard Deviation
The Empirical Rule
Empirical Rule: If a variable is normally distributed, then:
1. Approximately 68% of the observations lie within 1 standard deviation of the mean
2. Approximately 95% of the observations lie within 2 standard deviations of the mean
3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean
Notes:
The empirical rule is more informative than Chebyshev’s theorem since we know more about the distribution (normally distributed)
Also applies to populations
Can be used to determine if a distribution is normally distributed
The Empirical Rule
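[Figure: bell-shaped curve centered at $\bar{x}$; the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$ contain approximately 68%, 95%, and 99.7% of the observations]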
The Empirical Rule
Example: A random sample of plum tomatoes was selected from a local grocery store and their weights recorded. The mean weight was 6.5 ounces with a standard deviation of 0.4 ounces. If the weights are normally distributed:
1) What percentage of weights fall between 5.7 and 7.3?
2) What percentage of weights fall above 7.7?
Solutions:
1) $(\bar{x} - 2s, \bar{x} + 2s) = (6.5 - 2(0.4), 6.5 + 2(0.4)) = (5.7, 7.3)$
Approximately 95% of the weights fall between 5.7 and 7.3.
2) $(\bar{x} - 3s, \bar{x} + 3s) = (6.5 - 3(0.4), 6.5 + 3(0.4)) = (5.3, 7.7)$
Approximately 99.7% of the weights fall between 5.3 and 7.7.
Approximately 0.3% of the weights fall outside (5.3, 7.7).
Approximately (0.3 / 2) = 0.15% of the weights fall above 7.7.
Empirical Rule – Example
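The intervals in this example can be generated with a small helper. This is a sketch under the assumption that the weights are normally distributed; `empirical_intervals` and `EMPIRICAL_PERCENT` are hypothetical names, and the percentages are the approximate ones quoted by the rule.

```python
# Approximate percentage of observations within k standard deviations.
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_intervals(mean, sd):
    """Return {k: (lower, upper)} for k = 1, 2, 3."""
    return {k: (mean - k * sd, mean + k * sd) for k in EMPIRICAL_PERCENT}


intervals = empirical_intervals(6.5, 0.4)
for k, (low, high) in intervals.items():
    print(k, round(low, 2), round(high, 2), EMPIRICAL_PERCENT[k])

# By symmetry, half of what lies outside the 3-sd interval is above it.
percent_above_7_7 = (100 - EMPIRICAL_PERCENT[3]) / 2
print(round(percent_above_7_7, 2))  # 0.15
```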
Note: The empirical rule may be used to determine whether or not a set of data is approximately normally distributed:
1. Find the mean and standard deviation for the data
2. Compute the actual proportion of data within 1, 2, and 3 standard deviations from the mean
3. Compare these actual proportions with those given by the empirical rule
4. If the proportions found are reasonably close to those of the empirical rule, then the data is approximately normally distributed
Empirical Rule
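The four steps above can be sketched as follows. The function name `proportions_within` is an assumption, and the 20 test scores from the ungrouped-data slides are reused here purely as illustrative input; the slides do not apply this check to them.

```python
from statistics import mean, stdev


def proportions_within(values, ks=(1, 2, 3)):
    """Proportion of observations within k sample standard
    deviations of the mean, for each k in ks."""
    m = mean(values)
    s = stdev(values)
    n = len(values)
    return {k: sum(1 for x in values if abs(x - m) <= k * s) / n
            for k in ks}


scores = [50, 58, 63, 63, 65, 65, 65, 67, 67, 74,
          74, 74, 74, 74, 78, 78, 80, 80, 86, 89]
observed = proportions_within(scores)
print(observed)  # compare with roughly 0.68, 0.95 and 0.997
```

How close is "reasonably close" is a judgment call, especially for a sample this small, so the comparison is a rough screen rather than a formal test of normality.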
Example: A statistics practitioner wants
to describe the way returns on
investment are distributed.
The mean return = 10%
The standard deviation of the return = 8%
The histogram is bell shaped.
More Examples
Example – solution
The empirical rule can be applied (bell-shaped histogram).
Describing the return distribution:
Approximately 68% of the returns lie between 2% and 18% [10 – 1(8), 10 + 1(8)]
Approximately 95% of the returns lie between -6% and 26% [10 – 2(8), 10 + 2(8)]
Approximately 99.7% of the returns lie between -14% and 34% [10 – 3(8), 10 + 3(8)]
More Examples
Chebyshev’s Theorem
The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company’s profit-sharing plan is $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?
Chebyshev’s Theorem: The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - (1/k^2)$, where k is any positive number larger than 1. This theorem applies to all distributions of data.
Illustration:
[Figure: at least $1 - \frac{1}{k^2}$ of the data lies between $\bar{x} - ks$ and $\bar{x} + ks$]
Chebyshev’s Theorem
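The bound in the theorem is a one-line calculation. The sketch below uses a hypothetical helper `chebyshev_lower_bound` and applies it to the k = 3.5 question from the Dupree Paint slide, whose worked answer is not in the transcript.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations
    of the mean, valid for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


# Dupree Paint question: within plus and minus 3.5 standard deviations.
print(round(chebyshev_lower_bound(3.5), 3))  # 0.918, i.e. at least about 92%
print(chebyshev_lower_bound(2))              # 0.75
```

The mean and standard deviation of the contributions do not enter the bound; only k does.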
The proportion of observations in any sample that lie within k standard deviations of the mean is at least $1 - 1/k^2$ for $k > 1$.
This theorem is valid for any set of measurements (sample, population) of any shape!

| k | Interval | Chebyshev | Empirical Rule |
|---|---|---|---|
| 1 | $(\bar{x} - s, \bar{x} + s)$ | at least 0% $(1 - 1/1^2)$ | approximately 68% |
| 2 | $(\bar{x} - 2s, \bar{x} + 2s)$ | at least 75% $(1 - 1/2^2)$ | approximately 95% |
| 3 | $(\bar{x} - 3s, \bar{x} + 3s)$ | at least 89% $(1 - 1/3^2)$ | approximately 99.7% |
Chebyshev’s Theorem
Chebyshev’s theorem is very conservative and holds for any distribution of data
Chebyshev’s theorem also applies to any population
The two most common values used to describe a distribution of data are k = 2, 3
The table below lists some values for k and $1 - (1/k^2)$:

| $k$ | 1.7 | 2 | 2.5 | 3 |
|---|---|---|---|---|
| $1 - (1/k^2)$ | 0.65 | 0.75 | 0.84 | 0.89 |
Chebyshev’s Theorem
Example: At the close of trading, a random sample of 35 technology stocks was selected. The mean selling price was 67.75 and the standard deviation was 12.3. Use Chebyshev’s theorem (with k = 2, 3) to describe the distribution.
Solutions:
Using k=2: At least 75% of the observations lie within 2 standard deviations of the mean:
$(\bar{x} - 2s, \bar{x} + 2s) = (67.75 - 2(12.3), 67.75 + 2(12.3)) = (43.15, 92.35)$
Using k=3: At least 89% of the observations lie within 3 standard deviations of the mean:
$(\bar{x} - 3s, \bar{x} + 3s) = (67.75 - 3(12.3), 67.75 + 3(12.3)) = (30.85, 104.65)$
Chebyshev’s Theorem – Example
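Both intervals in this example can be produced by one helper that pairs the interval with its guaranteed minimum proportion. This is a sketch; `chebyshev_interval` is a name chosen here for illustration.

```python
def chebyshev_interval(mean, sd, k):
    """Return ((lower, upper), bound): the interval mean +/- k*sd and
    the minimum proportion of data it must contain (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return (mean - k * sd, mean + k * sd), 1 - 1 / k ** 2


for k in (2, 3):
    (low, high), bound = chebyshev_interval(67.75, 12.3, k)
    print(k, round(low, 2), round(high, 2), round(bound, 2))
# k = 2 gives (43.15, 92.35) with at least 0.75
# k = 3 gives (30.85, 104.65) with at least 0.89
```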
Example: The annual salaries of the employees of a chain of computer
stores produced a positively skewed histogram. The mean and
standard deviation are $28,000 and $3,000, respectively. What
can you say about the salaries at this chain?
Solution:
At least 75% of the salaries lie between $22,000 and $34,000 [28000 – 2(3000), 28000 + 2(3000)]
At least 88.9% of the salaries lie between $19,000 and $37,000 [28000 – 3(3000), 28000 + 3(3000)]
Chebyshev’s Theorem – Example
Normal Distribution
[Figure: normal curve, Frequency against X, with the mean, median, and mode marked]
Normal Distribution
[Figure: normal curve, Frequency against X]