2.4 Measures of Variation
![Page 1: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/1.jpg)
What is variability in data?
Variability measures how much the group as a whole deviates from the center.
It gives you an indication of the spread of the data.
The common measures of variation in data are range, deviation, variance and standard deviation.
![Page 2: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/2.jpg)
Range
The range is the simplest measure of variation. It is the difference between the largest and smallest entries in the data set.
Range = Maximum value - Minimum value
Range has the advantage of being easy to compute. Its disadvantage, however, is that it uses only two entries from the entire data set.
Age based on class survey data: 26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23, 48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38.
Range = maximum - minimum = 48 - 16 = 32
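As a quick check of this arithmetic, here is a minimal Python sketch that computes the range of the class survey ages. The variable names `ages` and `data_range` are illustrative and not part of the slides.

```python
# Ages from the class survey data quoted on this slide.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# The range depends only on the two extreme entries.
data_range = max(ages) - min(ages)
print(data_range)  # 32
```

Changing any entry other than the largest or smallest leaves `data_range` unchanged, which is the disadvantage noted above.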
![Page 3: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/3.jpg)
Deviation, Variance and Standard Deviation
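The deviation of an entry $x_i$ in a data set is the difference between that entry and the mean $\mu$ of the data set, i.e. $x_i - \mu$.
The population variance of a population data set of $N$ entries is:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
The population standard deviation is the square root of the population variance, i.e.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
The sample variance of a sample data set of $n$ entries, with sample mean $\bar{x}$, is:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
The sample standard deviation is the square root of the sample variance, i.e.
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$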
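The difference between the population and sample versions can be sketched in a few lines of Python. This assumes the usual definitions (divide the sum of squared deviations by N for a population and by n - 1 for a sample). The function names and the small data set `values` are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math


def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


# Hypothetical data: mean is 5, sum of squared deviations is 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(values)  # 32 / 8
pop_sd = math.sqrt(pop_var)
samp_var = sample_variance(values)     # 32 / 7
samp_sd = math.sqrt(samp_var)

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

The sample versions are always slightly larger than the population versions for the same numbers, because the divisor is smaller.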
![Page 4: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/4.jpg)
Deviation, Variance and Standard Deviation
Age based on class survey: 26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23, 48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38.
Population size N = 34, population mean μ = 1024/34 = 30.11765

| Age ($x_i$) | $x_i - \mu$ | $(x_i - \mu)^2$ |
|---|---|---|
| 26 | -4.1176 | 16.9550 |
| 25 | -5.1176 | 26.1903 |
| : | : | : |
| 38 | 7.8823 | 62.1314 |
| | | Σ = 2797.5294 |

$\sigma^2 = 2797.5294 / 34 = 82.2803$ and $\sigma = \sqrt{82.2803} = 9.0708$
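The worked example can be reproduced with a short Python sketch. It treats the 34 ages as a full population, as the slide does; the variable names are illustrative only.

```python
import math
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

mu = statistics.mean(ages)                       # 1024 / 34
squared_devs = [(x - mu) ** 2 for x in ages]     # third column of the table
sum_sq = math.fsum(squared_devs)                 # about 2797.5294
pop_variance = sum_sq / len(ages)                # about 82.2803
pop_sd = math.sqrt(pop_variance)                 # about 9.0708

print(round(mu, 5), round(sum_sq, 4), round(pop_variance, 4), round(pop_sd, 4))
```

The standard library's `statistics.pvariance(ages)` and `statistics.pstdev(ages)` should give the same two results without the intermediate steps.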
![Page 5: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/5.jpg)
Deviation, Variance and Standard Deviation
Variance and standard deviation take all of the data into consideration. However, both are easily influenced by extreme values, because the deviations are squared.
Variance is hard to interpret since it is measured in squared units. The standard deviation is in the same units as the data and is interpreted as roughly the typical deviation from the mean.
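To see how strongly a single extreme value affects these measures, consider the following sketch. The scores are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical scores, with and without one extreme value.
scores = [10, 11, 12, 13, 14]
scores_with_outlier = scores + [100]

sd_without = statistics.pstdev(scores)
sd_with = statistics.pstdev(scores_with_outlier)

print(f"without outlier: {sd_without:.2f}")
print(f"with outlier:    {sd_with:.2f}")
```

Adding one entry far from the rest increases the standard deviation many times over, because its squared deviation dominates the sum.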
![Page 6: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/6.jpg)
Interpreting Standard Deviation
When interpreting the standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.
![Page 7: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/7.jpg)
Interpreting Standard Deviation
Empirical Rule (or the 68-95-99.7 rule): For a bell-shaped, symmetric distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations of the mean and about 99.7% lies within three standard deviations of the mean.
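The three percentages come from the normal distribution. As a sketch, they can be recovered with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the dictionary name `coverage` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

Real data that is only approximately bell shaped will match these proportions only approximately.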
![Page 8: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/8.jpg)
Interpreting Standard Deviation
Chebychev’s theorem
When the distribution is not bell shaped or symmetric, this theorem gives a lower bound on the proportion of data that lies within $k$ standard deviations of the mean. It states that:
The proportion of any data set lying within $k$ standard deviations ($k > 1$) of the mean is at least
$$1 - \frac{1}{k^2}$$
• $k = 2$: in any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, i.e. 75%, of the data lies within 2 standard deviations of the mean.
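A small sketch can compare the bound with what is actually observed in the class survey ages used earlier in these slides. The function names `chebyshev_bound` and `proportion_within` are hypothetical helpers, and the ages are treated as a population.

```python
import statistics


def chebyshev_bound(k):
    """Lower bound 1 - 1/k**2 on the proportion within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def proportion_within(data, k):
    """Observed proportion of entries within k standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)


ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

bound = chebyshev_bound(2)
observed = proportion_within(ages, 2)
print(f"guaranteed at least {bound:.0%}, observed {observed:.0%}")
```

The theorem only guarantees a minimum, so the observed proportion is often much higher than the bound.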
![Page 9: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/9.jpg)
Standard Deviation of Grouped Data
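Sample standard deviation for a frequency distribution is:
$$s = \sqrt{\frac{\sum_{i=1}^{c} (x_i - \bar{x})^2 f_i}{n - 1}}$$
where $c$ is the number of classes, $x_i$ is the $i$th data value (the class midpoint when the data are grouped into intervals), $f_i$ is the corresponding frequency, $n = \sum f_i$ is the sample size and $\bar{x} = \frac{\sum x_i f_i}{n}$ is the sample mean.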
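A minimal sketch of the grouped-data calculation follows, assuming the usual formula in which each squared deviation is weighted by its frequency and the total is divided by n - 1. The function name and the frequency table (`values`, `freqs`) are hypothetical.

```python
import math
import statistics


def grouped_sample_sd(values, freqs):
    """Sample standard deviation from values (or midpoints) and frequencies."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    weighted_sq = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(weighted_sq / (n - 1))


# Hypothetical frequency distribution: each value and how often it occurs.
values = [0, 1, 2, 3]
freqs = [2, 3, 4, 1]

s_grouped = grouped_sample_sd(values, freqs)

# Cross-check by expanding the table back into individual entries.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
s_expanded = statistics.stdev(expanded)

print(round(s_grouped, 4), round(s_expanded, 4))
```

When the classes are intervals and midpoints stand in for the actual entries, the result is only an approximation of the standard deviation of the raw data.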
![Page 10: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/10.jpg)
2.5 Measures of Position
What are measures of position?
A measure of position gives you some idea of where a particular data value would rank in an ordering of the data set, or where it falls with respect to the mean of the sample or population.
![Page 11: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/11.jpg)
Quartiles
Quartiles divide the data into 4 equal parts. We need three quartiles, Q1, Q2 and Q3, to divide any data set into 4 equal parts.
• About a quarter of the data falls below the first quartile, Q1.
• About a half of the data falls below the second quartile, Q2.
• About three quarters of the data falls below the third quartile, Q3.
The interquartile range (IQR) of a data set is the difference between the third and first quartiles, Q3 - Q1.
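The following sketch computes the quartiles and IQR of the class survey ages with `statistics.quantiles`. Quartile conventions differ between textbooks and software; the `method="inclusive"` option is an assumption made here because it matches the quartile values these slides report for the age data.

```python
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

# Count how many entries fall below each quartile.
below = [sum(1 for x in ages if x < q) for q in (q1, q2, q3)]

print(q1, q2, q3, iqr)
for q, count in zip((q1, q2, q3), below):
    print(f"{count} of {len(ages)} entries fall below {q}")
```

The counts are close to, but not exactly, one quarter, one half and three quarters of the entries, which is why the statements above say "about".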
![Page 12: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/12.jpg)
Quartiles
In essence, five values can be used to describe a data set: the minimum data value, the three quartiles Q1, Q2 and Q3, and the maximum data value. These five numbers are called the five-number summary since they describe the central tendency, the spread and the variation in the data.
Drawing a box-and-whisker plot:
• Find the five-number summary of the data set.
• Construct a horizontal scale that spans the range of the data.
• Plot the five numbers above the horizontal scale.
• Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.
• Draw whiskers from the box to the minimum and maximum entries.
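For the age data: Min = 16, Q1 = 23.25, Q2 = 27.5, Q3 = 37.5, Max = 48
[Figure: box-and-whisker plot of the age data. The box runs from Q1 to Q3 with a line at Q2 (the median); the whiskers extend from the box to the minimum and maximum entries.]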
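The first step of the procedure, finding the five-number summary, can be sketched as a small helper. The function name `five_number_summary` is hypothetical, and the `inclusive` quantile method is an assumption chosen because it reproduces the values quoted for the age data.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))


ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

summary = five_number_summary(ages)
print(summary)  # (16, 23.25, 27.5, 37.5, 48)
```

These five numbers are all that is needed to position the box, the median line and the two whiskers on the horizontal scale.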
![Page 13: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/13.jpg)
Percentiles and Other Fractiles
Fractiles are numbers that divide an ordered data set into equal parts. Some commonly used fractiles are:

| Fractiles | Summary | Symbols |
|---|---|---|
| Quartiles | Divide a data set into 4 equal parts | Q1, Q2, Q3 |
| Deciles | Divide a data set into 10 equal parts | D1, D2, D3, ..., D9 |
| Percentiles | Divide a data set into 100 equal parts | P1, P2, P3, ..., P99 |
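Deciles and percentiles can be computed the same way as quartiles by changing the number of parts. This sketch uses `statistics.quantiles` on the class survey ages; the `inclusive` method is an assumption, and other conventions can give slightly different cut points.

```python
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

deciles = statistics.quantiles(ages, n=10, method="inclusive")       # D1 .. D9
percentiles = statistics.quantiles(ages, n=100, method="inclusive")  # P1 .. P99

d5 = deciles[4]        # fifth decile
p25 = percentiles[24]  # 25th percentile
p50 = percentiles[49]  # 50th percentile

print(len(deciles), len(percentiles))
print(d5, p25, p50)
```

With the same method, D5 and P50 coincide with the median Q2, and P25 coincides with Q1.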
![Page 14: 2.4 Measures of Variation](https://reader035.vdocuments.us/reader035/viewer/2022062407/56812bc6550346895d900f07/html5/thumbnails/14.jpg)
z-score
The standard score, or z-score, represents the number of standard deviations a given value $x$ falls from the mean $\mu$. To find the z-score for a given value, use
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$
A z-score can be positive, negative or zero. If z is positive, the data point is greater than the mean; if z is negative, the data point is less than the mean; if z = 0, the data point equals the mean.
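As an illustration, the following sketch computes z-scores for the oldest and youngest entries in the class survey ages, treating the 34 ages as a population. The helper name `z_score` is hypothetical.

```python
import statistics


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

mu = statistics.mean(ages)
sigma = statistics.pstdev(ages)

z_oldest = z_score(48, mu, sigma)    # positive: above the mean
z_youngest = z_score(16, mu, sigma)  # negative: below the mean

print(round(z_oldest, 2), round(z_youngest, 2))
```

An age of 48 lies a little under two standard deviations above the mean, while an age of 16 lies about one and a half standard deviations below it.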