TRANSCRIPT
MSV 33: Measures of Spread
www.making-statistics-vital.co.uk
The Bee Academy
‘And our topic today, my fellow bees, is spread!’
‘Mmm...’
Professor Zzub
‘No, no, no, Millie! I mean, how can we measure
how spread out a data set is?’
‘The data sets 1, 3, 5, 7, 9 and 3, 4, 5, 6, 7 have the same mean, but the first set
is clearly more spread out than the second.’
‘I’m lost. Example please...’
‘So you are asking how we could measure that – how about the
top number take away the bottom for each set? If the
spread is big, that’ll be big!’
‘Nice idea, Ding – and this measure is used! It’s called the RANGE. So the range for our first set
is 9 - 1 = 8, while the range for our second set is 7 - 3 = 4.’
1, 3, 5, 7, 9 and 3, 4, 5, 6, 7
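Ding's idea can be checked in a few lines. This is only a minimal sketch; the helper name `data_range` is an illustrative choice, not something from the lesson.

```python
def data_range(values):
    """Return the top value take away the bottom value."""
    return max(values) - min(values)

first = [1, 3, 5, 7, 9]
second = [3, 4, 5, 6, 7]

# Both sets have the same mean, but different ranges.
print(sum(first) / len(first), sum(second) / len(second))  # 5.0 5.0
print(data_range(first))   # 8
print(data_range(second))  # 4
```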
‘Let me guess – there’s more to it than that.’
‘Sadly, Brenda, the range is badly affected by extreme values or
outliers. It can give a rather misleading picture of the data.’
1, 3, 5, 7, 9, 11, 13 (Range = 12) and 3, 4, 5, 6, 7, 8, 20 (Range = 17)
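To see how much a single extreme value can move the range, one rough experiment is to drop the 20 from the second set and recompute. The trimmed list below is a hypothetical variation added for illustration; it is not one of the lesson's data sets.

```python
with_outlier = [3, 4, 5, 6, 7, 8, 20]
without_outlier = [3, 4, 5, 6, 7, 8]  # hypothetical: same set with the 20 removed

range_with = max(with_outlier) - min(with_outlier)
range_without = max(without_outlier) - min(without_outlier)

print(range_with)     # 17
print(range_without)  # 5
```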
‘Okay, then, don’t take all the data; chuck away the lowest quarter, and the highest
quarter, and THEN take the range. Just taking the middle 50%, you’ve got rid
of all those extreme values.’
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
‘Great idea, Paul – so for example with this small data set, we can add the quartiles, Q1, Q2 (the median) and Q3...’
‘... And the Interquartile Range is Q3 – Q1 = 9 – 3 = 6, the range of the middle 50% of the data.’
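A possible way to check the quartiles is Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software, so using the function's default method here is an assumption; for this data set it happens to agree with an interquartile range of 6.

```python
import statistics

data = list(range(1, 12))  # 1, 2, ..., 11

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 6.0 9.0
print(iqr)         # 6.0
```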
‘I’ve got another idea!’
‘Go back to this data set again. We could find the mean, then find the difference of each of these numbers from the mean,
and then add the differences together. If the numbers are spread out, then this will be big!’
‘What’s that, Millie?’
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
‘That is nearly a great idea, Millie, but watch what happens...’
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
‘So the differences add to 0. Always.’
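A short sketch of why Millie's first idea fails: the negative and positive differences cancel, because the sum of the values equals n times the mean. The variable names are illustrative.

```python
data = list(range(1, 12))  # 1, 2, ..., 11
mean = sum(data) / len(data)

# Signed difference of each number from the mean.
deviations = [x - mean for x in data]

print(mean)             # 6.0
print(deviations)       # [-5.0, -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(sum(deviations))  # 0.0
```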
‘But that is easily fixed...’
‘Find the POSITIVE difference of each of these numbers from the mean,
and then add these differences together. It won’t be 0 now!’
‘Indeed, Virender, the sum now is 30.
But is that a fair measure of spread?’
‘Surely you have to divide by the total number of numbers you have -
to take an average!’
‘Excellent, Ding! And this takes us to what is called
‘the mean deviation from the mean’. If we write it in symbols, we have $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$.’
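As a rough check on the data set 1 to 11, the positive differences and their average can be computed like this (a sketch only; `mean_deviation` is an illustrative name).

```python
data = list(range(1, 12))  # 1, 2, ..., 11
n = len(data)
mean = sum(data) / n

# Sum of the positive (absolute) differences from the mean.
total_abs = sum(abs(x - mean) for x in data)

# Divide by the number of values to get the mean deviation from the mean.
mean_deviation = total_abs / n

print(total_abs)                 # 30.0
print(round(mean_deviation, 3))  # 2.727
```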
‘There is still a problem, however – the modulus function is not always easy to handle mathematically. It is true that |ab| = |a||b|, but it is not generally true
that |a + b| = |a| + |b|.’
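One counterexample is enough to see the difference between the two rules. The values of `a` and `b` below are arbitrary picks for illustration.

```python
a, b = 3, -2  # arbitrary example values with opposite signs

product_rule_holds = abs(a * b) == abs(a) * abs(b)
sum_rule_holds = abs(a + b) == abs(a) + abs(b)

print(abs(a * b), abs(a) * abs(b))  # 6 6
print(abs(a + b), abs(a) + abs(b))  # 1 5
print(product_rule_holds, sum_rule_holds)  # True False
```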
‘Well, there are other ways to make the differences from the mean all positive.
You could square the differences, for example!’
‘Great idea, Millie. So we can find the square of the difference of each of these numbers from the mean,
and then add these together. Then divide by the total number of numbers we have.’
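‘This is called the MSD (mean square deviation), or ‘the population variance’: $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$. If we multiply out, we get an alternative formulation, $\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$,
that is usually easier to calculate, especially if the mean is not a whole number.’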
‘As before.’
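The two ways of writing the MSD can be compared numerically on the data set 1 to 11. This is a sketch with illustrative variable names; the standard library's `statistics.pvariance` is included as a cross-check.

```python
import statistics

data = list(range(1, 12))  # 1, 2, ..., 11
n = len(data)
mean = sum(data) / n

# Definition: the mean of the squared differences from the mean.
msd_definition = sum((x - mean) ** 2 for x in data) / n

# Multiplied-out form: the mean of the squares minus the square of the mean.
msd_alternative = sum(x * x for x in data) / n - mean ** 2

print(msd_definition)   # 10.0
print(msd_alternative)  # 10.0
```

Both agree with `statistics.pvariance(data)`.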
‘So have we got it now? Is this the measure of spread
we generally use?’
‘We are very nearly there, Brenda. There is, sadly, a problem with the MSD. Most of the time we are taking a SAMPLE from a population. We would like the expectation of our variance statistic to be the variance of the population. But in order for that to happen...’
‘We have to take our MSD statistic...’
‘And divide by n - 1 rather than n.’
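The claim about expectation can be checked exactly on a toy case. The sketch below assumes a hypothetical population of 1, 2, 3 and looks at every possible sample of size 2 drawn independently (with replacement); neither the population nor the helper names come from the lesson.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
sample_size = 2

def sum_sq_dev(values):
    """Sum of squared differences from the mean, as an exact fraction."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values)

pop_variance = sum_sq_dev(population) / len(population)

# All 9 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=sample_size))

# Average value of the statistic over every sample, for each divisor.
avg_divide_by_n = sum(sum_sq_dev(s) / sample_size for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sum_sq_dev(s) / (sample_size - 1) for s in samples) / len(samples)

print(pop_variance)             # 2/3
print(avg_divide_by_n)          # 1/3
print(avg_divide_by_n_minus_1)  # 2/3
```

In this toy case, dividing by n underestimates the population variance on average, while dividing by n - 1 matches it.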
‘This statistic is called the ‘sample variance’ or simply the ‘variance’: $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. The expected value of this is the population variance.
As with the population variance statistic, there is an alternative form, $\frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$,
which is often easier to use.’
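Treating the numbers 1 to 11 as a sample, both forms of the sample variance can be sketched as follows. The variable names are illustrative, and `statistics.variance` from the standard library serves as a cross-check.

```python
import statistics

data = list(range(1, 12))  # 1, 2, ..., 11, treated here as a sample
n = len(data)
mean = sum(data) / n

# Definition: sum of squared differences from the mean, divided by n - 1.
variance_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Alternative form: (sum of squares - n * mean squared), divided by n - 1.
variance_alternative = (sum(x * x for x in data) - n * mean ** 2) / (n - 1)

print(variance_definition)   # 11.0
print(variance_alternative)  # 11.0
```

Both agree with `statistics.variance(data)`.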
‘So is that all the measures of
spread we need to know?’
‘I should add, Virender, that we do use the square root of the MSD (called RMSD) and
the square root of the variance (called the Standard Deviation)
as measures of spread too. The advantages of the RMSD and the SD are that they are
measured in the same units as the random variable we are interested in.’
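For the same data set, the two square-root measures might be computed like this. It is a sketch only; `statistics.pstdev` and `statistics.stdev` are the standard-library equivalents.

```python
import math
import statistics

data = list(range(1, 12))  # 1, 2, ..., 11

# RMSD: square root of the MSD (population variance, divisor n).
rmsd = math.sqrt(statistics.pvariance(data))

# Standard deviation: square root of the sample variance (divisor n - 1).
sd = math.sqrt(statistics.variance(data))

print(round(rmsd, 3))  # 3.162
print(round(sd, 3))    # 3.317
```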
‘So to summarise...’
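Range = top value – bottom value.
Interquartile range (IQR) = Q3 – Q1, where the quartiles Q1, Q2
and Q3 divide the data set into four groups of equal size.
Mean Square Deviation (population variance) = $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Root Mean Square Deviation (RMSD) = $\sqrt{\text{MSD}}$.
Variance (or sample variance) = $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Standard Deviation = $\sqrt{\text{Variance}}$.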
www.making-statistics-vital.co.uk
is written by Jonny Griffiths
With thanks to pixabay.com