s1: chapter 2/3 data: measures of location and dispersion dr j frost ([email protected])...
TRANSCRIPT
S1: Chapter 2/3Data: Measures of Location and Dispersion
Dr J Frost ([email protected])www.drfrostmaths.com
Last modified: 9th September 2015
For Teacher UseLesson 1:
Concept of variables in statistics.Mean of ungrouped and grouped data.Using a calculator to input data and calculate .Combined means.
Lesson 2:Median from discrete data and continuous data (usinglinear interpolation)
Lesson 3:Quartiles/percentiles using linear interpolation.
Lesson 4:Variance/Standard Deviation
Lesson 5:Coding. Consolidation.
Variables in algebra vs stats
Similarities
𝑥 Just like in algebra, variables in stats
represent the value of some quantity, e.g. shoe size, height, colour.
Variables can be discrete or continuous.
Discrete: Limited to specific values, e.g. shoe size, colour, day born. May be quantitative (i.e. numerical) or qualitative (e.g. favourite colour).Continuous: Can be any real number in some range (time, weight, …)
Can be part of further calculations, e.g. if represents height, then represents twice people’s height. In stats this is known as ‘coding’, which we’ll cover later.
Differences
Unlike algebra, a variable in stats represents the value of multiple objects. e.g. The heights of all people in a room.
Because of this, we can do operations on it as if it was a collection of values:
If represents people’s heights, gives the sum of everyone’s heights.In algebra this would be meaningless. If , then makes no sense!
is the mean of . Notice is a collection of values whereas is a single value.
Discrete ?
Continuous ?
Mean of ungrouped data
𝑥=Σ𝑥𝑛
You all know how to find the mean of a list of values. But lets consider the notation, and see how theoretically we could calculate each of the individual components on a calculator.
Length of cat
?
?
Whip out yer Casios…
The overbar in stats specifically means ‘the sample mean of’, but don’t worry about this for now.
Length of cat
Inputting Data
1. Press the ‘MODE’ button and select STAT.2. The “1-VAR” means “one variable”, so select this.
(We’ll also be using A + BX later in the year, which as you might guess, is to do with fitting straight lines of best fit)
3. Enter each value above in the table that appears, and press ‘=‘ after each one.
4. Now press AC to stop entering data so we can start inputting a calculation…
Length of cat
Inputting Data
In STATS mode, we can still do many calculations as normal (e.g. try 1 + 1 = ) but we can also insert various statistical calculations involving . Use SHIFT -> [1] to access these. Try inputting these expressions:
𝑛=𝟓 Σ𝑥=𝟏𝟐 .𝟖𝟓𝑥=𝟐 .𝟓𝟕Σ𝑥 /𝑛=𝟐 .𝟓𝟕
3 𝑥+1=𝟖 .𝟕𝟏
?
?
?
?
?
Grouped Data
Height of bear (in metres) Frequency
Estimate of Mean:
What does the variable represent?
Why is our mean just an estimate?
The midpoints of each interval. They‘re effectively a sensible single value used to represent each interval.
Because we don’t know the exact heights within each group. Grouping data loses information.
? ? ?
?
?
Inputting Frequency TablesHeight of bear (in metres) Frequency
We now need to input frequencies.Go to SETUP (SHIFT -> MODE), press down, then select STAT.Select ‘ON’ for frequency.Your calculator will preserve this setting even when switched off.
Now go into STATS mode and input your table, using your midpoints as the values. How on the calculator do we get:
𝑥=∑ 𝑓𝑥
∑ 𝑓
? This is . This is because your calculator thinks is the original variable rather than just a list of interval midpoints. Just imagine that your calculator is converting your table into a list of values with duplicates.
? This is . (Since )
Bro Exam Tip: In the exam you get a method mark for the division and an accuracy mark for the final answer. Write:
You’re not required to show working like “”
?
?
Mini-Exercise
Num children () Frequency ()
Use your calculator’s STAT mode to determine the mean (or estimate of the mean). Ensure that you show the division in your working.
𝑐=Σ𝑐𝑛
=1110
=1.1
IQ of L6Ms2 () Frequency ()
𝑥=Σ 𝑓𝑥Σ 𝑓
=156016
=97.5
Time Frequency ()
to 3sf.
? ?
?
1 2
3
GCSE RECAP :: Combined Mean
The mean maths score of 20 pupils in class A is 62.The mean maths score of 30 pupils in class B is 75.a) What is the overall mean of all the pupils’ marks. b) The teacher realises they mismarked one student’s paper; he should have
received 100 instead of 95. Explain the effect on the mean and median.
The revised score will increase the mean but leave the median unaffected.
Archie the Archer competes in a competition with 50 rounds. He scored an average of 35 points in the first 10 rounds and an average of 25 in the remaining rounds. What was his average score per round?
Test Your Understanding
𝑴𝒆𝒂𝒏=(𝟑𝟓×𝟏𝟎)+ (𝟐𝟓×𝟒𝟎)
𝟓𝟎=𝟐𝟕
?
?
Median – which item?You need to be able to find the median of both listed data and of grouped data.
Listed data
Items Position of median Median
5 3rd 7
4 2nd/3rd 9.5
7 4th 7
10 5th/6th 7.5
Can you think of a rule to find the position of the median given ?
! To find the position of the median for listed data, find : - If a decimal, round up. - If whole, use halfway between this item and the one after.
IQ of L6Ms2 () Frequency ()
Grouped data
Position to use for median:
8.5
! To find the median of grouped data, find , then use linear interpolation.
DO NOT round or adjust it in any way. This is just like at GCSE where, if you had a cumulative frequency graph with 60 items, you’d look across the 30th. We’ll cover linear interpolation in a sec…
? ?? ?? ?? ?
?
?
Quickfire Questions…What position do we use for the median?
Lengths: 3cm, 5cm, 6cm, …
Median position: 6th
Lengths: 4m, 8m, 12.4m, …
Median position: 12th/13th
Age Freq
Median position: 8.5
Score Freq
Median position: 5
Bro Exam Note: Pretty much every exam question has dealt with median for grouped data, not listed data.
Ages: 5, 7, 7, 8, 9, 10, …
Median position: 30th/31st
Score Freq
Median position: 10.5
Weights: 1.2kg, 3.3kg, …
Median position: 18th
Volume (ml) Freq
Median position: 6.5
Weights: 4.4kg, 7.6kg, 7.7kg…
Median position: 9th/10th
?
?
?
?
?
?
?
?
?
Linear Interpolation
Height of tree (m) Freq C.F.
At GCSE we draw a suitable line on a cumulative frequency graph. How could we read of this value exactly using suitable calculation? We could find the fraction of the way along the line segment using the frequencies, then go this same fraction along the class interval.
Linear Interpolation
Height of tree (m) Freq C.F.
The 75th item is within the class interval because 75 is within the first 100 items but not the first 55.
55 100750.6m Med 0.65m
Frequency up until this interval
Frequency at end of this interval
Item number we’re interested in.
Height at start of interval. Height at end of interval.
? ? ?
??
Linear Interpolation
55 100750.6m Med 0.65m
Frequency up until this interval
Frequency at end of this interval
Item number we’re interested in.
Height at start of interval. Height at end of interval.
Height of tree (m) Freq C.F.
Bro Tip: I like to put the units to avoid getting frequencies confused with values.
𝑴𝒆𝒅𝒊𝒂𝒏=𝟎 .𝟔+(𝟐𝟎𝟒𝟓×𝟎 .𝟎𝟓)
What fraction of the way across the interval are we?
Hence: ?
?
Bro Tip: To quickly get frequency before and after, just look for the two cumulative frequencies that surround the item number.
Weight of cat (kg) Freq C.F.
More Examples
Median class interval:
10 1816
3kg 4kg
Fraction along interval:
Median:
?
? ? ?
? ?
?
?
Time (s) Freq C.F.
7 2010
12s 14s
Median: (to 3sf)
? ? ?
? ?
?
Weight of cat to nearest kg Frequency
What’s different about the intervals here?
10−12
There are GAPS between intervals!What interval does this actually represent?
9 .5−12.5Lower class boundary Upper class boundary
Class width = 3
?
?
Identify the class width
Distance travelled (in m) …
Class width = 10?
Time taken (in seconds) …
Weight in kg … Speed (in mph) …
Lower class boundary = 200?Class width = 3 ?
Lower class boundary = 3.5?
Class width = 2 ?Lower class boundary = 29?
Class width = 10?Lower class boundary = 30.5?
Linear Interpolation with gapsJan 2007 Q4
𝑀𝑒𝑑𝑖𝑎𝑛=𝟏𝟗 .𝟓+(𝟑𝟏𝟒𝟑 ×𝟏𝟎)=𝟐𝟔 .𝟕𝒌𝒈
10297297105111116119120
29 7260
19.5 miles 29.5 miles
? ? ?
? ??
Test Your Understanding
Age of relic (years) Frequency0-1000 241001-1500 291501-1700 121701-2000 35
Questions should be on a printed sheet…
years
Shark length (cm) Frequency175810
𝑀𝑒𝑑𝑖𝑎𝑛=100+( 35×200)? ?
Exercise 2Questions should be on a printed sheet…
𝑴𝒆𝒅𝒊𝒂𝒏=𝟑𝟒 .𝟓+(𝟏𝟔𝟑𝟎×𝟓)=𝟑𝟕 .𝟐
𝑴𝒆𝒅𝒊𝒂𝒏=𝟒𝟎+(𝟐𝟐𝟕𝟒 ×𝟐𝟎)=𝟒𝟓 .𝟗 𝑴𝒆𝒅𝒊𝒂𝒏=𝟔𝟎+(𝟏𝟒𝟑𝟐 ×𝟑𝟎)=𝟕𝟑 .𝟏𝟐𝟓
1
2 3
?
? ?
Exercise 2
4
?
?
Quartiles – which item?You need to be able to find the quartiles of both listed data and of grouped data. The rule is exactly the same as for the median.
Listed data
Items Position of LQ & UQ LQ & UQ
5 2nd & 4th 7
4 1st/2nd & 3rd/4th 6.5 & 12.5
7 2nd & 6th 4 & 9
10 3rd and 8th 3 & 10
Can you think of a rule to find the position of the LQ/UQ given ?
! To find the position of the LQ/UQ for listed data, find or then as before: - If a decimal, round up. - If whole, use halfway between this item and the one after.
IQ of L6Ms2 () Frequency ()
Grouped data
Position to use for LQ:
4.25
! To find the median of grouped data, find and , then use linear interpolation.
Again, DO NOT round this value.
? ?? ?? ?? ?
?
?
PercentilesThe LQ, median and UQ give you 25%, 50% and 75% along the data respectively.But we can have any percentage you like.
Item to use for 57th percentile?
24.51?
You will always find these for grouped data in an exam, so never round this position.
Lower Quartile:𝑄1 Median:𝑄2
Upper Quartile:𝑄3 57th Percentile:𝑃57
? ?
? ?
Notation:
Test Your Understanding
Age of relic (years) Frequency0-1000 241001-1500 291501-1700 121701-2000 35
These are the same as the ‘Test Your Understanding’ questions on your sheet from before.
Item for :25th
years
years
Shark length (cm) Frequency175810
Item to use for : 14.8?
?
?
?
?
?
?
Exercise 3
?
?
Exercise 3
𝑄1=𝟑𝟎+(𝟏𝟓𝟑𝟏×𝟑𝟎)=𝟒𝟒 .𝟓𝑄1=𝟏𝟗 .𝟓+( 𝟏
𝟒𝟑×𝟏𝟎)=𝟒𝟒 .𝟓?
?
?
What is variance?
𝐼𝑄
𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝐷𝑒𝑛𝑠𝑖𝑡𝑦
110𝐼𝑄
110
Distribution of IQs in L6Ms4 Distribution of IQs in L6Ms5
Here are the distribution of IQs in two classes. What’s the same, and what’s different?
𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝐷𝑒𝑛𝑠𝑖𝑡𝑦
VarianceVariance is how spread out data is.Variance, by definition, is the average squared distance from the mean.
𝑥−𝑥()2𝑛𝜎 2=¿
Distance from mean…
Squared distance from mean…
Σ
Average squared distance from mean…
Simpler formula for variance
“The mean of the squares minus the square of the mean (‘msmsm’)”
Variance
𝜎 2= Σ𝑥2
𝑛−𝑥2? ?
Standard Deviation
𝜎=√𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒The standard deviation can ‘roughly’ be thought of as the average distance from the mean.
Examples
2cm 3cm 3cm 5cm 7cm
Variance cm
Standard Deviationcm
?
?
1, 3
Variance
Standard Deviation
So note that that in the case of two items, the standard deviation is indeed the average distance of the values from the mean.
?
?
PracticeFind the variance and standard deviation of the following sets of data.
24 6Variance = Standard Deviation =
12345Variance = Standard Deviation =
? ?
??
Extending to frequency/grouped frequency tables
We can just mull over our mnemonic again:
Variance: “The mean of the squares minus the square of the means (‘msmsm’)”
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒=Σ 𝑓 𝑥2
Σ 𝑓− 𝑥2? ?
Bro Tip: It’s better to try and memorise the mnemonic than the formula itself – you’ll understand what’s going on better, and the mnemonic will be applicable when we come onto random variables in Chapter 8.
Bro Exam Note: In an exam, you will pretty much certainly be asked to find the standard deviation for grouped data, and not listed data.
Example
May 2013 Q4
𝑥=𝟒𝟖𝟑𝟕 .𝟓𝟐𝟎𝟎
=𝟐𝟒 .𝟏𝟖𝟕𝟓
We can use our STATS mode to work out the various summations needed. Just input the table as normal.Note that, as the discussion before, on a calculator actually gives you because it’s already taking the frequencies into account.
?
?
?
You ABSOLUTELY must check this with the you can get on the calculator directly.
Test Your UnderstandingMay 2013 (R) Q3
?
Most common exam errors
Thinking means . It means the sum of each value squared!
When asked to calculate the mean followed by standard deviation, using a rounded version of the mean in calculating the standard deviation, and hence introducing rounding errors.
Forgetting to square root the variance to get the standard deviation.
ALL these mistakes can be easily spotted if you check your value against “” in STATS mode.
Exercises
Page 40 Exercise 3CQ1, 2, 4, 6
Page 44 Exercise 3D Q1, 4, 5
Coding
What do you reckon is the mean height of people in this room?
Now, stand on your chair, as per the instructions below.
INSTRUCTIONAL VIDEO
Is there an easy way to recalculate the mean based on your new heights? And the variance of your heights?
Starter
Suppose now after a bout of ‘stretching you to your limits’, you’re now all 3 times your original height.
What do you think happens to the standard deviation of your heights?
It becomes 3 times larger (i.e. your heights are 3 times as spread out!)?
What do you think happens to the variance of your heights?
It becomes 9 times larger ?
(Can you prove the latter using the formula for variance?)
The point of coding
Cost of diamond ring (£)£1010 £1020 £1030 £1040 £1050
𝑦=𝑥−100010
Standard deviation of ():therefore…
Standard deviation of ():
?
?
We ‘code’ our variable using the following:
New values :
£1 £2 £3 £4 £5?
Finding the new mean/variance
Old mean Old Coding New mean New
364 𝑦=𝑥−20164368 𝑦=2 𝑥7216354 𝑦=3 𝑥−208512403 𝑦=
𝑥22032
1127 𝑦=𝑥+10379
30025 𝑦=𝑥−1005 405
? ?
? ?
? ?
? ?
? ?
? ?
Example Exam Question
f) Coding is
Mean decreases by 5. unaffected.Median decreases by 5.Lower quartile decreases by 5.Interquartile range unaffected.
Suppose we’ve worked all these out already.
?
Exercises
Page 26 Exercise 2E Q3, 4
Page 47 Exercise 3EQ2, 3, 5, 7(Note: answers in book for Q2 are incorrect: therefore
?
?
?
?
Chapters 2-3 Summary
I have a list of 30 heights in the class. What item do I use for:
• ? 8th • ? Between 15th and 16th • ? 23rd
For the following grouped frequency table, calculate:Height of bear (in metres) Frequency
h=(0.25×4 )+ (0.85×20 )+…
40= 46.75
40=1.17𝑚𝑡𝑜3 𝑠𝑓a) The estimate mean:
b) The estimate median: 0.5+( 1620 ×0.7 )=1.06𝑚c) The estimate variance:(you’re given ) 𝜎 2=67.8125
40−( 46.7540 )
2
=0.329 𝑡𝑜3 𝑠𝑓
?
?
?
???
Chapters 2-3 SummaryWhat is the standard deviation of the following lengths: 1cm, 2cm, 3cm
The mean of a variable is 11 and the variance .The variable is coded using . What is:
a) The mean of ?b) The variance of ?
A variable is coded using .For this new variable , the mean is 15 and the standard deviation 8.What is:
c) The mean of the original data?d) The standard deviation of the original data?
?
??
??