Measures of Position: Where does a certain data value fit in relative to the other data values?
1
Measures of Position
Where does a certain data value fit in relative to the other data values?
To accompany Hawkes lesson 3.3
Original content by D.R.S.
2
Nth Place
• The highest and the lowest
• 2nd highest, 3rd highest, etc.
• “If I made $60,000, I would be 6th richest.”
3
Another view: “How does my x compare to the mean?”
• “Am I in the middle of the pack?”
• “Am I above or below the middle?”
• “Am I extremely high or extremely low?”
• The z score is the measuring stick
4
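z Score: x is how many standard deviations away from the mean?
If you know the x value
• Population: \( z = \dfrac{x - \mu}{\sigma} \)
• Sample: \( z = \dfrac{x - \bar{x}}{s} \)
To work backward from z to x
• Population: \( x = \mu + z\sigma \)
• Sample: \( x = \bar{x} + z s \)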
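As a minimal sketch of the two directions described above, the following Python computes z from x and works backward from z to x. The function names and the example numbers (mean 50, standard deviation 10) are illustrative assumptions, not values from the lesson. The arithmetic is the same whether the mean and standard deviation describe a population or a sample.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / std_dev


def x_from_z(z, mean, std_dev):
    """Work backward: the data value that has the given z score."""
    return mean + z * std_dev


# Hypothetical numbers, for illustration only.
mean, std_dev = 50, 10
print(z_score(65, mean, std_dev))     # 65 is 1.5 standard deviations above the mean
print(x_from_z(-2.0, mean, std_dev))  # the value 2 standard deviations below the mean
```

The two functions are inverses of each other, so feeding the output of one into the other should return the starting value.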
5
The z score is also called the “Standard Score”
• No matter what x is measured in or how large or small the values are...
• The z score of the mean will be 0
  – Because the numerator turns out to be 0.
• If x is above the mean, its z is positive.
  – Because the numerator turns out to be positive.
• If x is below the mean, its z is negative.
  – Because the numerator turns out to be negative.
6
z score values
• Typically round to two decimal places.
  – Don’t say “0.2589”, say “0.26”
• If not two decimal places, pad with zeros
  – Don’t say “2”, say “2.00”
  – Don’t say “-1.1”, say “-1.10”
• z scores are almost always small numbers, roughly in the interval from −3 to 3. Be very suspicious if you calculate a z score that’s not a small number.
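The rounding and padding rules above can be handled in one step with a format specifier. This is a small sketch; `format_z` is a hypothetical helper name, not something from the lesson.

```python
def format_z(z):
    """Round a z score to two decimal places, padding with zeros when needed."""
    return f"{z:.2f}"


for z in (0.2589, 2, -1.1):
    print(format_z(z))   # prints 0.26, then 2.00, then -1.10
```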
7
Practice: Given x, compute z
Find the z scores corresponding to the salary values, given the mean and the standard deviation.
8
Practice: Given z, compute x
Find the x values (salaries) corresponding to these standard scores, given the mean and the standard deviation.
9
Two parallel axes (scales), x and z
10
Example: Using z scores to compare unlike items
The Literature test
• The mean score was 77 points.
• The standard deviation was 11 points.
• Sue earned 91 points.
• Find her z score for this test.
The Biology test
• The mean score was 47 points.
• The standard deviation was 6 points.
• Sue earned 55 points.
• Find her z score for this test.
• On which test did she have the “better” performance?
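One way to check this example is to compute both z scores and compare them. The sketch below uses only the numbers given for the two tests; the helper name `z_score` is an assumption for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / std_dev


lit_z = z_score(91, 77, 11)  # (91 - 77) / 11
bio_z = z_score(55, 47, 6)   # (55 - 47) / 6

print(f"Literature z = {lit_z:.2f}")  # Literature z = 1.27
print(f"Biology z = {bio_z:.2f}")     # Biology z = 1.33

# The larger z score is the performance farther above its own class mean.
better = "Biology" if bio_z > lit_z else "Literature"
print(better)                         # Biology
```

Even though 91 points is the larger raw score, the Biology result sits slightly more standard deviations above its mean.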
11
z scores: caution with negatives
• Example: compare test scores on two different tests to ascertain “Which score was the more outstanding of the two?”
• Be careful if the z scores turn out to be negative. Which of two negative z scores is the better performance?
• Stop and think back to your basic number line and the meaning of “<” and “>”. The z score farther to the right on the number line (closer to zero, if both are negative) is the better one.
12
Percentiles
• “What percent of the values are lower than my value?”
  – 90th percentile is pretty high
  – 50th percentile is right in the middle
  – 10th percentile is pretty low
• If you scored in the 99th percentile on your SAT, I hope you got a scholarship.
13
Salary data for our percentile examples
• With these salary values again
• What’s the percentile for a salary of $59,000?
• You can see it’s going to be higher than the 50th, because it’s in the top half.
14
Example: Given x, find the percentile
• Count L = how many values are below $59,000
• Count n = how many values are in the data set
• Formula for percentile: \( P = \dfrac{L}{n} \cdot 100 \)
• Here we have L = 15 values lower than our $59,000
• Here we have n = 20 values in the data set.
• so \( P = \dfrac{15}{20} \cdot 100 = 75 \), “75th percentile”
15
Continued: Given x, find the percentile
• so P = 75
• Do not say “75%”, but say “the 75th percentile”
• Other sources use different formulas, beware!
  – Some other books use a different numerator.
  – Excel has two different answers, from its PERCENTILE.EXC and PERCENTILE.INC functions.
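The counting approach above can be sketched in a few lines of Python, assuming the percentile is (count of values below) divided by (count of values in the data set), times 100. The function name and the data are hypothetical; the salary data from the lesson is not reproduced here.

```python
def percentile_of(value, data):
    """Percentile of value: (how many values are strictly below it) / (how many values) * 100."""
    count_below = sum(1 for v in data if v < value)
    return count_below / len(data) * 100


# Hypothetical data set of 20 values: 1, 2, ..., 20.
data = list(range(1, 21))
print(percentile_of(16, data))  # 15 of the 20 values are below 16, so this prints 75.0
```

As the slide warns, other sources count differently, so a different formula can give a different answer for the same data.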
16
Given Percentile P, find the x value
• Formula: position from bottom \( L = n \cdot \dfrac{P}{100} \)
  – Again, n is how many data values are in the set
  – and P is the percentile rank that’s given.
• Is there a decimal remainder in position L?
  – If so, then BUMP UP to the next highest whole # and take the x value in that position.
  – Or if L is an exact whole number, take the average of the x values from positions L and L + 1.
• Note: Book uses lowercase l instead of L.
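The bump-up rule above can be sketched as follows, assuming the position from the bottom is the number of data values times the percentile divided by 100. The function name and the 20-value data set are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def value_at_percentile(sorted_data, p):
    """Value at the p-th percentile (whole number, 1 to 99) of data sorted low to high.

    Position from the bottom is n * p / 100, counted starting at 1.
    """
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    n = len(sorted_data)
    whole, remainder = divmod(n * p, 100)  # integer arithmetic avoids float round-off
    if remainder:
        # Decimal remainder: bump up to position whole + 1 (zero-based index whole).
        return sorted_data[whole]
    # Exact whole number: average the values in positions whole and whole + 1.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


# Hypothetical data: 10, 20, ..., 200 (20 values, already sorted).
data = list(range(10, 210, 10))
print(value_at_percentile(data, 31))  # position 6.2 bumps up to 7, so this prints 70
print(value_at_percentile(data, 40))  # position 8 exactly, average of 80 and 90: 85.0
```

Bumping up is not the same as rounding: a position of 6.2 goes to 7, where ordinary rounding would give 6.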
17
Given Percentile P, find the x value
• Example: What is the 31st percentile in the salary data?
• 31st percentile: plug in P = 31
• Compute \( L = 20 \cdot \dfrac{31}{100} = 6.2 \). It has a remainder.
• Bump it up! L = 7.
  – Not rounding, but rather bumpety-upping
• So we look 7 positions from the bottom
• “The 31st percentile is $44,476”
18
Given Percentile P, find the x value
• Example: What is the 40th percentile in the salary data? Plug in P = 40
• Compute \( L = 20 \cdot \dfrac{40}{100} = 8 \). Exact integer!
• So count 8th and 9th from bottom, and average those two values.
• “The 40th percentile is $47,367.50, or $47,368.”
19
Excel gives different answers
• Excel does some fancy interpolation
20
Quartiles Q1, Q2, Q3
• Data values are arranged from low to high.
• The Quartiles divide the data into four groups.
• Q2 is just another name for the Median.
• Q1 = Find the Median of Lowest to Q2 values
• Q3 = Find the Median of Q2 to Highest values
• It gets tricky, depending on how many values.
21
Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90
• The Second Quartile, Q2 = median = 50
• Find the medians of the subsets left and right.
• Keep the 50 in each of those subsets.
• The First Quartile, Q1 = median of { 10, 20, 30, 40, 50 } = 30
• The Third Quartile, Q3 = median of { 50, 60, 70, 80, 90 } = 70
22
Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• Q2 = median = 55 (the average of the two middle #s, 50 and 60)
• Leave the 50 and 60 in place; do not reuse 55
• Q1 = median of {10, 20, 30, 40, 50} = 30
• Q3 = median of {60, 70, 80, 90, 100} = 80
23
Quartiles example
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = 55 (the average of the two middle #s, 50 and 60).
• 55 isn’t really there so you can’t remove it!
• Leave the 50 and 60 in place
• Q1 = median of {0, 10, 20, 30, 40, 50} = 25
• Q3 = median of {60, 70, 80, 90, 100, 110} = 85
• Two middle numbers happened again!
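The three quartile examples follow one pattern: when the median is an actual data value it stays in both halves, and when it is an average of two middle values the data simply splits in two. The sketch below is one reading of that by-hand method; `quartiles_keep_median` is a hypothetical name.

```python
from statistics import median


def quartiles_keep_median(values):
    """Return (Q1, Q2, Q3), keeping the median in both halves when it is a data value."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q2 = median(data)
    if n % 2:
        # Odd count: the median is a real data value, so keep it in both subsets.
        lower, upper = data[:half + 1], data[half:]
    else:
        # Even count: the median is an average that "isn't really there"; split cleanly.
        lower, upper = data[:half], data[half:]
    return median(lower), q2, median(upper)


print(quartiles_keep_median(range(10, 100, 10)))  # nine values: Q1 30, Q2 50, Q3 70
print(quartiles_keep_median(range(10, 110, 10)))  # ten values: Q1 30, Q2 55, Q3 80
print(quartiles_keep_median(range(0, 120, 10)))   # twelve values: Q1 25, Q2 55, Q3 85
```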
24
Interquartile Range
• Definition: IQR = Q3 − Q1
• In the previous example, 85 − 25 = 60.
• Interquartile Range measures how spread out the middle of the data are
  – The lowest quartile (x < Q1) is not involved
  – And the highest quartile (x > Q3) is not involved.
25
Quartiles with TI-84
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Put values into a TI-84 List
• Use STAT, CALC, 1-Var Stats
• Scroll down, down, down to get to the quartiles.
26
There is disagreement about Quartiles
• The TI-84 sometimes gives different answers than the method we use in the Hawkes materials
• Excel might give different answers from Hawkes and TI-84, both.
• Use the Hawkes method in this course’s work
• Be aware of the others
  – You should know how to use TI-84 and Excel
  – You should be aware that differences can occur.
27
Quartiles with TI-84 vs. Hawkes
• 10, 20, 30, 40, 50, 60, 70, 80, 90
• We got Q1 = 30 and Q3 = 70 before.
• Hawkes keeps the 50, using 10, 20, 30, 40, 50 to compute Q1.
• But the TI-84 throws out the 50 and uses 10, 20, 30, 40.
• Hawkes says the TI-84 is computing “hinges”.
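The difference between the two conventions on this nine-value data set comes down to whether the median joins each half. The sketch below only illustrates that one difference; it is not a full model of either Hawkes or the TI-84.

```python
from statistics import median

data = [10, 20, 30, 40, 50, 60, 70, 80, 90]
mid = len(data) // 2  # index of the median in this odd-sized data set

# Keep the median (50) in both halves, as described above for Hawkes.
q1_keep = median(data[:mid + 1])
q3_keep = median(data[mid:])

# Throw the median (50) out of both halves, as described above for the TI-84.
q1_drop = median(data[:mid])
q3_drop = median(data[mid + 1:])

print(q1_keep, q3_keep)  # 30 70
print(q1_drop, q3_drop)  # 25.0 75.0
```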
28
Quartiles in Excel
• =QUARTILE.INC(cells, 1 or 2 or 3) seems to give the same results as the old QUARTILE function
• There’s new =QUARTILE.EXC(cells, 1 or 2 or 3)
• Excel does fancy interpolation stuff and may give different Q1 and Q3 answers compared to the TI-84 and our by-hand methods.
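To see what interpolation does to quartiles, Python's standard `statistics.quantiles` function offers an "inclusive" and an "exclusive" method. Treat any correspondence with Excel's QUARTILE.INC and QUARTILE.EXC as an assumption to verify in Excel itself; the sketch only shows that interpolating methods can land between data values.

```python
from statistics import quantiles

data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]

# n=4 asks for the three cut points that divide the data into four groups.
inclusive = quantiles(data, n=4, method="inclusive")
exclusive = quantiles(data, n=4, method="exclusive")

print(inclusive)  # [27.5, 55.0, 82.5]
print(exclusive)  # [22.5, 55.0, 87.5]
```

Both methods agree on the median (55) for this data, but neither matches the by-hand Q1 = 25 and Q3 = 85 found earlier.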
29
The Five Number Summary
• Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = 55, Q1 = 25 and Q3 = 85
• “The Five Number Summary” is defined as: the minimum, then Q1, Q2, Q3, then the maximum
• For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110”
30
The Five Number Summary
• Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = 55, Q1 = 25, Q3 = 85
• Min is 0, Max is 110
• For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110”
• Box Plot (figure): Min = 0, Q1 = 25, Q2 = 55, Q3 = 85, Max = 110
• TI-84 can do Box Plot too, but again its quartiles disagree with the way Hawkes defines quartiles.
31
Why Box Plot?
• Don’t lose sight of the big picture here:
  – We have a data set
  – It’s a bunch of numbers
  – We want to summarize the data
• Summarize means make it into a sound bite
  – We must be Concise: don’t say too much
  – We must be Informative: don’t say too little
32
We must be Concise
• Bad: “Here is a report that tells you the mean and the variance and the standard deviation and the quartiles and the percentiles from 0 to 100… and the marketing survey analyzed by demographic subgroups …” (there is a place for that, but not right now)
• Good: “Got fifteen seconds? Here’s what we found.”
33
Notice the pieces of the boxplot:
• Horizontal scale, maybe a little beyond the min and the max. A generic number line.
• The five numbers.
• The box holds the quartiles
  – With a line in the middle at the median.
• The whiskers extend out to the min and the max.
34
TI-84 Boxplot
• See instructions on separate handout.
• Caution again that TI-84 computes quartiles differently from Hawkes and differently from Excel, so the results aren’t always going to agree.
35
Additional Topics
• Might not be needed for Hawkes homework
• But you should be aware of them
• Quintiles and Deciles
• Interquartile Range and Outliers
• TI-84 Box Plot
36
Quintiles and Deciles
• You might also encounter
  – Quintiles, dividing data set into 5 groups.
  – Deciles, dividing data set into 10 groups.
• Reconcile everything back with percentiles:
  – Quartiles correspond to percentiles 25, 50, 75
  – Deciles correspond to percentiles 10, 20, …, 90
  – Quintiles correspond to percentiles 20, 40, 60, 80
37
Interquartile Range and Outliers
• Concept: An OUTLIER is a wacky far-out abnormally small or large data value compared to the rest of the data set.
• We’d like something more precise.
• Define: IQR = Interquartile Range = Q3 − Q1.
• Define: If \( x < \bar{x} - 1.5 \cdot \text{IQR} \), x is an Outlier.
• Define: If \( x > \bar{x} + 1.5 \cdot \text{IQR} \), x is an Outlier.
• (Other books might make different definitions)
38
Outliers Example
• Here’s a quick elementary example:
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• Mean = 6.8 and IQR = 6
• Or in Hawkes method, Q1 and Q3 may come out differently, and we still get interquartile range = 6 (it won’t always work out the same but in this case the IQR is the same either way)
39
Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• We found IQR = 6 and the mean is 6.8
• One definition uses 1.5 × IQR to define outliers
• Here, 1.5 × 6 = 9
• Anything more than 9 units away from the mean is then considered to be abnormally small or large.
• 6.8 − 9 = −2.2, nothing smaller than −2.2
• 6.8 + 9 = 15.8: the 20 is an outlier.
40
No-Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10
• Mean = 5.9 and IQR = 6 (coincidence that the mean ≈ IQR, insignificant)
• Anything more than 9 units away from the mean is abnormal.
• This data set has No Outliers.
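The two outlier examples can be checked with a short sketch. It assumes the cutoff is 1.5 × IQR measured from the mean, which is what the numbers in the examples (IQR = 6, "9 units away") suggest. Many texts instead measure from Q1 and Q3, so that rule is included for comparison; the values Q1 = 3 and Q3 = 9 used for it are an assumption based on splitting the data without the median. All function names are hypothetical.

```python
from statistics import mean


def outliers_from_mean(data, iqr, k=1.5):
    """Values more than k * IQR away from the mean (the rule assumed for these examples)."""
    center = mean(data)
    cutoff = k * iqr
    return [x for x in data if abs(x - center) > cutoff]


def outliers_from_quartiles(data, q1, q3, k=1.5):
    """Alternative rule in many texts: below Q1 - k * IQR or above Q3 + k * IQR."""
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
without_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]

print(outliers_from_mean(with_outlier, iqr=6))            # [20]
print(outliers_from_mean(without_outlier, iqr=6))         # []
print(outliers_from_quartiles(with_outlier, q1=3, q3=9))  # [20]
```

For these two data sets both rules reach the same verdict, but they can disagree on other data, which is why the definition in use should always be stated.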
41
Outliers: Good or Bad?
• “I have an outlier in my data set. Should I be concerned?”
  – Could be bad data. A bad measurement. Somebody not being honest with the pollster.
  – Could be legitimately remarkable data, genuine true data that’s extraordinarily high or low.
• “What should I do about it?”
  – The presence of an outlier is shouting for attention. Evaluate it and make an executive decision.