2.4 Describing Distributions Numerically
2.4 Numerical Summaries of Data
Numerical and More Graphical Methods to Describe Univariate Data
2 characteristics of a data set to measure
center: measures where the “middle” of the data is located
variability: measures how “spread out” the data is
The median: a measure of center
Given a set of n measurements arranged in order of magnitude,
Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even
Ex. 2, 4, 6, 8, 10; n=5; median=6
Ex. 2, 4, 6, 8; n=4; median=(4+6)/2=5
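The rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `median` is our own choice.

```python
def median(values):
    """Median by the rule above: sort, then take the middle value (n odd)
    or the mean of the two middle values (n even)."""
    ordered = sorted(values)          # the rule assumes ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # n odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: mean of 2 middle values

print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
```

Sorting inside the function matters: the middle value of an unsorted list is generally not the median.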
Student Pulse Rates (n=62)
38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103
Median = (75+76)/2 = 75.5
Medians are used often
Year 2017 baseball salaries: median $1,562,500 (max = $33,000,000, Clayton Kershaw; min = $535,000)
Median age of TV sports viewers: PGA 64, NASCAR 58, MLB 57, WTA 55, NFL 50, NHL 49, NBA 42, MLS 40
Median existing home sales price: June 2017 $263,800; June 2016 $243,200
US Median household income (2015 dollars) 2015 $56,516; 2014 $53,029
NC Median household income (2015 dollars) 2015 $50,797; 2014 $46,838
Median Salaries by Major
The median splits the histogram into 2 halves of equal area
Median $25,966
NC $24,358
Examples
Example: n = 7
17.5 2.8 3.2 13.9 14.1 25.3 45.8
Example n = 7 (ordered)
2.8 3.2 13.9 14.1 17.5 25.3 45.8
m = 14.1
Example: n = 8
17.5 2.8 3.2 13.9 14.1 25.3 35.7 45.8
Example n = 8 (ordered)
2.8 3.2 13.9 14.1 17.5 25.3 35.7 45.8
m = (14.1+17.5)/2 = 15.8
Think about the median
Six people in a room have a median age of 45 years.
One person who is 40 years old leaves the room.
Question:
What is the median age of the 5 people remaining in the room?
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 4960 4971 5245 5546 7586
1. 5245
2. 4965.5
3. 4960
4. 4971
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 5245 5546 4971 5587 7586
1. 5245
2. 4965.5
3. 5546
4. 4971
Measures of Spread
The range and interquartile range
Ways to measure variability
range = largest – smallest
OK sometimes; in general, too crude; sensitive to one large or small data value
The range measures spread by examining the ends of the data
A better way to measure spread is to examine the middle portion of the data
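A small sketch of why the range is sensitive to a single extreme value. The two data sets are hypothetical, and `middle_range` is an illustrative trim of our own (drop the smallest and largest value), not the quartile method.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

def middle_range(values):
    """Range after dropping the single smallest and largest values
    (an illustrative trim, not the quartile method)."""
    ordered = sorted(values)
    return data_range(ordered[1:-1])

typical = [2, 4, 6, 8, 10]
with_extreme = [2, 4, 6, 8, 100]  # hypothetical: one unusually large value

print(data_range(typical), data_range(with_extreme))      # 8 98
print(middle_range(typical), middle_range(with_extreme))  # 4 4
```

One changed value moves the range from 8 to 98, while the spread of the middle portion stays the same.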
m = median = 3.4
Q1= first quartile = 2.3
Q3= third quartile = 4.2
1 1 0.6
2 2 1.2
3 3 1.6
4 4 1.9
5 5 1.5
6 6 2.1
7 7 2.3
8 6 2.3
9 5 2.5
10 4 2.8
11 3 2.9
12 2 3.3
13 1 3.4
14 2 3.6
15 3 3.7
16 4 3.8
17 5 3.9
18 6 4.1
19 7 4.2
20 6 4.5
21 5 4.7
22 4 4.9
23 3 5.3
24 2 5.6
25 1 6.1
Quartiles: Measuring spread by examining the middle
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).
Quartiles and median divide data into 4 pieces
Q1 M Q3
1/4 1/4 1/4 1/4
Quartiles are Common Measures of Spread
Mid-career earnings by major: 25th, 50th, 75th percentiles.
Quartiles are common measures of spread
https://oirp.ncsu.edu/students/admissions/freshman-profile/
University of Southern California
Rules for Calculating Quartiles
Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1;
Step 2b: find the median of the upper half; this median is Q3.
Important:
when n is odd include the overall median in both halves;
when n is even do not include the overall median in either half.
Example 2 4 6 8 10 12 14 16 18 20 n = 10
Median m = (10+12)/2 = 22/2 = 11
Q1 : median of lower half 2 4 6 8 10
Q1 = 6
Q3 : median of upper half 12 14 16 18 20
Q3 = 16
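The rules and the example above can be checked with a short Python sketch. Statistical packages use several different quartile conventions, so library functions may return slightly different values; this sketch follows only the rule stated in these slides. The function names are our own.

```python
def median_of_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the slide rule:
    n odd  -> include the overall median in both halves;
    n even -> do not include it in either half."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty data set are undefined")
    half = n // 2
    if n % 2 == 1:
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[half:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```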
Pulse Rates n = 138
#   Stem  Leaves
    4*
3   4.    588
9   5*    001233444
10  5.    5556788899
23  6*    00011111122233333344444
23  6.    55556666667777788888888
16  7*    00000112222334444
23  7.    55555666666777888888999
10  8*    0000112224
10  8.    5555667789
4   9*    0012
2   9.    58
4   10*   0223
    10.
1   11*   1
Median: mean of pulses in locations 69 & 70: median = (70+70)/2 = 70
Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
Below are the weights of 31 linemen on the NCSU football team. What is the
value of the first quartile Q1?
#    stem  leaf
2    22    55
4    23    57
6    24    26
7    25    7
10   26    257
12   27    59
(4)  28    1567
15   29    35599
10   30    333
7    31    45
5    32    155
2    33    6
1    34    0
1. 287
2. 257.5
3. 263.5
4. 262.5
Interquartile range
lower quartile: Q1
middle quartile: median
upper quartile: Q3
interquartile range (IQR)
IQR = Q3 – Q1
measures spread of middle 50% of the data
Example: beginning pulse rates
Q3 = 78; Q1 = 63
IQR = 78 – 63 = 15
Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?
#    stem  leaf
2    22    55
4    23    57
6    24    26
7    25    7
10   26    257
12   27    59
(4)  28    1567
15   29    35599
10   30    333
7    31    45
5    32    155
2    33    6
1    34    0
1. 23.5
2. 39.5
3. 46
4. 69.5
5-number summary of data
Minimum Q1 median Q3 maximum
Pulse data
45 63 70 78 111
m = median = 3.4
Q3= third quartile = 4.2
Q1= first quartile = 2.3
25 1 6.1
24 2 5.6
23 3 5.3
22 4 4.9
21 5 4.7
20 6 4.5
19 7 4.2
18 6 4.1
17 5 3.9
16 4 3.8
15 3 3.7
14 2 3.6
13 1 3.4
12 2 3.3
11 3 2.9
10 4 2.8
9 5 2.5
8 6 2.3
7 7 2.3
6 6 2.1
5 5 1.5
4 4 1.9
3 3 1.6
2 2 1.2
1 1 0.6
Largest = max = 6.1
Smallest = min = 0.6
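As a check on the values above, here is a minimal Python sketch that computes the five-number summary for the 25 values listed, using the quartile rule from these slides (overall median included in both halves when n is odd). The function name `five_number_summary` is our own; `statistics.median` is from the Python standard library.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the slide rule for quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    # n odd: the overall median belongs to both halves; n even: to neither
    lower = ordered[:half + 1] if n % 2 == 1 else ordered[:half]
    upper = ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# The 25 values from the table above
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```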
[Figure: Disease X, years until death; axis from 0 to 7]
Five-number summary:
min Q1 m Q3 max
Boxplot: display of 5-number summary
BOXPLOT
Boxplot: display of 5-number summary
Example: age of 66 “crush” victims at rock concerts in a recent year.
5-number summary: 13 17 19 22 47
Rock concert deaths: histogram and boxplot
Boxplot construction
1) construct box with ends located at Q1 and Q3; in the box mark the location of the median (usually with a line or a “+”)
2) fences are determined by moving a distance 1.5(IQR) from each end of the box;
2a) upper fence is 1.5*IQR above the upper quartile
2b) lower fence is 1.5*IQR below the lower quartile
Note: the fences only help with constructing the boxplot; they do not appear in the final boxplot display
Box plot construction (cont.)
3) whiskers: draw lines from the ends of the box left and right to the most extreme data values found within the fences;
4) outliers: special symbols represent each data value beyond the fences;
4a) sometimes a different symbol is used for “far outliers” that are more than 3 IQRs from the quartiles
Q3= third quartile = 4.2
Q1= first quartile = 2.3
25 1 7.9
24 2 6.1
23 3 5.3
22 4 4.9
21 5 4.7
20 6 4.5
19 7 4.2
18 6 4.1
17 5 3.9
16 4 3.8
15 3 3.7
14 2 3.6
13 1 3.4
12 2 3.3
11 3 2.9
10 4 2.8
9 5 2.5
8 6 2.3
7 7 2.3
6 6 2.1
5 5 1.5
4 4 1.9
3 3 1.6
2 2 1.2
1 1 0.6
Largest = max = 7.9
Boxplot: display of 5-number summary
BOXPLOT
[Figure: Disease X, years until death; axis from 0 to 8]
Interquartile range
IQR = Q3 – Q1 = 4.2 – 2.3 = 1.9
1.5 * IQR = 1.5 * 1.9 = 2.85
Upper fence: Q3 + 1.5 * IQR = 4.2 + 2.85 = 7.05
Individual #25 has a value of 7.9 years, which is above the upper fence, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05 (here 6.1).
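The fence, whisker, and outlier steps can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `boxplot_stats` and its dictionary keys are our own, Q1 and Q3 are passed in rather than computed, and results are rounded only for display because of floating-point arithmetic.

```python
def boxplot_stats(values, q1, q3, k=1.5, far_k=3.0):
    """Fences, whisker ends, and outliers for a boxplot, given Q1 and Q3.
    k = 1.5 gives the usual fences; far_k = 3 marks "far outliers"."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    far = [v for v in outliers if v < q1 - far_k * iqr or v > q3 + far_k * iqr]
    return {
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),    # most extreme values within the fences
        "whisker_high": max(inside),
        "outliers": outliers,
        "far_outliers": far,
    }

# Disease X data with the two largest values 6.1 and 7.9
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

stats = boxplot_stats(years, q1=2.3, q3=4.2)
print(round(stats["upper_fence"], 2))  # 7.05
print(stats["whisker_high"])           # 6.1
print(stats["outliers"])               # [7.9]
```

Here 7.9 is an ordinary outlier rather than a "far outlier", since it is less than Q3 + 3(IQR) = 9.9.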
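Beg. of class pulses (n=138): Q1 = 63, Q3 = 78, IQR = 78 – 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Lower fence: Q1 – 1.5(IQR) = 63 – 22.5 = 40.5
Upper fence: Q3 + 1.5(IQR) = 78 + 22.5 = 100.5
[Figure: boxplot of pulse rates marking 45, 63, 70, 78 and the fences 40.5 and 100.5]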
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who
gained at least 50 yards. What is the approximate value of Q3 ?
[Figure: boxplot of Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]
1. 450
2. 750
3. 215
4. 545
Careful! Boxplots Do NOT Show Gaps in the Data
Do not rely solely on a boxplot for data exploration
Boxplots are all the same, histograms differ.
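To make this concrete, the sketch below builds two small hypothetical data sets (not from the slides) that share the same five-number summary, and therefore the same boxplot, even though one is evenly spread and the other is clustered with gaps. The quartile rule is the one used in these slides; the names are our own.

```python
from collections import Counter
from statistics import median

def five_number_summary(values):
    """(min, Q1, median, Q3, max); overall median in both halves when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half + 1] if n % 2 == 1 else ordered[:half]
    upper = ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

even_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical: no gaps
clustered = [0, 2, 2, 2, 4, 6, 6, 6, 8]     # hypothetical: gaps at 1, 3, 5, 7

print(five_number_summary(even_spread) == five_number_summary(clustered))  # True

# Crude text histograms: one row per value, one star per observation
for name, data in [("even_spread", even_spread), ("clustered", clustered)]:
    counts = Counter(data)
    print(name)
    for value in range(0, 9):
        print(f"{value:2d} {'*' * counts[value]}")
```

The summaries match, but the text histograms differ, which is why a histogram or dot plot should accompany a boxplot.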
Automating Boxplot Construction
Excel versions before 2016 do not draw boxplots “out of the box”; Excel 2016 and later include a Box and Whisker chart type.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
SAS, JMP, Minitab, R, etc. all make boxplots (learning curve)
Statcrunch (http://statcrunch.stat.ncsu.edu) makes box plots (no learning curve).
ATM Withdrawals by Day, Month, Holidays
Tuition 4-yr Colleges