![Page 1: Frequency Distributions and Summary Statisticsrmatrix2.jabsom.hawaii.edu/cbrtap/2012-10-25-resources/...2012/10/25 · Stemplot Illustrative Example Select an SRS of 10 ages List](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe748080e1e7e5152686951/html5/thumbnails/1.jpg)
Frequency Distribution and
Summary Statistics
Dongmei Li
Department of Public Health Sciences Office of Public Health Studies University of Hawai’i at Mānoa
![Page 2: Frequency Distributions and Summary Statisticsrmatrix2.jabsom.hawaii.edu/cbrtap/2012-10-25-resources/...2012/10/25 · Stemplot Illustrative Example Select an SRS of 10 ages List](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe748080e1e7e5152686951/html5/thumbnails/2.jpg)
Outline
1. Stemplot
2. Frequency table
3. Summary statistics
1. Stem-and-leaf plots (stemplots)
Always start by looking at the data with graphs and plots
Our favorite technique for looking at a single variable is the stemplot
A stemplot is a graphical technique that organizes data into a histogram-like display
You can observe a lot by looking – Yogi Berra
Stemplot Illustrative Example
Select a simple random sample (SRS) of 10 ages
List data as an ordered array
05 11 21 24 27 28 30 42 50 52
Divide each data point into a stem-value and leaf-value
In this example the “tens place” will be the stem-value and the “ones place” will be the leaf-value, e.g., 21 has a stem value of 2 and leaf value of 1
Stemplot illustration (cont.)
Draw an axis for the stem-values:
0|
1|
2|1
3|
4|
5|
×10 (axis multiplier; important!)
Place leaves next to their stem value, e.g., 21 is plotted as leaf 1 next to stem 2
Stemplot illustration continued …
Plot all data points and rearrange in rank order:
0|5
1|1
2|1478
3|0
4|2
5|02
×10
Here is the plot horizontally (for demonstration purposes):
    8
    7
    4     2
5 1 1 0 2 0
------------
0 1 2 3 4 5
------------
Rotated stemplot
Interpreting Stemplots
Shape
◦ Symmetry
◦ Modality (number of peaks)
◦ Kurtosis (width of tails)
◦ Departures (outliers)
Location
◦ Gravitational center → mean
◦ Middle value → median
Spread
◦ Range and inter-quartile range
◦ Standard deviation and variance
Shape
“Shape” refers to the pattern when plotted
Here’s the silhouette of our data
    X
    X
    X     X
X X X X X X
-----------
0 1 2 3 4 5
-----------
Consider: symmetry, modality, kurtosis
Shape: Idealized Density Curve
A large dataset is introduced
A density curve is superimposed to better discuss shape
Symmetrical Shapes
Asymmetrical shapes
Modality (no. of peaks)
Kurtosis (width of tails)
Mesokurtic (medium)
Platykurtic (flat): skinny tails
Leptokurtic (steep): fat tails
Kurtosis is not easily judged by eye
Stemplot – Second Example
Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
Stem = ones-place
Leaves = tenths-place
Round to keep one digit after the decimal point (e.g., 1.47 → 1.5)
Do not plot the decimal point
|1|5
|2|14
|3|4789
|4|4
(×1)
Shape: asymmetric, skewed to the left, unimodal, no outliers
Draw a stemplot using JMP
Open the JMP data set named Stem_and_leaf_plot.jmp
Analyze → Distribution → Data → Stem and Leaf
Third Illustrative Example (n = 26)
Age data set from 26 subjects
{14, 17, 18, 19, 22, 22, 23, 24, 24, 26, 26, 27, 28,
29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38}
Data set:
Stem_and_leaf_plot_example2.jmp
Distribution of the age variable?
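For readers without JMP, the stemplot construction described earlier can be sketched in a few lines of Python. This is a minimal sketch, not the JMP procedure: the function name `stemplot` and the `leaf_unit` parameter are illustrative choices, and it assumes non-negative values with a single leaf digit per value.

```python
from collections import defaultdict

def stemplot(values, leaf_unit=1):
    """Return stemplot lines; the last kept digit is the leaf, the rest is the stem."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = int(round(v / leaf_unit))   # e.g. 1.47 with leaf_unit=0.1 -> 15
        stem, leaf = divmod(scaled, 10)
        stems[stem].append(leaf)
    lines = []
    # Print every stem between the smallest and largest, even if it has no leaves
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem}|{leaves}")
    return lines

# Age data from the third illustrative example (n = 26)
ages = [14, 17, 18, 19, 22, 22, 23, 24, 24, 26, 26, 27, 28,
        29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]
age_plot = stemplot(ages)
for line in age_plot:
    print(line)
print("x10")
```

Calling `stemplot([1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42], leaf_unit=0.1)` reproduces the second example, since rounding to one decimal place happens before the split into stem and leaf.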
2. Frequency Table
Frequency = count
Relative frequency = proportion or %
Cumulative frequency = % less than or equal to level
AGE | Freq Rel.Freq Cum.Freq.
------+-----------------------
3 | 2 0.3% 0.3%
4 | 9 1.4% 1.7%
5 | 28 4.3% 6.0%
6 | 37 5.7% 11.6%
7 | 54 8.3% 19.9%
8 | 85 13.0% 32.9%
9 | 94 14.4% 47.2%
10 | 81 12.4% 59.6%
11 | 90 13.8% 73.4%
12 | 57 8.7% 82.1%
13 | 43 6.6% 88.7%
14 | 25 3.8% 92.5%
15 | 19 2.9% 95.4%
16 | 13 2.0% 97.4%
17 | 8 1.2% 98.6%
18 | 6 0.9% 99.5%
19 | 3 0.5% 100.0%
------+-----------------------
Total | 654 100.0%
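The relative and cumulative columns can be recomputed from the counts alone. The sketch below does that in Python; the `counts` dictionary is transcribed from the table above, and the function name `frequency_table` is an illustrative choice.

```python
# Counts by age, transcribed from the frequency table
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

def frequency_table(counts):
    """Return rows of (level, frequency, relative %, cumulative %)."""
    total = sum(counts.values())
    rows, running = [], 0
    for level in sorted(counts):
        freq = counts[level]
        running += freq                      # count at or below this level
        rows.append((level, freq, 100 * freq / total, 100 * running / total))
    return rows

rows = frequency_table(counts)
for level, freq, rel, cum in rows:
    print(f"{level:>5} | {freq:>4} {rel:>6.1f}% {cum:>6.1f}%")
```

Small differences in the last printed digit can appear because the slide accumulates rounded percentages while this sketch rounds only at the end.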
Frequency Table with Class Intervals
When data are sparse, group data into class intervals
Create 4 to 12 class intervals
Classes can be uniform or non-uniform
End-point convention: e.g., first class interval of
0 to 10 will include 0 but exclude 10 (0 to
9.99)
Tally frequencies
Calculate relative frequency
Calculate cumulative frequency
Class Intervals
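Class   | Freq  Relative Freq. (%)  Cumulative Freq. (%)
--------+----------------------------------------------
0 – 9   |  1          10                  10
10 – 19 |  1          10                  20
20 – 29 |  4          40                  60
30 – 39 |  1          10                  70
40 – 49 |  1          10                  80
50 – 59 |  2          20                 100
--------+----------------------------------------------
Total   | 10         100                  --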
Uniform class intervals table (width 10) for data:
05 11 21 24 27 28 30 42 50 52
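The tally, relative frequency, and cumulative frequency steps for uniform class intervals can be sketched as follows. This assumes non-negative integer data and classes starting at 0; the function name `class_interval_table` is hypothetical.

```python
def class_interval_table(values, width):
    """Tabulate uniform classes [0, width), [width, 2*width), ... for integer data."""
    n = len(values)
    n_classes = max(values) // width + 1
    freqs = [0] * n_classes
    for v in values:
        freqs[v // width] += 1               # lower end-point included, upper excluded
    rows, running = [], 0
    for k, freq in enumerate(freqs):
        running += freq
        label = f"{k * width}-{(k + 1) * width - 1}"
        rows.append((label, freq, 100 * freq / n, 100 * running / n))
    return rows

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
table = class_interval_table(ages, width=10)
for label, freq, rel, cum in table:
    print(f"{label:>7} | {freq:>2} {rel:>6.0f} {cum:>6.0f}")
```

Integer division by the class width implements the end-point convention directly: a value of 10 falls in the 10-19 class, not the 0-9 class.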
Histogram
[Figure: histogram of frequency (axis 0 to 5) by age class: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59]
A histogram is a frequency chart for a quantitative measurement. Notice how the bars touch.
Bar Chart
[Figure: bar chart of counts (axis 0 to 500) by school level: Pre-, Elem., Middle, High]
A bar chart with non-touching bars is reserved for categorical measurements and non-uniform
class intervals
3. Summary Statistics
Central location
◦ Mean
◦ Median
◦ Mode
Spread
◦ Range and interquartile range (IQR)
◦ Variance and standard deviation
Location: Mean
“Eye-ball method”: visualize where the plot would balance
Arithmetic method: sum values and divide by n
    8
    7
    4     2
5 1 1 0 2 0
------------
0 1 2 3 4 5
------------
     ^
Grav.Center
Eye-ball method: around 25 to 30 (takes practice)
Arithmetic method: mean = 290 / 10 = 29
Notation
n ≡ sample size
X ≡ the variable (e.g., ages of subjects)
$x_i$ ≡ the value of individual i for variable X
$\sum$ ≡ sum all values (capital sigma)
Illustrative data (ages of participants):
21 42 5 11 30 50 28 27 24 52
n = 10
X = AGE variable
$x_1 = 21$, $x_2 = 42$, …, $x_{10} = 52$
$\sum x_i = x_1 + x_2 + \cdots + x_{10} = 21 + 42 + \cdots + 52 = 290$
Central Location: Sample Mean
“Arithmetic average”
Traditional measure of central location
Sum the values and divide by n
“xbar” refers to the sample mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$
Example: Sample Mean
Ten individuals selected at random have the following ages:
21 42 5 11 30 50 28 27 24 52
Note that n = 10, $\sum x_i = 21 + 42 + \cdots + 52 = 290$, and
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(290) = 29.0$$
[Figure: number line from 0 to 60 with the mean marked at 29]
The sample mean is the gravitational center of a distribution
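As a quick check of the arithmetic, here is a small Python sketch of the sample mean for the ten illustrative ages. It also verifies the "gravitational center" idea numerically: deviations from the mean sum to zero. Variable names are illustrative.

```python
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)                 # sample size
total = sum(ages)             # sum of all x_i
xbar = total / n              # sample mean

# Deviations above and below the mean balance out
deviation_sum = sum(x - xbar for x in ages)

print(f"n = {n}, sum = {total}, mean = {xbar}")
print(f"sum of deviations = {deviation_sum}")
```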
Uses of the Sample Mean
The sample mean can be used to predict:
The value of an observation drawn at
random from the sample
The value of an observation drawn at
random from the population
The population mean
Population Mean
Same operation as sample mean except based on entire population (N ≡ population size)
Conceptually important
Usually not available in practice
Sometimes referred to as the expected value
$$\mu = \frac{\sum x_i}{N} = \frac{1}{N}\sum x_i$$
Central Location: Median
Ordered array:
05 11 21 24 27 28 30 42 50 52
When n is even, the median is the average of the (n ÷ 2)th and (n ÷ 2 + 1)th values.
When n is odd, the median is the ((n + 1) ÷ 2)th value.
For the illustrative data: n = 10 → the median falls between the 5th and 6th values (27 and 28)
Average the adjacent values: M = (27 + 28) ÷ 2 = 27.5
More Examples of Medians
Example A: 2 4 6
Median = 4
Example B: 2 4 6 8
Median = 5 (average of 4 and 6)
Example C: 6 2 4
Median ≠ 2
(Values must be ordered first: 2 4 6 has a median of 4)
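The median rule for odd and even n can be written as a short Python function. This is a minimal sketch (the standard library's `statistics.median` does the same job); sorting inside the function is what makes Example C come out right.

```python
def median(values):
    """Median by position: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)          # values must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)th value, 0-based index n // 2
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 6]))              # Example A
print(median([2, 4, 6, 8]))           # Example B
print(median([6, 2, 4]))              # Example C, unordered input
print(median([5, 11, 21, 24, 27, 28, 30, 42, 50, 52]))
```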
The Median is Robust
The median is more resistant to skews and outliers than the mean; it is more robust.
This data set has a mean of 1600:
1362 1439 1460 1614 1666 1792 1867
Here’s the same data set with a data entry error “outlier” (the last value). This data set has a mean of 2743:
1362 1439 1460 1614 1666 1792 9867
The median is 1614 in both instances, demonstrating its robustness in the face of outliers.
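The effect of the single bad value can be checked directly. The sketch below uses Python's standard `statistics.median` and a plain sum-over-n mean; the variable names are illustrative.

```python
from statistics import median

clean = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_error = [1362, 1439, 1460, 1614, 1666, 1792, 9867]   # last value mistyped

mean_clean = sum(clean) / len(clean)
mean_error = sum(with_error) / len(with_error)

print(f"mean:   {mean_clean:.0f} -> {mean_error:.0f}")
print(f"median: {median(clean)} -> {median(with_error)}")
```

One wrong value moves the mean by more than 1000 cal/day while the median does not move at all.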
Mode
The mode is the most commonly encountered value in the dataset
This data set has a mode of 7 {4, 7, 7, 7, 8, 8, 9}
This data set has no mode {4, 6, 7, 8} (each point appears only once)
The mode is useful only in large data sets with repeating values
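A small Python sketch of the mode, following the slide's convention that a data set in which every value appears once has no mode. The function name `modes` is hypothetical, and it assumes a non-empty data set.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or None when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return None                    # each point appears only once
    return [v for v, c in counts.items() if c == top]

print(modes([4, 7, 7, 7, 8, 8, 9]))
print(modes([4, 6, 7, 8]))
```

Returning a list allows for ties (bimodal data); this is a design choice of the sketch, not something the slide specifies.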
Comparison of Mean, Median, Mode
Note how the mean gets pulled toward
the longer tail more than the median
mean = median → symmetrical distribution
mean > median → positive skew
mean < median → negative skew
Spread: Quartiles
Two distributions can be quite different yet have the same mean
These data compare particulate matter in air samples (μg/m³) at two sites. Both sites have a mean of about 36, but Site 1 exhibits much greater variability. We would miss the high pollution days if we relied solely on the mean.
Site 1| |Site 2
---------------
42|2|
8|2|
2|3|234
86|3|6689
2|4|0
|4|
|5|
|5|
|6|
8|6|
×10
Spread: Range
Range = maximum – minimum
Illustrative example:
Site 1 range = 68 – 22 = 46
Site 2 range = 40 – 32 = 8
Beware: the sample range will
tend to underestimate the
population range.
Always supplement the range with at least one additional measure of spread
Site 1| |Site 2
----------------
42|2|
8|2|
2|3|234
86|3|6689
2|4|0
|4|
|5|
|5|
|6|
8|6|
×10
Spread: Quartiles
Quartile 1 (Q1): cuts off bottom quarter of data = median of the lower half of the data set
Quartile 3 (Q3): cuts off top quarter of data = median of the upper half of the data set
Interquartile Range (IQR) = Q3 – Q1 covers the middle 50% of the distribution
05 11 21 24 27 | 28 30 42 50 52
Lower half: 05 11 21 24 27 (median 21 = Q1); upper half: 28 30 42 50 52 (median 42 = Q3); overall median = 27.5
Q1 = 21, Q3 = 42, and IQR = 42 – 21 = 21
Quartiles (Tukey’s Hinges) – Example 2
Data are metabolic rates (cal/day), n = 7
1362 1439 1460 1614 1666 1792 1867
The median is 1614.
When n is odd, include the median in both halves of the data set.
Bottom half: 1362 1439 1460 1614 which has a median of 1449.5 (Q1)
Top half: 1614 1666 1792 1867 which has a median of 1729 (Q3)
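The hinge rule (median of each half, with the overall median shared by both halves when n is odd) can be sketched in Python as below. Function names are illustrative. Note that statistical packages often use other quartile definitions, so their Q1 and Q3 may differ slightly from these hinges.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def tukey_hinges(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2                # includes the median when n is odd
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(upper)

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
q1, q3 = tukey_hinges(rates)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```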
Five-Point Summary
Q0 (the minimum)
Q1 (25th percentile)
Q2 (median)
Q3 (75th percentile)
Q4 (the maximum)
Boxplots
1. Calculate 5-point summary. Draw box from Q1 to Q3 w/ line at median
2. Calculate IQR and fences as follows:
FenceLower = Q1 – 1.5(IQR)
FenceUpper = Q3 + 1.5(IQR)
Do not draw fences
3. Determine if any values lie outside the fences
(outside values). If so, plot these separately.
4. Determine values inside the fences (inside values)
Draw whisker from Q3 to upper inside value.
Draw whisker from Q1 to lower inside value
Illustrative Example: Boxplot
1. 5-point summary: {5, 21, 27.5, 42, 52}; box from 21 to 42 with line at 27.5
2. IQR = 42 – 21 = 21
FU = Q3 + 1.5(IQR) = 42 + (1.5)(21) = 73.5
FL = Q1 – 1.5(IQR) = 21 – (1.5)(21) = –10.5
3. No values above the upper fence
No values below the lower fence
4. Upper inside value = 52
Lower inside value = 5
Draw whiskers
Data: 05 11 21 24 27 28 30 42 50 52
[Figure: boxplot on a 0 to 60 axis, with lower inside value = 5, Q1 = 21, Q2 = 27.5, Q3 = 42, and upper inside value = 52]
Illustrative Example: Boxplot 2
Data: 3 21 22 24 25 26 28 29 31 51
[Figure: boxplot on a 0 to 60 axis, with outside values 3 and 51, inside values 21 and 31, lower hinge 22, median 25.5, and upper hinge 29]
1. 5-point summary: 3, 22, 25.5, 29, 51: draw box
2. IQR = 29 – 22 = 7
FU = Q3 + 1.5(IQR) = 29 + (1.5)(7) = 39.5
FL = Q1 – 1.5(IQR) = 22 – (1.5)(7) = 11.5
3. One above top fence (51)
One below bottom fence (3)
4. Upper inside value is 31
Lower inside value is 21
Draw whiskers
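The four boxplot steps can be followed numerically without drawing anything. The sketch below is one way to do it in Python, using hinges as Q1 and Q3; the function name `boxplot_stats` and the dictionary keys are hypothetical.

```python
def _median(ordered):
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def boxplot_stats(values):
    """Five-point summary, fences, outside values, and whisker ends."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2                              # share the median when n is odd
    q1, q3 = _median(ordered[:half]), _median(ordered[n - half:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outside = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "summary": (ordered[0], q1, _median(ordered), q3, ordered[-1]),
        "fences": (lower_fence, upper_fence),
        "outside": outside,                          # plotted as separate points
        "whiskers": (inside[0], inside[-1]),         # lower and upper inside values
    }

stats2 = boxplot_stats([3, 21, 22, 24, 25, 26, 28, 29, 31, 51])
for key, value in stats2.items():
    print(key, value)
```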
Illustrative Example: Boxplot 3
Seven metabolic rates:
1362 1439 1460 1614 1666 1792 1867
[Figure: boxplot of the seven metabolic rates, N = 7]
Data source: Moore,
2000
1. 5-point summary: 1362, 1449.5, 1614, 1729, 1867
2. IQR = 1729 – 1449.5 = 279.5
FU = Q3 + 1.5(IQR) = 1729 + (1.5)(279.5) = 2148.25
FL = Q1 – 1.5(IQR) = 1449.5 – (1.5)(279.5) = 1030.25
3. None outside
4. Whiskers end @ 1867 and 1362
Boxplots: Interpretation
Location
◦ Position of median
◦ Position of box
Spread
◦ Hinge-spread (IQR)
◦ Whisker-to-whisker spread
◦ Range
Shape
◦ Symmetry or direction of skew
◦ Long whiskers (tails) indicate leptokurtosis (Long tails?)
Side-by-side boxplots
Boxplots are especially useful when comparing groups
Spread: Standard Deviation
Most common descriptive measures of spread
Based on deviations around the mean.
This figure demonstrates the deviations of two of its values
This data set has a mean of 36.
The data point 33 has a deviation of 33 – 36 = −3.
The data point 40 has a deviation of 40 – 36 = 4.
Variance and Standard Deviation
Deviation = $x_i - \bar{x}$
Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$
Sample variance = $s^2 = \frac{SS}{n - 1}$
Sample standard deviation = $s = \sqrt{s^2}$
Standard deviation (formula)
$$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
The sum $\sum (x_i - \bar{x})^2$ is the sum of squares (SS)
Sample standard deviation s is the estimator of population standard deviation σ.
Illustrative Example: Standard Deviation
Observation | Deviation       | Squared deviation
$x_i$       | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
------------+-----------------+--------------------
36          | 36 − 36 = 0     | 0² = 0
38          | 38 − 36 = 2     | 2² = 4
39          | 39 − 36 = 3     | 3² = 9
40          | 40 − 36 = 4     | 4² = 16
36          | 36 − 36 = 0     | 0² = 0
34          | 34 − 36 = −2    | (−2)² = 4
33          | 33 − 36 = −3    | (−3)² = 9
32          | 32 − 36 = −4    | (−4)² = 16
------------+-----------------+--------------------
SUMS        | 0*              | SS = 58
* Sum of deviations always equals zero
Illustrative Example (cont.)
Sample variance (s²)
$$s^2 = \frac{SS}{n - 1} = \frac{58}{8 - 1} = 8.286\ (\mu\text{g/m}^3)^2$$
Standard deviation (s)
$$s = \sqrt{s^2} = \sqrt{8.286} = 2.88\ \mu\text{g/m}^3$$
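The deviation table and the two formulas can be traced step by step in Python. This is a sketch of the hand calculation, with illustrative variable names; the standard library's `statistics.variance` and `statistics.stdev` give the same sample (n − 1) results.

```python
import math

site2 = [36, 38, 39, 40, 36, 34, 33, 32]      # particulate matter, ug/m^3

n = len(site2)
xbar = sum(site2) / n                          # sample mean
deviations = [x - xbar for x in site2]         # x_i - xbar
ss = sum(d ** 2 for d in deviations)           # sum of squared deviations
variance = ss / (n - 1)                        # sample variance, (ug/m^3)^2
sd = math.sqrt(variance)                       # sample standard deviation, ug/m^3

print(f"mean = {xbar}, SS = {ss}")
print(f"s^2 = {variance:.3f}, s = {sd:.2f}")
```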
Interpretation of Standard Deviation
Measure spread (e.g., if group 1 has s1 = 15 and group 2 has s2 = 10, group 1 has more spread, i.e., variability)
68-95-99.7 rule (next slide)
Chebychev’s rule (two slides hence)
68-95-99.7 Rule
Normal Distributions Only!
68% of data in the range μ ± σ
95% of data in the range μ ± 2σ
99.7% of data in the range μ ± 3σ
Example. Suppose a variable has a Normal distribution with μ = 30 and σ = 10. Then:
68% of values are between 30 ± 10 = 20 to 40
95% are between 30 ± (2)(10) = 30 ± 20 = 10 to 50
99.7% are between 30 ± (3)(10) = 30 ± 30 = 0 to 60
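The three percentages are rounded values of Normal probabilities, which can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `within` is an illustrative choice.

```python
from statistics import NormalDist

dist = NormalDist(mu=30, sigma=10)

def within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    lo, hi = dist.mean - k * dist.stdev, dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2, 3):
    lo, hi = 30 - 10 * k, 30 + 10 * k
    print(f"{lo} to {hi}: {100 * within(k):.1f}%")
```

The exact figures are slightly different from the rule's round numbers (for example, about 95.4% within two standard deviations), which is why the rule is best treated as an approximation.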
Chebychev’s Rule
All Distributions
Chebychev’s rule says that at least 75% of
the values will fall in the range μ ± 2σ (for
any shaped distribution)
Example: A distribution with μ = 30 and σ
= 10 has at least 75% of the values in the
range 30 ± (2)(10) = 10 to 50
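The 75% figure is the k = 2 case of the general Chebychev bound, which states that at least 1 − 1/k² of the values lie within k standard deviations of the mean for any k > 1. The general form goes slightly beyond what the slide states; the sketch below simply evaluates it, and the function name is hypothetical.

```python
def chebychev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

mu, sigma = 30, 10
for k in (2, 3):
    lo, hi = mu - k * sigma, mu + k * sigma
    print(f"at least {100 * chebychev_lower_bound(k):.1f}% of values in {lo} to {hi}")
```

Compared with the 68-95-99.7 rule, the bound is much weaker (75% versus 95% at two standard deviations), which is the price of making no assumption about shape.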
Rules for Rounding
Carry at least four significant digits during
calculations.
Round at last step of operation
Always report units
Always use common sense and good judgment.
Choosing Summary Statistics
Always report a measure of central
location, a measure of spread, and the
sample size
Symmetrical mound-shaped distributions → report mean and standard deviation
Odd shaped distributions → report 5-point summaries (or median and IQR)
Software and Calculators
Use software and calculators to check work.
Excel Data Analysis ToolPak
Data set: Boxplot.xlsx
Summary statistics using Excel
Excel Data Analysis ToolPak
Get summary statistics using Excel
Boxplot using Excel
YouTube link showing how to get a boxplot in Excel:
http://www.youtube.com/watch?v=s8ZW4PVarwE&feature=related
[Figure: side-by-side boxplots of female age and male age on a 0 to 70 axis]
Boxplot and summary statistics by JMP
Data set: Boxplot.jmp
What do you say about the comparison of distributions of ages for females and males?
Exercise
Surgical times. Durations of surgeries (hours) for 15 patients receiving artificial hearts are shown here. Create a stem plot of these data. Describe the distribution. Are there any outliers? What is the standard deviation of this data set? Draw a boxplot based on this data set.
7.0 6.5 3.5 3.1 2.8 2.5 3.8 2.6 2.4 2.1 1.8 2.3 3.1 3.0 2.5
Data sets: Presentation2_Exercise.xlsx, Presentation2_exercise.jmp