L02: Chapter 2 Descriptive Statistics: Tabular and Graphical
Presentations
Introduction
Summarizing Categorical Data
◦ Looking at things like “Female/Male” or “State of Birth”
and organizing the data
Frequency Distribution: Table showing the
number (frequency) of items in nonoverlapping
classes.
◦ Classes: Groups or categories
Example: A women's clothing store is trying to
decide if it should spend money advertising on
or near campus. The store wants to understand
the “shape” of gender data on campus.
◦ What are the classes of data of interest?
Frequency and Distributions
Frequency: Count occurrences
◦ Frequency Distribution: Puts frequency info in a ______
Relative Frequency: Turn occurrences into a decimal
◦ Relative Frequency Distribution: Puts relative frequency info
into a _______
Percent Frequency: Turn occurrences into a percent
◦ Percent frequency distribution: puts percent frequency
information into a _______
Let's add a couple of columns to our table to include
relative and percent frequency distributions.
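The three columns described above can be sketched in Python. The responses below are hypothetical, invented only for illustration; they are not data from the lecture, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical responses, invented for illustration only (not lecture data).
responses = ["Female", "Male", "Female", "Female", "Male",
             "Female", "Male", "Female", "Female", "Male"]

def frequency_table(values):
    """Return rows of (class, frequency, relative frequency, percent frequency)."""
    counts = Counter(values)          # frequency: count occurrences per class
    n = len(values)
    return [(cls, f, f / n, 100 * f / n) for cls, f in sorted(counts.items())]

table = frequency_table(responses)
for cls, f, rel, pct in table:
    print(f"{cls:<8}{f:>4}{rel:>8.2f}{pct:>8.1f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any table built this way.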
Graphical Representations
Bar Graph: Graphical device for showing
categorical data from a frequency distribution
◦ Horizontal axis _______
◦ Vertical axis some measure of _______
◦ Bars themselves are the _______ width
Pie Chart: Another graphical device for
showing frequency distributions
◦ Draw a circle, then use relative frequencies to divide
the circle for each class.
◦ Relative frequency of .25 → .25(360) = 90 degrees
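The slice-angle arithmetic can be sketched as a short Python function. The class names and relative frequencies here are hypothetical, chosen only so that one class matches the .25 example.

```python
def pie_angles(relative_frequencies):
    """Map each class to its pie-slice angle in degrees: relative frequency x 360."""
    return {cls: rel * 360 for cls, rel in relative_frequencies.items()}

# Hypothetical relative frequencies (assumed for illustration); they sum to 1.
rel_freq = {"A": 0.25, "B": 0.50, "C": 0.25}
angles = pie_angles(rel_freq)
print(angles)
```

Because the relative frequencies sum to 1, the angles sum to 360 degrees.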
Summarizing Quantitative Data – Ch02 Section 2
Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Histogram
Cumulative Distributions
Example
Rock Climbing Gym in Ogden
◦ Most people responding to advertising for the
climbing gym are between 22 and 23 years old
◦ Again we want to know if it should spend money on
advertising.
◦ What is different about the data of interest compared
to the female clothing store example?
◦ It is _______!
NOTE – With categorical data, classes are
determined by categories of the data
◦ With quantitative, researcher has to choose classes.
Not a precise science
Choosing Classes
Guidelines to choosing classes
◦ Use between 5 and 20 classes
◦ Data sets with more elements typically require more
classes.
◦ Choose classes so that you can see variation in the data
◦ If you have too many classes, some may contain only a few
data points.
◦ Choose classes so they don’t overlap.
◦ Takes some trial and error
What about this classroom would help us to choose classes
for the age variable?
Do I need 100 classes for age? Why not?
Do I need a 12-13 year old class? A 100 year old class?
In-class experiment (Rock Climbing Gym, 22-23 year olds)
Let’s try some age classes and see how it
works out for us.
Another option is to use this formula:
Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes
Let’s put the data on the board and see
what we get.
Human Frequency graph
Example: Hudson Auto Repair
The manager of Hudson Auto
would like to have a better
understanding of the cost
of parts used in the engine
tune-ups performed in the
shop. She examines 50
customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.
Example: Hudson Auto Repair
Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
•Find the max and min in order to
form class widths
•Let’s do 6 classes.
Frequency Distribution For Hudson Auto Repair, if we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10 (rounded up)
Parts Cost ($)   Frequency   Relative Freq.   Percent Freq.
50-59                 2          0.04              4
60-69                13          0.26             26
70-79                16          0.32             32
80-89                 7          0.14             14
90-99                 7          0.14             14
100-109               5          0.10             10
Total                50          1.00            100
Relative frequency of the 50-59 class: 2/50 = .04; percent frequency: .04(100) = 4
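The class-width arithmetic and the resulting distribution can be reproduced with a short Python sketch. The data are the 50 Hudson Auto Repair parts costs; the width of 10 and the starting point of 50 follow the slide's choices, and the variable names are illustrative.

```python
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
raw_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6 = 9.5
width = 10   # rounded up to a convenient value, as on the slide
lower = 50   # the first class starts at 50, as on the slide

rows = []
for i in range(num_classes):
    lo = lower + i * width
    hi = lo + width - 1                      # classes 50-59, 60-69, ...
    freq = sum(lo <= c <= hi for c in costs)
    rows.append((f"{lo}-{hi}", freq, freq / len(costs), 100 * freq / len(costs)))

for label, freq, rel, pct in rows:
    print(f"{label:<8}{freq:>4}{rel:>7.2f}{pct:>6.0f}")
```

Changing `num_classes`, `width`, or `lower` is an easy way to try the trial-and-error step from the class-selection guidelines.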
• Only ____ of the parts costs are in the $50-59 class.
• The greatest percentage (______ or almost one-third)
of the parts costs are in the $70-79 class.
• _____of the parts costs are under $70.
•______of the parts costs are $100 or more.
What are the insights from the distributions?
Would it be a good idea to have a tune-up sale with every car at $65?
Relative Frequency and
Percent Frequency Distributions
Graphical Methods for Quantitative Data
After tabular methods for categorical data, what
did we do next?
Histogram: A graph that shows frequency
information for quantitative data
Above each class interval, a _______ that
represents a measure of the class’s ______
No natural separation between
rectangles/classes
◦ This is why “no overlapping classes” is important
Where do I get the frequency Info?
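Histogram
[Figure: histogram of Tune-up Parts Cost. Horizontal axis: Parts Cost ($) classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; vertical axis: Frequency, ticks from 2 to 18. Bar heights and axis titles are left blank on the slide.]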
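A histogram can be roughed out in plain text from the class frequencies. This is only a sideways sketch (one row per class, bar length equal to the frequency), not a substitute for a proper plot; `text_histogram` is an illustrative helper name.

```python
# Class frequencies from the Hudson Auto Repair frequency distribution.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
freqs = [2, 13, 16, 7, 7, 5]

def text_histogram(labels, counts):
    """Build one line per class; the bar length equals the class frequency."""
    return [f"{label:>7} | {'#' * count} ({count})"
            for label, count in zip(labels, counts)]

lines = text_histogram(classes, freqs)
print("\n".join(lines))
```

The bars sit directly on top of one another with no gaps, which mirrors the point that adjacent classes have no natural separation.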
Histograms and Shape
Histograms help us see the shape of data
Skew – A measure of the asymmetry (lack of symmetry) of data
Symmetric: Left tail mirrors the right tail.
Example: Heights and weights of people
[Figure: symmetric histogram; vertical axis: Relative Frequency, 0 to .35]
Skewed left
◦ The left tail is ________
◦ Example: Exam Scores
Histograms and Shape
[Figure: histogram skewed left; vertical axis: Relative Frequency, 0 to .35]
Histogram and Shape
Skewed Right
◦ A long tail to the ________
◦ Example: Executive Salaries
[Figure: histogram skewed right; vertical axis: Relative Frequency, 0 to .35]
Cumulative Info: Tables and Graphs
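Cumulative frequency distribution: Table
that shows the number of items with
values less than or equal to the upper
limit of each class
Cumulative relative frequency distribution:
shows the proportion of items with values
less than or equal to the upper limit of each class
Cumulative percent frequency
distribution: shows the percentage of items with values
less than or equal to the upper limit of each class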
Parts Cost ($): Frequency and Cumulative Frequency
Cost ($)   Freq.   Cost ($)   Cum. Freq.   Cum. Rel. Freq.   Cum. Percent Freq.
50-59        2     ≤ 59           2            0.04                 4
60-69       13     ≤ 69          15            0.30                30
70-79       16     ≤ 79          31            0.62                62
80-89        7     ≤ 89          38            0.76                76
90-99        7     ≤ 99          45            0.90                90
100-109      5     ≤ 109         50            1.00               100
Total       50
Example for the ≤ 69 row: cumulative frequency = 2 + 13 = 15; cumulative relative frequency = 15/50 = .30; cumulative percent frequency = .30(100) = 30
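The running totals in the cumulative columns can be sketched with `itertools.accumulate` from the Python standard library. The class frequencies are the Hudson Auto Repair values; the variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99, 109]   # upper limit of each class
freqs = [2, 13, 16, 7, 7, 5]               # class frequencies

n = sum(freqs)
cum_freq = list(accumulate(freqs))          # running total: 2, 2 + 13, ...
cum_rel = [c / n for c in cum_freq]         # cumulative relative frequency
cum_pct = [100 * c / n for c in cum_freq]   # cumulative percent frequency

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:<4}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```

Pairing each upper limit with its cumulative value, for example `list(zip(upper_limits, cum_freq))`, gives the points that would be plotted and connected in a graph of the cumulative distribution.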
Graphical Representation of Cumulative Data
Ogive: A graph of the cumulative
distribution.
◦ Horizontal Axis: Data values
◦ Vertical Axis: cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies
◦ Each cumulative frequency is plotted as a
point above the highest value in its class
◦ Points are connected by lines (connect the
dots)
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59                2                        .04                            4
≤ 69               15                        .30                           30
≤ 79               31                        .62                           62
≤ 89               38                        .76                           76
≤ 99               45                        .90                           90
≤ 109              50                       1.00                          100
Stem-and-Leaf Display
Def: Graph that shows order and shape of
data
Like a histogram on its side, but shows
actual data values
•All digits on left side are
called a STEM
•Stems are listed in
ascending order
•The LEAF for each data
point goes on the right
hand side.
•For each stem, leaves are
arranged in ascending
order.
Example: Hudson Auto Repair
Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
•A data set we are familiar with –
Hudson Auto.
•Let's make a stem-and-leaf display
Stem-and-Leaf Display
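Sorted values for the first two stems: 52 57 and 62 62 62 62 65 66 67 68 68 68 69 69 69
 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
Each number to the left of the bar is a stem; each digit to the right is a leaf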
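The Hudson Auto display can be built in a few lines of Python. This sketch assumes whole-number data and a leaf unit of 1, so the leaf is the last digit and the stem is everything before it; `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last); the leaf is the last digit."""
    display = defaultdict(list)
    for v in sorted(values):         # sorting keeps stems and leaves in ascending order
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

display = stem_and_leaf(costs)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Sorting first is what guarantees both rules from the slide: stems appear in ascending order, and so do the leaves within each stem.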
Stem-and-Leaf Display
A single digit is used to define each leaf
In the preceding example, the leaf unit
was 1
Leaf units may be 100, 10, 1, 0.1, and so
on.
Example: Leaf Unit = 0.1
Using the following data, create a stem-and-leaf display using a leaf unit of 0.1
8.6 11.7 9.4 9.1 10.2 11.0 8.8
A stem-and-leaf display of these data will be
Leaf Unit = 0.1
 8 | __ __
 9 | 1 4
10 | _
11 | 0 7
Example: Leaf Unit = 10
Using the following data values, create a stem-and-leaf
diagram using a leaf unit of 10.
1806 1717 1974 1791 1682 1910 1838
A stem-and-leaf display of these data will be
Leaf Unit = 10
16 | _
17 | _ _
18 | 0 3
19 | 1 7
The 82 in 1682 is rounded down to 80 and is represented as an 8.
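The rounding-down step for a leaf unit of 10 can be sketched with integer division. This assumes whole-number data; `stem_and_leaf_unit10` is an illustrative helper name.

```python
from collections import defaultdict

values = [1806, 1717, 1974, 1791, 1682, 1910, 1838]

def stem_and_leaf_unit10(data):
    """Stem-and-leaf for whole numbers with leaf unit 10: the ones digit is dropped."""
    display = defaultdict(list)
    for v in sorted(data):
        tens = v // 10                  # drop the ones digit: 1682 -> 168
        stem, leaf = divmod(tens, 10)   # 168 -> stem 16, leaf 8
        display[stem].append(leaf)
    return dict(display)

display10 = stem_and_leaf_unit10(values)
for stem, leaves in display10.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Integer division truncates rather than rounds to nearest, which matches the slide's treatment of 1682.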
Cross Tabulations and Scatter Diagrams
So far: One Variable at a Time
Often, managers are interested in knowing
the relationship between two variables at
a time
Crosstabulations and scatter diagrams allow
us to summarize two variables at a time
Interesting Examples
◦ Height and Salary: two quantitative variables
◦ Beauty and Salary: one categorical, one quantitative
◦ Art Quality and City: two categorical variables
Crosstabulation
Finger Lakes Homes – Number sold for
each style and price for last two years
                        Home Style
Price Range   Colonial   Log   Split   A-Frame   Total
< $99,000        18        6     19       12       55
> $99,000        12       14     16        3       45
Total            30       20     35       15      100
Price Range is the quantitative variable; Home Style is the categorical variable
Notice that the classes are in the left and top margins
Crosstabulation – Row Percentages
Row percentages answer the question: What percent of the homes that
cost more than $99,000 were Colonial, Log, etc.?
Counts:
                        Home Style
Price Range   Colonial   Log   Split   A-Frame   Total
< $99,000        18        6     19       12       55
> $99,000        12       14     16        3       45
Total            30       20     35       15      100
Row percentages:
                        Home Style
Price Range   Colonial    Log     Split   A-Frame   Total
< $99,000      32.73     10.91    34.55    21.82     100
> $99,000      26.67     31.11    35.56     6.67     100
(Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100 = 26.67
Crosstabulation – Column Percentages
Column percentages answer the question: What percent of
Colonial homes cost more than $99,000, etc.?
Counts:
                        Home Style
Price Range   Colonial   Log   Split   A-Frame   Total
< $99,000        18        6     19       12       55
> $99,000        12       14     16        3       45
Total            30       20     35       15      100
Column percentages:
                        Home Style
Price Range   Colonial    Log     Split   A-Frame
< $99,000      60.00     30.00    54.29    80.00
> $99,000      40.00     70.00    45.71    20.00
Total           100       100      100      100
(Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
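Row and column percentages for the Finger Lakes Homes counts can be checked with a short Python sketch. The counts come from the crosstabulation; the dictionary layout and variable names are just one convenient way to hold them.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row percentages: each cell divided by its row (price range) total.
row_pct = {price: [round(100 * c / sum(row), 2) for c in row]
           for price, row in counts.items()}

# Column percentages: each cell divided by its column (home style) total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {price: [round(100 * c / t, 2) for c, t in zip(row, col_totals)]
           for price, row in counts.items()}

for price in counts:
    print(price, "row %:", dict(zip(styles, row_pct[price])))
    print(price, "col %:", dict(zip(styles, col_pct[price])))
```

The only difference between the two tables is the denominator: the row total answers a question about a price range, while the column total answers a question about a home style.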
Scatter Diagram
Def: A graphical presentation of the
relationship between two variables
One variable on horizontal axis
One variable on vertical axis
Variables are not grouped into classes
The general pattern of the plotted points gives
insight into the relationship between variables.
◦ Can everyone plot points on an X-Y plane?
A trendline approximates the relationship
Correlation does not equal causation!
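Scatter Diagram and Trendline
A _______ Relationship
[Figure: scatter diagram with trendline; axes x and y]
Scatter Diagram and Trendline
A _____ Relationship
[Figure: scatter diagram with trendline; axes x and y]
Scatter Diagram and Trendline
No Apparent Relationship
[Figure: scatter diagram; axes x and y]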
Example: Panthers Football Team
Scatter Diagram and Trendline
A football team is interested
in investigating the relationship, if any,
between interceptions made and points
scored.
x = Number of Interceptions   y = Number of Points Scored
          1                             14
          3                             24
          2                             18
          1                             17
          3                             30
Scatter Diagram and Trendline
[Figure: scatter diagram of the Panthers data with trendline. Horizontal axis (x): Number of Interceptions, 0 to 4; vertical axis (y): Number of Points Scored, 0 to 35.]
Scatter Diagrams
Insights from Panthers
◦ Positive or negative relationship?
◦ More interceptions, more points?
◦ Perfectly linear relationship?
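One way to explore these questions is to fit a straight line to the five Panthers games. The sketch below assumes the trendline is an ordinary least-squares line, which the slides do not state; `least_squares_line` is an illustrative helper name.

```python
x = [1, 3, 2, 1, 3]        # number of interceptions
y = [14, 24, 18, 17, 30]   # number of points scored

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares_line(x, y)
print(f"points = {intercept:.2f} + {slope:.2f} * interceptions")

# Residuals show how far each game falls from the line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print([round(r, 2) for r in residuals])
```

The sign of the slope indicates the direction of the relationship, and nonzero residuals mean the points do not all fall exactly on the line. Neither says anything about causation.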
Tabular and Graphical Procedures: A Review
Data
Categorical Data
◦ Tabular Methods
• Frequency Distribution
• Relative Freq. Distribution
• Percent Freq. Distribution
• Crosstabulation
◦ Graphical Methods
• Bar Graph
• Pie Chart
Quantitative Data
◦ Tabular Methods
• Frequency Dist.
• Rel. Freq. Dist.
• % Freq. Dist.
• Cum. Freq. Dist.
• Cum. Rel. Freq. Distribution
• Cum. % Freq. Distribution
• Crosstabulation
◦ Graphical Methods
• Dot Plot
• Histogram
• Ogive
• Stem-and-Leaf Display
• Scatter Diagram