Organizing and Summarizing Data
Learning Objectives:
1. Organize qualitative data using
a) a frequency and relative frequency table, b) a bar graph, c) a pie graph, and d) a Pareto graph.
2. Organize quantitative data:
1. Discrete data using a) a frequency and relative frequency table, b) a bar graph, c) a pie graph, and d) a Pareto graph.
2. Continuous data using a) a histogram, b) a stem-and-leaf plot, and c) a time series plot.
Figure: Data Presentation (overview of displays for qualitative and quantitative data: summary table, bar chart, pie chart, dot chart, stem-and-leaf display, frequency distribution, histogram, time series plot, box plot)
Organizing Qualitative or Categorical Data
• A statistical table displays the distribution of the data. It consists of the classes (categories), the class frequencies, and the relative frequencies or percentages.
• For qualitative data, three measurements are available for each category in the list:
– the frequency, or number of measurements in the category
– the relative frequency, or proportion = frequency / total # of observations
– the percentage = 100 × relative frequency
• A pie chart is the familiar circular graph that shows how the measurements are distributed among the categories.
• A bar chart shows the same distribution of measurements in categories, with the height of the bar measuring how often a particular category was observed.
• Pareto chart: a bar chart in which the bars are ordered from largest to smallest is called a Pareto chart.
In a survey, 400 individuals were asked to rate the quality of a school.
The data are summarized below:

Rating               A     B     C     D
Frequency            35    260   93    12
Relative Frequency
Percentage

Draw a pie chart, a bar chart, and a Pareto chart.
Pareto chart order (largest to smallest frequency): B, C, A, D
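To check the blank Relative Frequency and Percentage rows, the arithmetic can be sketched in Python. The function names `frequency_table` and `pareto_order` are illustrative only, not part of any package, and the pie-slice angle column (relative frequency × 360°) is an added convenience for drawing the pie chart by hand.

```python
# Frequencies from the school-quality survey (n = 400)
frequencies = {"A": 35, "B": 260, "C": 93, "D": 12}

def frequency_table(freqs):
    """Rows of (rating, frequency, relative frequency, percentage, pie-slice angle)."""
    n = sum(freqs.values())
    rows = []
    for rating, f in freqs.items():
        rel = f / n  # relative frequency = frequency / total
        # percentage = 100 * relative frequency; slice angle in degrees = 360 * relative frequency
        rows.append((rating, f, rel, 100 * rel, 360 * rel))
    return rows

def pareto_order(freqs):
    """Categories sorted from largest to smallest frequency (the Pareto chart order)."""
    return sorted(freqs, key=freqs.get, reverse=True)

for rating, f, rel, pct, angle in frequency_table(frequencies):
    print(f"{rating}: f = {f:3d}, rel. freq. = {rel:.4f}, {pct:5.2f}%, slice = {angle:5.1f} deg")
print("Pareto order:", pareto_order(frequencies))
```

For example, rating B has relative frequency 260/400 = 0.65 (65%), and sorting the bars by frequency gives the Pareto order B, C, A, D.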
Exercise [similar exam questions]: Ten students are selected, and the following measurements are recorded:

Student  GPA  Gender  Year  Major        Number of Credit Hours Enrolled
1        2.0  F       1     Psychology   16
2        2.3  F       2     Mathematics  15
3        2.9  M       2     English      17
4        2.7  M       1     English      15
5        2.6  F       3     Business     14
6        3.2  F       3     Computer     16
7        2.7  F       1     Chemistry    14
8        3.5  M       4     Chemistry    15
9        2.1  M       3     Business     12
10       2.7  F       3     Sociology    16
• Which variables can be described using a pie chart or a bar chart?
• Construct a bar chart and a Pareto chart for the variable Year.
Answer
Variables that can be described by a bar chart or a pie
chart must be qualitative or discrete variables.
Gender and Major are qualitative variables.
Year and Number of Credit Hours Enrolled are discrete.
GPA is a continuous variable.
NOTE: Student is only an identification number (ID); it is not a characteristic for describing
students.
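A minimal sketch of the Year charts, assuming a plain-text bar chart (one `#` per student) is acceptable in place of a drawn one. `Counter.most_common()` from Python's standard library returns categories from largest to smallest count, which is the Pareto ordering; the bar chart instead keeps the natural order of the years.

```python
from collections import Counter

# Year of each of the ten students, in table order (students 1-10)
years = [1, 2, 2, 1, 3, 3, 1, 4, 3, 3]

counts = Counter(years)  # frequency of each year
n = len(years)

# Bar chart: categories in their natural order (year 1, 2, 3, 4)
print("Bar chart (natural order)")
for year in sorted(counts):
    print(f"Year {year} | {'#' * counts[year]}  ({counts[year]}, {counts[year] / n:.0%})")

# Pareto chart: the same bars, sorted from tallest to shortest
print("Pareto chart (largest to smallest)")
for year, f in counts.most_common():
    print(f"Year {year} | {'#' * f}  ({f}, {f / n:.0%})")
```

Here Year 3 has the tallest bar (4 students, 40%), so the Pareto chart lists the years in the order 3, 1, 2, 4.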
Organizing Quantitative Data: Popular Plots
• If the variable can take only a finite or countable number of
values, it is a discrete variable.
– For a discrete variable, Bar chart, Pie chart or Pareto charts can be
applied to describe the discrete variable as we did for qualitative
variables.
• A variable that can assume an infinite number of values
corresponding to points on a line interval is called
continuous.
– Stem-and-leaf plots and histograms are two common graphs for
displaying continuous data.
– A time series plot displays the data in time order to show
trends or patterns over time.
• Dotplots: Plot the measurements as points on the x-axis,
stacking the points that duplicate existing points.
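The stacking rule can be sketched in Python with a text dotplot turned on its side: the number line runs down the page and each `*` is one measurement, so repeated values pile up. This is an illustrative sketch, not how Minitab draws dotplots; the function name `sideways_dotplot` and the grid step of 0.1 (one decimal place) are assumptions chosen to fit the GPA column of the ten-student table.

```python
from collections import Counter

def sideways_dotplot(values, decimals=1):
    """Return the lines of a dotplot turned on its side: the number line runs
    down the page in steps of 10**-decimals, and each '*' is one measurement,
    so repeated values stack up to the right."""
    scale = 10 ** decimals
    counts = Counter(round(v * scale) for v in values)  # values as integer grid points
    lines = []
    for t in range(min(counts), max(counts) + 1):       # keep empty grid points to show gaps
        lines.append(f"{t / scale:.{decimals}f} | {'*' * counts[t]}".rstrip())
    return lines

# GPA values of the ten students in the exercise table
gpa = [2.0, 2.3, 2.9, 2.7, 2.6, 3.2, 2.7, 3.5, 2.1, 2.7]
print("\n".join(sideways_dotplot(gpa)))
```

The row for 2.7 shows three stacked points, and the empty rows (such as 2.2 and 3.3) keep the gaps on the number line visible.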
Stem and leaf plots: This plot presents a graphical display of the data using the actual
numerical values of each data point.
Constructing a Stem and Leaf Plot:
1. Divide each measurement into two parts: the stem and
the leaf.
2. List the stems in a column, with a vertical line to their right.
3. For each measurement, record the leaf portion in the same
row as its matching stem.
4. Order the leaves from lowest to highest in each stem.
5. Provide a key to your stem and leaf coding so that the
reader can recreate the actual measurements if necessary.
Example
The following table lists the prices (in dollars) of 19 different brands of
walking shoes. Construct a stem-and-leaf plot to display the distribution of
the data.
90 70 70 70 75 70
65 68 60 74 70 95
75 70 68 65 40 65
70
Solution
With a leaf unit of 1, the price 74 is represented by
the stem 7 and the leaf 4. The price is
recovered as 74 × (leaf unit) =
74 × 1 = 74.
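The five construction steps can be sketched in Python and applied to the 19 shoe prices. This is a minimal sketch assuming non-negative data; the function name `stem_and_leaf` is hypothetical, and the leaf is taken as the last digit after dividing by the leaf unit, as in the solution above.

```python
from collections import defaultdict

# Shoe prices (in dollars) for the 19 brands in the example
prices = [90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95,
          75, 70, 68, 65, 40, 65, 70]

def stem_and_leaf(data, leaf_unit=1):
    """Return the lines of a stem-and-leaf plot for non-negative data.
    Each value, counted in leaf units, is split into stem = all digits but
    the last, and leaf = the last digit (e.g. 74 -> stem 7, leaf 4)."""
    groups = defaultdict(list)
    for x in data:
        stem, leaf = divmod(round(x / leaf_unit), 10)  # step 1: split into stem and leaf
        groups[stem].append(leaf)                      # step 3: record leaf next to its stem
    lines = []
    for stem in range(min(groups), max(groups) + 1):   # step 2: list every stem, even empty ones
        leaves = " ".join(str(leaf) for leaf in sorted(groups[stem]))  # step 4: order the leaves
        lines.append(f"{stem} | {leaves}".rstrip())
    # step 5: a key so the reader can recover the measurements
    lines.append(f"Key: stem | leaf = (10 x stem + leaf) x {leaf_unit}")
    return lines

print("\n".join(stem_and_leaf(prices)))
```

For these prices the plot has stems 4 through 9; the row `7 | 0 0 0 0 0 0 0 4 5 5` shows that 10 of the 19 prices are in the $70s, while stems 5 and 8 are empty.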
Interpreting Graphs with a Critical Eye:
• What to look for as you describe the data:
- scale: the measurement unit, such as $, inches, etc.
- location: where the center of the data is
- shape: the shape of the frequency distribution
- outliers: unusual data values, such as a distance of 6000 miles from home when the rest of the distances are much smaller
• Distributions are often described by their shapes:
- symmetric
- skewed to the right (long tail goes right)
- skewed to the left (long tail goes left)
- unimodal, bimodal, multimodal (one peak, two peaks,
many peaks)
Examine the three dotplots generated by Minitab and shown in the
following figure. Describe these distributions in terms of their
locations and shapes.
Figure: Character dotplots and the corresponding distribution shapes (panels: Symmetric, Skew-to-right, Skew-to-left)
• Skew-to-right: Most values are small. Only a few are much larger. The
long tail is on the right side.
• Skew-to-left: Most values are large. Only a few are much smaller. The
long tail is on the left side.
Similar Exam questions
Identify the Shape of a Distribution
Exercise
Determine the shape of the distribution of
each of the following variables:
1. Score of a very easy test
2. Score of a very difficult test
3. Entry level salary for college graduates
4. Adult’s height
Answer
1. Very easy test: skew-to-the-left (most scores are
high. Only a few low scores)
2. Very difficult test: skew-to-the-right (most scores
are low. Only a few high scores.)
3. Entry level salary: likely to be skewed to the right,
since most salaries would be lower than
$50,000 while a few could be quite high.
4. Adult’s height: this has a typical symmetric
distribution
Relative Frequency Histograms
What is it?
A relative frequency histogram for a quantitative data set is a graph that describes the relative frequency (or frequency) of a variable, for example, distance from home. The possible values of the variable are divided into a few groups (classes, or intervals), and each class is represented by a rectangle whose height is the relative frequency (proportion) or frequency of observations in that class.
• On the x-axis: the classes (groups) of the variable.
• On the y-axis: the relative frequency or frequency of observations within each class, shown as the height of its rectangle.
Histogram for Continuous Data
Why do we want to do this?
A histogram summarizes the data values of a variable in a graph that shows the distribution of the variable. It helps us quickly see where the majority of the data values are, whether there are some very unusual data values and whether they are on the high or the low end, whether the data values are far apart or close to each other, and so on.
Is this different from a bar or pie graph?
YES, it is different. Bar and pie graphs are for categorical or discrete variables. A histogram is for continuous variables.
How to construct a histogram? By hand (in case you do not have technology):
Constructing a relative frequency histogram for continuous variables:
1. Choose the number of classes, usually between 5 and 15.
2. Calculate the approximate class width by dividing the difference between the largest and smallest values
(Range = largest – smallest) by the number of classes.
3. Round the approximate class width up to a convenient number.
4. Locate the class boundaries.
If discrete, assign one or more integers to a class.
If continuous, use the method of left inclusion: include the left class boundary point but not the right boundary point in the class.
– NOTE: Different software may use different methods. Some use right inclusion; some add an additional decimal place to the class boundaries.
5. Construct a statistical table containing the classes, their boundaries, and their relative frequencies.
6. Construct the histogram like a bar graph.
Example: Constructing Histogram by hand
The following table lists the prices (in dollars) of 19 different brands of walking shoes. Construct a relative frequency histogram to display the distribution of the data.
90 70 70 70 75 70 65 68 60 74 70 95
75 70 68 65 40 65 70
Solution
1. Determine # of classes: for example, use k=6 classes
2. Range = 95 -40 = 55,
3. Class width = 55/6 ≈ 9.17, rounded up to 10
(Round the width up, rather than rounding off or truncating, to a convenient number.)
4. Use left-inclusion to determine class boundaries:
[40,50),[50,60), [60, 70), [70,80),[80,90),[90,100)
5. Construct a Relative Frequency Table – first count # of observations in each
class. This is the frequency, call it fi. Relative frequency (rfi) = fi/n , where n is the
total # of data points.
6. Draw a two-dimensional graph with
X-axis: the class boundaries of the variable, and Y-axis: the relative frequency for
each class, with a rectangle whose height is the relative frequency of
each class.
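Steps 2 through 5 can be sketched in Python for the shoe prices. This is a minimal sketch assuming whole-number data and assuming the "convenient number" in step 3 is simply the next whole number (`math.ceil`), which gives the width 10 used above; the function name `relative_frequency_table` is hypothetical. Classes are left-inclusive, `[a, b)`, and relative frequencies are kept as exact fractions so they match the n = 19 denominators.

```python
import math
from fractions import Fraction

# Shoe prices (in dollars) for the 19 brands in the example
prices = [90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95,
          75, 70, 68, 65, 40, 65, 70]

def relative_frequency_table(data, k):
    """Steps 2-5 for whole-number data: class width, left-inclusive classes
    [a, b), frequencies, and relative frequencies (as exact fractions)."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Steps 2-3: approximate width = range / k, rounded UP.
    # Assumption: the "convenient number" is the next whole number (at least 1).
    width = max(1, math.ceil((hi - lo) / k))
    # Step 4: boundaries start at the smallest value; keep adding classes
    # until the largest value falls strictly inside the last class.
    edges = [lo]
    while edges[-1] <= hi:
        edges.append(edges[-1] + width)
    # Step 5: count each value in the class [edges[i], edges[i+1]) that contains it
    freq = [0] * (len(edges) - 1)
    for x in data:
        freq[(x - lo) // width] += 1
    return [((edges[i], edges[i + 1]), f, Fraction(f, n)) for i, f in enumerate(freq)]

for (a, b), f, rel in relative_frequency_table(prices, k=6):
    print(f"[{a},{b})".ljust(10), str(f).rjust(3), "  ", rel)
```

With k = 6 this reproduces the classes [40,50) through [90,100) with frequencies 1, 0, 6, 10, 0, 2. The `while` loop is a design choice for left inclusion: if the largest value landed exactly on the last right boundary, an extra class would be added so that value is still counted.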
Activity: Complete the construction of the histogram.

Relative Frequency Table
Group       Frequency   Relative Frequency
[40,50)     1           1/19
[50,60)     0           0
[60,70)     6           6/19
[70,80)     10          10/19
[80,90)     0           0
[90,100)    2           2/19

Figure: Histogram of ShoePrice (x-axis: ShoePrice, 40 to 100; y-axis: Frequency), constructed using Minitab.
NOTE: We need to know how a relative frequency table and a
histogram are constructed. The construction of a histogram,
however, can easily be done with computer software.
Using Minitab to Create the Default Histogram for
the Shoe Price Data
Figure: Histogram of ShoePrice using default options (x-axis: ShoePrice, 40 to 100; y-axis: Frequency)
Go to Minitab. In the Worksheet window, enter the prices of
the 19 pairs of shoes and name the column: Price.
90 70 70 70 75 70 65 68 60 74 70 95
75 70 68 65 40 65 70
SAVE your data set: File, Save Worksheet As, and name it Shoes
Price on your desktop.
Go to the Graph menu, choose Histogram, select Simple, select
the variable ‘Price’, and click OK.
Change the # of intervals in Minitab
As you can see, the default histogram has its own # of classes. One
can change this number to display a different histogram for the
same data. For the Shoe Price example, the following steps
change the # of classes from 7 to 4:
Click inside the histogram graph;
the bars are highlighted. Right-click on the bars and choose
‘Edit Bars’.
Go to the Binning tab, change the # of
intervals to 4, and click OK.
Figure: Histogram of Distance Data, distance from home for CMU students, sample size = 56 (x-axis: distance in miles, 0 to 800, excluding an extreme distance of 6000 miles; y-axis: Frequency)
The Distance data are grouped into k
equal intervals; in this case, k = 9
(Minitab chooses k = 9 classes).
The x-axis shows the intervals of distance.
Minitab chooses the first interval [-50,49],
the 2nd interval [50,149], and so on.
The y-axis is the number of students
whose distance falls in each
interval.
For example, there are 25 distances
between 50 and 149 miles.
A rectangle is used to represent each
interval and its frequency.
The histogram shows the distribution of
the Distance variable. Several
properties can be noticed:
The majority of students are from within
250 miles of home, with a few very far away.
The distribution of distance is strongly
skewed to the right (where the
long tail is).
For the Distance Data
used in your Activity #1, can
you construct the histogram on
the left for the above data?
The following data represent the closing value of
the Dow Jones Industrial Average for the years
1980 - 2001.
What did you learn from this chapter?
Graphical display for qualitative or categorical data: bar chart, pie chart, Pareto chart.
Graphical display for discrete quantitative data: bar chart, pie chart, Pareto chart.
Graphical display for quantitative continuous data: stem-and-leaf plot, dot plot, histogram, time-series plot.
The shape of distribution: skew-to-left, symmetric, skew-to-right.
Outliers, rare cases.
Real-time activities for illustration: How far are you from home? Does one minute of exercise increase your pulse rate dramatically?