Chapter 4
Describing Data: Displaying and Exploring Data
Dr. Manahil Kamal M. Eltib
GOALS
Develop and interpret a stem-and-leaf display.
Compute and understand quartiles, deciles, percentiles and the coefficient of skewness.
Construct and interpret box plots.
Draw and interpret a scatter diagram.
STEM-AND-LEAF
Stem-and-leaf display is a statistical technique to present a set of data. Each numerical value is divided into two parts. The leading digit(s) becomes the stem and the trailing digit the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.
Advantage of the stem-and-leaf display over a frequency distribution: the identity of each observation is not lost.
Example (1) :
Make a stem and leaf plot of the algebra test scores given below.
(Then complete each question)
56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59, 69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73
Solution:
Put the scores in numerical order
56, 59, 64, 65, 69, 70, 71, 73, 76, 77, 78, 80, 82, 82, 85, 86, 91, 91, 92, 92, 95, 98, 99
Since the data range from 56 to 99, the stems range from 5 to 9. To plot the data, make a vertical list of the stems. Each number is assigned to the graph by pairing its units digit, or leaf, with the correct stem. For example, the score 56 is plotted by placing the units digit, 6, to the right of stem 5.
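The construction described above can be sketched in a few lines of Python. This is only an illustration, assuming two-digit non-negative whole numbers so that the tens digit is the stem and the units digit is the leaf; the function name stem_and_leaf is a made-up helper, not something from the textbook.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 56 -> stem 5, leaf 6
        display[stem].append(leaf)
    return dict(display)

# Algebra test scores from Example (1)
scores = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
          69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    # One row per stem, leaves stacked to the right of the bar
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Each printed row shows one stem followed by its leaves, so the row for stem 5 holds the leaves 6 and 9 (the scores 56 and 59).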
What was the lowest score on the algebra test?
56
What was the highest score on the algebra test?
99
In which interval did most students score?
91 to 99 (7 students)
What is the sample size, judging from the display?
23 students
Example (2): (Textbook, p. 102)
Listed in the following table is the number of
30-second radio advertising spots purchased by
each of the 45 members of the Greater Buffalo
Automobile Dealers Association last year.
Organize the data into a stem-and-leaf display.
Around what values do the numbers of
advertising spots tend to cluster? What is the
fewest number of spots purchased by a dealer?
The largest number purchased?
Example (3): Use a Stem-and-Leaf Plot to Find the Mean, Median and Mode of a Set of Data
Solution:
Reading the values back from the stem-and-leaf display, the original data are:
35, 36, 37, 38, 40, 40, 41, 42, 43, 55, 55, 55, 56, 57, 58, 59
1. The mean = 747/16 = 46.69
2. The median = (42 + 43)/2 = 42.5
3. The mode = 55
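As a quick check of these three measures, here is a short sketch using Python's standard statistics module on the 16 values listed in the solution.

```python
import statistics

# Values read back from the stem-and-leaf display in Example (3)
data = [35, 36, 37, 38, 40, 40, 41, 42, 43, 55, 55, 55, 56, 57, 58, 59]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # average of the two middle values (n is even)
mode = statistics.mode(data)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

The exact mean is 747/16 = 46.6875, which rounds to 46.69.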
QUARTILES, DECILES AND PERCENTILES
Alternative ways of describing spread of data
include determining the location of values that
divide a set of observations into equal parts.
These measures include quartiles, deciles, and
percentiles.
Percentiles
The P-th percentile is the value at or below which P% of the
observations fall. For example, 90% of the data values lie at or
below the 90th percentile, whereas 10% of the data values lie at
or below the 10th percentile.
Quartiles
Quartiles are values that divide a data set into four groups
containing an approximately equal number of observations.
The cut points fall at 25%, 50% and 75% of the ordered data.
The first quartile (lower quartile) is the 25th percentile.
The median (second quartile) is the 50th percentile.
The third quartile (upper quartile) is the 75th percentile.
To find the P-th Percentile:
Sort all observations in ascending order (computing percentiles for unsorted data is the most common mistake).
Compute the position L = (P/100) * (n + 1), where n is the number of observations.
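The two steps can be sketched in Python as follows. The sketch assumes that, when L is not a whole number, the percentile is found by linear interpolation between the two neighbouring observations, and that positions falling outside the data are clamped to the smallest or largest value; both choices are assumptions of this illustration, and percentile is a made-up helper name. The data are the algebra scores from Example (1).

```python
def percentile(values, p):
    """Locate the p-th percentile with the position rule L = (p/100) * (n + 1)."""
    ordered = sorted(values)            # step 1: always sort first
    n = len(ordered)
    location = (p / 100) * (n + 1)      # step 2: 1-based position L
    whole = int(location)               # observation just below L
    fraction = location - whole         # how far L lies past that observation
    if whole < 1:                       # assumption: clamp positions below the data
        return ordered[0]
    if whole >= n:                      # assumption: clamp positions above the data
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

# Algebra test scores from Example (1), n = 23
scores = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
          69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

q1 = percentile(scores, 25)    # L = 6, the 6th ordered score
q2 = percentile(scores, 50)    # L = 12, the 12th ordered score
q3 = percentile(scores, 75)    # L = 18, the 18th ordered score
p40 = percentile(scores, 40)   # L = 9.6, between the 9th and 10th ordered scores
print(q1, q2, q3, round(p40, 2))
```

For the 40th percentile the position is 9.6, so the result lies 0.6 of the way from the 9th ordered score (76) to the 10th (77).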
Example (4): Consider the following cotinine levels of 40 smokers:
Find the quartiles and the 40th percentile.
Solution:
First note that before we start our computations we must
sort the data
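Lower Quartile:
Location of LQ: L = (25/100)(40 + 1) = 10.25
By reference to the data, element No. 10 = 86 and No. 11 = 87
LQ = 86 + 0.25(87 - 86) = 86.25
Second Quartile (Median):
Location of SQ: L = (50/100)(40 + 1) = 20.5
By reference to the data, element No. 20 = 167 and No. 21 = 173
SQ = 167 + 0.5(173 - 167) = 170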
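Upper Quartile:
Location of UQ: L = (75/100)(40 + 1) = 30.75
By reference to the data, element No. 30 = 250 and No. 31 = 253
UQ = 250 + 0.75(253 - 250) = 252.25
40th Percentile:
Location of 40th percentile: L = (40/100)(40 + 1) = 16.4, so the value lies 0.4 of the way from element No. 16 to element No. 17.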
BOXPLOT
A box plot is a way of summarizing a set of data
measured on an interval scale. It is often used in
exploratory data analysis. It is a type of graph
that shows the shape of the distribution, its
central value, and its variability through five
numbers: the minimum, the lower quartile, the
median, the upper quartile, and the maximum.
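The five numbers behind a box plot can be computed with Python's standard library, as in the sketch below. It uses the algebra scores from Example (1) as sample data (an arbitrary choice for illustration) and statistics.quantiles with method="exclusive", which places the quartiles using (n + 1)-based positions, in line with the location rule L = (P/100) * (n + 1).

```python
import statistics

# Algebra test scores from Example (1), sorted so min and max are the end values
scores = sorted([56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
                 69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73])

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")

summary = {
    "min": scores[0],    # left whisker end
    "Q1": q1,            # left edge of the box
    "median": q2,        # line inside the box
    "Q3": q3,            # right edge of the box
    "max": scores[-1],   # right whisker end
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers run out to the minimum and maximum.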
Example (5): The following box plot represents the data from Example (4).
SKEWNESS
The first thing you usually notice about a
distribution’s shape is whether it has one mode
(peak) or more than one. If it’s unimodal (has just
one peak), like most data sets, the next thing you
notice is whether it’s symmetric or skewed to one
side.
If the bulk of the data is at the left and the right
tail is longer, we say that the distribution is skewed
right or positively skewed; if the peak is toward the
right and the left tail is longer, we say that the
distribution is skewed left or negatively skewed
Types of Skewness
Symmetric
Positively skewed
Bimodal
Negatively skewed
COEFFICIENT OF SKEWNESS
Skewness is measured using the Pearson coefficient:
sk = 3(mean - median) / s, where s is the standard deviation.
The coefficient of skewness can range from -3 up to 3.
A value near -3 indicates considerable negative skewness.
A value near 3 indicates considerable positive skewness.
A value of 0, which will occur when the mean and median are
equal, indicates the distribution is symmetrical and that there
is no skewness present.
Example (6)
Calculate the coefficient of skewness of the following data by using Pearson's method:
2, 3, 3, 4, 4, 6, 6
Solution:
The mean = 28/7 = 4
The median = 4
The standard deviation s = 1.53
SK = 3(4 - 4)/1.53 = 0
The distribution is symmetric (no skewness).
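The same calculation can be sketched in Python. The sketch assumes Pearson's coefficient in the form sk = 3(mean - median)/s with s taken as the sample standard deviation (divisor n - 1), which reproduces the value 1.53 used in the solution; pearson_skewness is a made-up helper name.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    return 3 * (mean - median) / s

# Data from Example (6)
data = [2, 3, 3, 4, 4, 6, 6]

s = statistics.stdev(data)
sk = pearson_skewness(data)
print(f"s = {s:.2f}, sk = {sk}")
```

Because the mean and the median are both 4, the numerator is zero and the coefficient is 0 whatever the value of s.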
DESCRIBING THE RELATIONSHIP BETWEEN TWO VARIABLES
One graphical technique we use to show the
relationship between variables is called a scatter
diagram.
To draw a scatter diagram we need two variables.
We scale one variable along the horizontal axis
(X-axis) of a graph and the other variable along
the vertical axis (Y-axis).
Example (7): The local ice cream shop keeps track of how much ice
cream it sells versus the noon temperature on that day. Here are
its figures for the last 11 days:
Ice Cream Sales vs Temperature
(X) Temperature (°C)    (Y) Ice Cream Sales ($)
14.2                    215
11.9                    185
15.2                    332
18.5                    406
22.1                    522
19.5                    412
25.1                    416
23.4                    544
18.1                    421
22.6                    445
17.5                    408
And here is the same data as a Scatter Plot:
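A rough plain-text version of such a scatter diagram can be sketched in Python using only the standard library. The function text_scatter and its grid size are made up for this illustration; each temperature is scaled to a column (X-axis) and each sales figure to a row (Y-axis), and a '*' marks every cell that holds at least one point.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return text rows for a scatter diagram; '*' marks one or more points."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a cell index on its axis
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]

# Figures from Example (7): noon temperature (°C) and ice cream sales ($)
temps = [14.2, 11.9, 15.2, 18.5, 22.1, 19.5, 25.1, 23.4, 18.1, 22.6, 17.5]
sales = [215, 185, 332, 406, 522, 412, 416, 544, 421, 445, 408]

rows = text_scatter(temps, sales)
for line in rows:
    print("|" + line)        # vertical axis: sales
print("+" + "-" * 40)        # horizontal axis: temperature
```

In the printed grid the points rise from the lower left to the upper right, which suggests that sales tend to be higher on warmer days.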